The DevOps Solutions Architect is a member of the IT Enterprise Applications team, focusing on DevOps, SRE, and AI Ops.
DevOps focuses on the end-to-end application lifecycle.
SRE focuses on delivery and the stability of the production environment.
AI Ops is centered on the deployment, oversight, and monitoring of AI-specific elements.
You are responsible for the smooth operation of Teknion’s Enterprise Applications infrastructure. You have an essential role in integrating the various project solutions into the existing applications and infrastructure.
You will interface directly with senior technology leaders to transform business and technology capabilities. You will be a dedicated contributor to senior leaders as they define the target state and roadmaps and identify new and emerging technologies that will transform and optimize the business.
Roles & Responsibilities
DevOps
Design and implement end-to-end, highly scalable, and resilient solutions for infrastructure and application services across Teknion’s hybrid clouds
Software Automated Delivery
Design, implement, and manage Continuous Integration / Continuous Delivery (CI/CD) pipelines to automate the build, test, and deployment processes
Automate the configuration and management of systems and applications
Design, implement, and manage source code control (GitHub), and manage software components and build artifacts in a repository manager integrated into CI/CD pipelines
Drive test automation using Test-Driven Development strategies, in partnership with development teams, to increase code quality and confidence
Web Application Firewall & Reverse Proxy
Manage application security posture protecting APIs & Web Applications at the edge
Manage the deployment and configuration of application-based definitions
Integrate WAF into CI/CD pipelines to ensure security is built into the development process
Align & implement WAF policies with industry & organization standards
Implement and manage Reverse Proxy and Web Application Firewalls (Cloudflare WAF) to provide a unified application security posture that protects APIs & Web Applications at the edge and reduces client-side risks
Cyber Security
Identifying and deploying cybersecurity measures by continuously performing vulnerability assessment and risk management. Address Common Vulnerabilities and Exposures (CVE) as per established procedures
Other duties as assigned
Site Reliability
Reliability and Availability
Design and implement Development, QA, UAT and Production application & database environments
Ensure application environments, tools & approved 3rd party components are kept up to date as per established patching & update procedures. Liaise with vendors to manage the monthly patching exercises
Ensure unplanned downtime is kept to a minimum (target: 0.00%)
Implement automated processes wherever possible with continuous modernization and upgrade of existing processes / scripts
Identity & Access Management (IAM)
Manage, configure, and monitor application-related IAM actions
Incident Response
Responding to and mitigating production incidents.
Conducting post-incident reviews (postmortems) to identify root causes and prevent future occurrences.
Monitoring and Alerting
Designing and implementing comprehensive monitoring systems to track system health and performance.
Setting up effective alerting mechanisms to notify teams of potential issues.
Participate in code reviews, security audits, and performance testing to maintain the integrity of Cloud and Hybrid solutions
Change Management
Managing and automating the deployment of software changes.
Implementing safe deployment practices, such as canary releases and blue-green deployments.
Participate in and contribute to IT / Cyber Security Change Advisory Board meetings
Drive continuous technology transformation to minimize technical debt
Provide architecture direction for developers, recognizing custom and standard technical frameworks and GRC (Governance, Risk & Compliance) audit policies and procedures, including those covering PII (Personally Identifiable Information) and CUI (Controlled Unclassified Information)
Participate in defining target state technology architecture and roadmaps, and ensure alignment of initiatives
Work closely with cross-functional application & infrastructure teams to produce comprehensive end-to-end solution opportunities
AI Observability & Performance Monitoring
Design and implement monitoring dashboards within APM tools (Prometheus, Grafana, Datadog, or New Relic) to track AI-specific metrics such as API latency, token utilization, and foundation model error rates.
Set up cost-tracking alerts to monitor the consumption of Generative AI resources and prevent budget overruns in Development, QA, UAT, and Production environments.
Strategic Governance & Data Compliance
Provide architectural guidelines for developers to ensure AI applications strictly adhere to GRC audit policies, specifically blocking the leakage of Personally Identifiable Information (PII) and Controlled Unclassified Information (CUI) into public AI training sets.
Maintain accurate Standard Operating Procedures (SOPs) detailing the failover and recovery mechanisms for AI-driven system capabilities.
Stay up to date with the latest technologies and security trends to ensure our solutions remain innovative, secure, and cost-efficient
Define and maintain documentation of architectural solutions and procedures (Standard Operating Procedures)
Other duties as assigned
AI Ops
AI Security & Edge Protection
Configure and manage Web Application Firewalls (Cloudflare WAF) and API gateways to safeguard Generative AI endpoints from emerging threats like prompt injection and data exfiltration.
Integrate security guardrails into the development process to automatically scan and intercept unsafe data payloads sent to external or internal AI foundation models.
AIOps, Observability & Incident Governance
AI Observability & Cost Tracking: Design and implement monitoring dashboards within APM tools (e.g., Prometheus, Grafana) to track AI-specific metrics like API latency, token utilization, and foundation model error rates. Establish cost-tracking guardrails across local and cloud environments.
AIOps & Intelligent Event Correlation: Architect and implement AIOps platforms to ingest, aggregate, and correlate telemetry data
Other duties as assigned
Skills & Qualifications
Education, Experience & Soft Skills
Bachelor’s degree in information technology, software engineering, computer science, or a related field
Proven experience in engineering and software architecture design.
Must be self-motivated and driven. Strong ability to work with internal resources and vendors
Technical Skills
Virtual Private Clouds and Cloud Platforms (AWS, Azure, GCP):
Experience in managing Virtual Private Clouds (OSI Transport layer and above)
Experience with cloud services like EC2, S3, Azure VMs, Kubernetes Engine, etc.
Understanding of cloud networking, security, and infrastructure as code.
Containerization and Orchestration:
Expertise in Docker for containerizing applications.
Experience with Kubernetes or other orchestration tools for managing containerized workloads.
Configuration Management:
Familiarity with tools like Ansible for automating system configurations.
CI/CD Tools:
Expertise in CI/CD tools (Jenkins, GitHub Actions)
Scripting and Programming:
Proficiency in scripting languages like Python, Bash, or PowerShell.
Understanding of programming concepts for building automation tools.
Operating Systems & Server Management:
Strong understanding of RHEL 8 (& above) and/or Windows Server.
Networking:
Networking knowledge, including TCP/IP, DNS, and load balancing.
Identity & Access Management:
Experience with Okta, Active Directory, and Azure Active Directory
Observability, Monitoring and Logging:
Experience with monitoring tools like Prometheus, Grafana, New Relic or Datadog.
Familiarity with logging tools like ELK stack (Elasticsearch, Logstash, Kibana) or Splunk.
Fundamental understanding of Generative AI principles, foundation models (LLMs), tokenization, and basic prompt engineering lifecycle concepts.
Basic familiarity with configuring vector databases or semantic caching mechanisms alongside standard database systems like Postgres, MongoDB, or other NoSQL stores
The expected base salary range for this position is $110,000 - $130,000. Final base salary offers will reflect an assessment of the selected candidate's skills, demonstrated competencies, and adherence to our internal pay equity framework.
Teknion is committed to supporting a culture of diversity and accessibility across the organization, starting with the hiring process. It is our priority to remove barriers to provide equal access to employment and support a diverse workforce. Teknion welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process. All information received in relation to accommodation will be kept confidential.
By applying for a position with Teknion, you understand that, should you be made an offer, it will be contingent on your undergoing and successfully completing a background check consistent with Teknion's employment policies. Background checks may include some or all of the following, based on the nature of the position: SSN/SIN validation, education verification, employment verification, credit check, and criminal check. You will be notified during the hiring process which checks are required for the position.