Site Reliability Engineer

Company: Carfax Canada
Job Type: Full Time

Job Description - Site Reliability Engineer

We are looking to hire a Site Reliability Engineer who will help build and maintain the observability platform across multiple business lines and establish observability best practices.

What you'll be doing:

  • Build and improve observability and reliability solutions that help engineering teams operate and support their services with confidence.
  • Partner with engineering teams to design monitoring, alerting, dashboards, and service health standards early in the software delivery lifecycle.
  • Write and maintain code, infrastructure definitions, and automation that reduce manual work and improve reliability.
  • Help engineers instrument services and systems so teams can quickly detect, diagnose, and resolve issues.
  • Support the adoption and standardization of telemetry patterns across metrics, logs, and traces, including OpenTelemetry-based instrumentation where appropriate.
  • Improve the reliability of our AWS and Kubernetes environments, including EKS, through durable engineering solutions rather than repetitive operational work.
  • Participate in incident response and follow-up activities, including troubleshooting, root cause analysis, and the implementation of lasting fixes.
  • Identify opportunities to reduce toil and improve the developer experience through automation, reusable patterns, and better engineering practices.
  • Continuously evaluate our tooling, reliability practices, and engineering processes for opportunities to improve.

What we're looking for:

  • Experience in Site Reliability Engineering, DevOps, Platform Engineering, or Software Engineering roles, with meaningful ownership of reliability-focused solutions.
  • Proven experience building, automating, and maintaining engineering solutions, not just operating existing systems.
  • Experience writing production-quality code, scripts, or automation. Go is preferred; experience in other languages such as JavaScript/TypeScript or Ruby is also valuable.
  • Experience managing cloud infrastructure with Infrastructure as Code. Terraform is preferred.
  • Experience working with AWS and Kubernetes environments, including EKS.
  • Experience with distributed systems and the trade-offs involved in designing for reliability, resiliency, and durability.
  • Experience with observability tooling such as Prometheus, Grafana, New Relic, CloudWatch, Google Observability, or similar platforms.
  • Familiarity with telemetry standards and instrumentation patterns, including OpenTelemetry, is strongly preferred.
  • Experience designing useful monitoring and alerting for applications and infrastructure, with an understanding of how to balance signal, noise, and actionable response.
  • Experience with logging and telemetry pipelines at scale. Bindplane experience is a plus.
  • Strong troubleshooting skills and the ability to work collaboratively during incidents to restore service and address root causes.
  • Strong communication skills, with the ability to document standards, guide engineering teams, and influence reliability best practices.
  • A strong bias toward automation, simplification, and reducing toil for yourself and your teammates.

Nice to have:

  • Experience working with OpenSearch, Elasticsearch, ELK, or similar logging and search platforms.
  • Experience with telemetry pipeline design, routing, sampling, or retention decisions.
  • Experience supporting applications written in Go, JavaScript/TypeScript, or Ruby.

 

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!

It is the policy of Mobility to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, Mobility will provide reasonable accommodations for qualified individuals with disabilities.

Original job posting: Site Reliability Engineer, posted on GrabJobs.