Site Reliability Engineer (Vancouver)

Company: Gauss Labs
Job Type: Full Time

Job Description - Site Reliability Engineer (Vancouver)

Gauss Labs is seeking a highly skilled Site Reliability Engineer to join our team in Vancouver. As an SRE at Gauss Labs, you will play a critical role in ensuring our industrial AI platform's reliability, performance, and scalability. You will be responsible for building and maintaining a robust solution that supports our growing business at customer sites. This role requires a high level of technical expertise, a collaborative mindset, and a strong desire to continuously improve systems and processes.

Responsibilities

    • Monitoring and Alerting: Creating and maintaining robust monitoring systems to proactively identify and resolve issues before they impact customers. Implementing effective alerting mechanisms to ensure timely response to critical events.
    • Incident Response: Participating in on-call rotations and leading incident response efforts to minimize downtime and restore service quickly.
    • Automation: Developing and implementing automation tools and scripts to streamline operations, reduce manual effort, and improve efficiency.
    • Capacity Planning: Forecasting resource needs, optimizing resource utilization, and ensuring customers' infrastructure can handle increasing workloads.
    • Performance Optimization: Identifying and resolving performance bottlenecks, optimizing system performance, and improving response times.
    • Collaboration: Partnering with software engineers, data scientists, and other teams to ensure alignment and efficient operations.
    • Customer Focus: Working closely with the AI Program Manager and Technical Account Manager to understand customer issues, provide technical support, and improve customer satisfaction.
    • Continuous Improvement: Driving a culture of continuous improvement by identifying opportunities to enhance system reliability, performance, and efficiency.

Basic Qualifications

    • Bachelor's degree in computer science, engineering, or a related discipline
    • 5+ years of industry experience as a Site Reliability Engineer
    • Experience with cloud platforms (AWS, GCP, Azure), containerization technologies (Docker, Kubernetes), and observability and alerting tools (Prometheus, Grafana, Elasticsearch, Jaeger)
    • Experience with scripting languages (Python, Bash)
    • Working knowledge of GitHub, GitHub Actions, and CI/CD concepts
    • Experience in ticket management, issue resolution, and troubleshooting
    • Strong problem-solving and troubleshooting skills
    • Excellent customer communication and interpersonal skills; fluency in spoken and written English

Preferred Qualifications

    • Knowledge of AI/ML infrastructure and workloads
    • Knowledge of big data technologies (Kafka, Flink)
    • Knowledge of database technologies (MongoDB, PostgreSQL)

Hiring Process

Application review - Phone interview - Virtual onsite interview - VP interview/Core Value interview

Original job posting: Site Reliability Engineer (Vancouver), posted on GrabJobs.

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.