
Principal Site Reliability Engineer

Salary: $260,000 - $275,000 a year

Company: Saviynt
Job Type: Full Time

Job Description - Principal Site Reliability Engineer


Why Join Saviynt

• Work on a mission-critical SaaS platform used by global enterprises

• Solve complex reliability challenges at scale

• Influence architecture and engineering culture at a company level

• Competitive compensation, benefits, and growth opportunities

Security & Compliance

This role requires compliance with Saviynt’s information security and privacy policies, including annual security training.

What You Will Be Doing


In this pivotal role, you will design, build, and maintain the shared infrastructure services and platforms that our product and application teams depend on.

You will focus on creating reusable, reliable, and scalable solutions that abstract away complexity, enabling other teams to focus on their core business logic and deliver features faster in a multi-cloud environment.

• Design and build core platform components and shared infrastructure services that other development teams integrate with and use to deploy and operate their applications

• Architect, implement, and manage highly available, scalable Kubernetes platforms offered as a service to internal consumers

• Develop robust internal tools and automation for infrastructure provisioning and management, primarily in Go (Golang)

• Architect and optimize foundational solutions in Cloud environments (AWS, Azure, etc.), focusing on reusable patterns and modules for other teams

• Design and implement shared Event-Driven Architecture components and messaging platforms, using technologies such as Kafka or Google Pub/Sub, that product teams can easily adopt

• Develop and maintain robust CI/CD pipelines (e.g., GitLab CI and ArgoCD) as a service, providing standardized, automated deployment workflows for development teams

• Design and build resilient Distributed Systems components that serve as building blocks for other applications, with a focus on reliability, fault tolerance, and performance

• Manage and optimize our shared infrastructure across Multi-Region Cloud Environments, ensuring that platform services are globally available and performant for all consumers

• Establish and enhance centralized Observability and Monitoring platforms and tools that give consuming teams self-service insights

• Define and implement clear, well-documented RESTful APIs for the infrastructure services you build, making them easy for internal clients to integrate with

• Implement and manage Service Mesh capabilities (e.g., Envoy, Istio), providing traffic management, security, and policy enforcement as a shared platform for services

• Design, implement, and optimize highly available Relational Database services or shared data platforms for broad organizational use

• Collaborate closely with product development teams to understand their infrastructure needs and pain points, and provide technical guidance and support

• Participate in on-call rotations to support the critical shared infrastructure you build

What You Bring


• 9+ years of experience in an Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a strong focus on building tools and services for other engineers

• Deep expertise with Kubernetes in production environments, particularly in providing it as a platform (i.e., single-tenant and multi-tenant deployment architectures)

• Strong programming skills in Go (Golang) and Python, with experience building robust, maintainable backend services and automation

• Extensive hands-on experience with at least one major Cloud Provider (AWS, GCP, or Azure); multi-cloud experience is a strong plus, especially building abstractions over multiple providers

• Proven experience designing and implementing Event-Driven Architecture and message queuing systems (e.g., Kafka, RMQ, NATS) as shared services

• Solid understanding of and practical experience with CI/CD pipeline tools (especially GitLab CI), and experience establishing automated delivery processes for other teams

• Demonstrable experience designing and operating Distributed Systems, with an understanding of patterns for creating reliable, shared components

• Familiarity with Multi-Region Cloud Environments and strategies for building globally distributed, highly available platforms

• Proficiency in establishing and using comprehensive Observability and Monitoring platforms (e.g., Prometheus, Grafana, ELK stack, Datadog) for shared infrastructure

• Strong experience with RESTful API design principles and building well-documented, consumable APIs

• Knowledge of Service Mesh concepts and practical experience with solutions such as Istio in a platform context

• Hands-on experience with Relational Databases (e.g., MySQL, PostgreSQL), ideally managing them as a service

• Excellent communication skills and the ability to clearly articulate complex technical concepts to both technical and non-technical audiences

• A strong customer-centric mindset, treating internal development teams as your primary customers

• Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience (required)

$260,000 - $275,000 a year
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Original job Principal Site Reliability Engineer posted on GrabJobs ©.
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.