Site Reliability Engineer

Company: Jobgether
Job Type: Full Time

Job Description - Site Reliability Engineer










This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Reliability Engineer based in India.


In this role, you will help design, operate, and scale highly available distributed systems that power mission-critical cloud and data platforms. You will work at the intersection of infrastructure, automation, and reliability engineering, ensuring systems remain resilient, observable, and performant under real-world production demands. The environment is fast-paced, cloud-native, and deeply technical, with strong emphasis on Kubernetes-based architectures and modern DevOps practices. You will collaborate closely with engineering, data, and AI/ML teams to support complex workloads across global infrastructure. This role offers the opportunity to solve challenging scalability and performance problems at enterprise scale. It is ideal for engineers who enjoy building automation-first systems and improving reliability through engineering rigor and continuous improvement.










Accountabilities:



  • Operate and optimize containerized environments using Kubernetes and service mesh technologies such as Istio, ensuring high availability and performance across distributed systems.

  • Build automation and operational tooling using Go, Python, and Shell scripting to reduce manual intervention and improve system efficiency.

  • Design and maintain observability stacks using Prometheus, Grafana, and Loki for proactive incident detection and resolution.

  • Troubleshoot and resolve complex issues across networking, storage, and system performance layers in large-scale distributed environments.

  • Participate in on-call rotations, incident response, and postmortem analysis to continuously improve reliability and operational maturity.

  • Collaborate with AI/ML and data engineering teams to ensure infrastructure readiness for model training, inference workloads, and data pipelines.


Requirements



  • Strong hands-on experience with cloud platforms, particularly Google Cloud, and infrastructure-as-code tools such as Terraform.

  • Solid understanding of microservices architectures, containerization, and distributed systems, including production use of Kubernetes and Docker.

  • Strong SRE mindset focused on automation, scalability, observability, and reliability engineering principles.

  • Practical experience in Linux system administration, networking fundamentals, and security concepts such as PKI and secure service-to-service communication.

  • Strong problem-solving skills, ability to work in high-pressure environments, and comfort with incident management and operational ownership.


Benefits



  • Competitive total rewards package aligned with industry standards.

  • Fully remote work flexibility with no mandatory office presence.

  • Generous training and certification support to accelerate technical growth.

  • Dedicated equipment and home-office setup support, including OS choice for your workstation.

  • Annual wellness budget supporting fitness, health, and personal well-being.

  • Paid vacation, sick leave, and dedicated volunteer time off.

  • Exposure to cutting-edge cloud, data, and AI infrastructure environments.


How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Original job posting for Site Reliability Engineer published on GrabJobs.
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.