Site Reliability Engineering (SRE)

Company: Flydocs
Job Type: Full Time


Job Description - Site Reliability Engineering (SRE)

Job Title: Site Reliability Engineer (SRE)
Experience: 4–8 Years
Location: Delhi NCR
Employment Type: Full-Time

Job Summary

We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, performance, and availability of our cloud infrastructure and applications. The ideal candidate will have strong experience in cloud platforms, Kubernetes, automation, monitoring, incident management, and DevOps practices.

Key Responsibilities

Maintain and improve the reliability, availability, and performance of production systems.
Design, implement, and manage monitoring, alerting, and observability solutions.
Manage and support Kubernetes clusters and containerized workloads.
Automate operational tasks using Infrastructure as Code (Terraform, ARM, Bicep, etc.).
Collaborate with development teams to improve application resilience and deployment processes.
Perform root cause analysis (RCA) for incidents and implement preventive measures.
Define and monitor SLIs, SLOs, and error budgets.
Manage CI/CD pipelines and deployment automation.
Support disaster recovery (DR), backup, and business continuity planning.
Participate in on-call support and incident response activities.
Optimize cloud infrastructure for performance, security, and cost efficiency.

Required Skills

Strong experience with Azure, AWS, or GCP.
Hands-on experience with Kubernetes (AKS/EKS/GKE).
Experience with Terraform, Infrastructure as Code, and automation.
Strong Linux and networking fundamentals.
Experience with GitLab CI/CD, Azure DevOps, or Jenkins.
Experience with monitoring and observability tools such as Prometheus, Grafana, ELK, Datadog, or Azure Monitor.
Scripting experience in Python, Bash, or PowerShell.
Knowledge of incident management, problem management, and change management processes.
Experience with databases, caching solutions, and messaging platforms is desirable.

Preferred Qualifications

Azure Administrator, Azure DevOps, Kubernetes (CKA/CKAD), or similar certifications.
Experience with microservices architecture and cloud-native technologies.
Understanding of security best practices and compliance requirements.

Nice to Have

Service Mesh (Istio/Kiali)
Kafka, Redis, MongoDB, PostgreSQL
Azure APIM, Application Gateway, WAF
Disaster Recovery and High Availability architecture

Key Metrics

Platform Availability (99.9%+)
MTTR (Mean Time to Recovery)
Incident Reduction
Deployment Success Rate
Infrastructure Automation Coverage
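
As a rough illustration of what the 99.9% availability target implies in practice, the sketch below converts an availability percentage into an allowed-downtime budget and reports how much of that budget is left. The 30-day window, the function names, and the sample downtime figure are assumptions made for illustration; the posting does not specify how availability is measured.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability target permits over the window.

    The 30-day default window is an assumption, not something the posting states.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


def budget_remaining_fraction(
    availability_pct: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Return the fraction of the downtime budget still unspent (negative if overspent)."""
    budget = allowed_downtime_minutes(availability_pct, window_days)
    if budget == 0:
        raise ValueError("A 100% target leaves no downtime budget to spend.")
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    target = 99.9  # availability target from the Key Metrics list
    budget = allowed_downtime_minutes(target)
    print(f"Downtime budget at {target}% over 30 days: {budget:.1f} minutes")

    # Hypothetical figure: 10.8 minutes of downtime recorded so far in the window.
    remaining = budget_remaining_fraction(target, downtime_minutes=10.8)
    print(f"Budget remaining: {remaining:.0%}")
```

Under these assumptions, a 99.9% target allows roughly 43.2 minutes of downtime in a 30-day window, which is the kind of figure an error budget is tracked against.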


Original job "Site Reliability Engineering (SRE)" posted on GrabJobs.

Copyright © 2026 Grabjobs Pte. Ltd. All Rights Reserved.