Site Reliability Engineering (SRE)

Company: Flydocs

Job Type: Full-Time

New Delhi, India

Site Reliability Engineer (SRE) – Job Description

Job Title: Site Reliability Engineer (SRE)
Experience: 4–8 Years
Location: Delhi NCR
Employment Type: Full-Time

Job Summary

We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, performance, and availability of our cloud infrastructure and applications. The ideal candidate will have strong experience in cloud platforms, Kubernetes, automation, monitoring, incident management, and DevOps practices.

Key Responsibilities

Maintain and improve the reliability, availability, and performance of production systems.
Design, implement, and manage monitoring, alerting, and observability solutions.
Manage and support Kubernetes clusters and containerized workloads.
Automate operational tasks using Infrastructure as Code (Terraform, ARM, Bicep, etc.).
Collaborate with development teams to improve application resilience and deployment processes.
Perform root cause analysis (RCA) for incidents and implement preventive measures.
Define and monitor SLIs, SLOs, and error budgets.
Manage CI/CD pipelines and deployment automation.
Support disaster recovery (DR), backup, and business continuity planning.
Participate in on-call support and incident response activities.
Optimize cloud infrastructure for performance, security, and cost efficiency.
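
To illustrate the SLI, SLO, and error budget responsibility above, here is a minimal sketch of how the three relate for a request-based service. The function names and the sample request counts are hypothetical and chosen only for illustration; they do not describe Flydocs tooling.

```python
def sli(good_events: int, total_events: int) -> float:
    """Service level indicator: the fraction of events that were good."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as fully compliant
    return good_events / total_events


def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent).

    slo_target is a fraction strictly between 0 and 1, for example 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0
    allowed_bad = (1.0 - slo_target) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events          # failures observed
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 500 of them failed.
    print(f"SLI: {sli(999_500, 1_000_000):.4%}")
    print(f"Budget remaining: {error_budget_remaining(0.999, 999_500, 1_000_000):.1%}")
```

With a 99.9% target, 1,000,000 requests allow roughly 1,000 failures, so 500 failures leave about half of the budget unspent.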

Required Skills

Strong experience with Azure, AWS, or GCP.
Hands-on experience with Kubernetes (AKS/EKS/GKE).
Experience with Terraform, Infrastructure as Code, and automation.
Strong Linux and networking fundamentals.
Experience with GitLab CI/CD, Azure DevOps, or Jenkins.
Experience with monitoring and observability tools such as Prometheus, Grafana, ELK, Datadog, or Azure Monitor.
Scripting experience in Python, Bash, or PowerShell.
Knowledge of incident management, problem management, and change management processes.
Experience with databases, caching solutions, and messaging platforms is desirable.

Preferred Qualifications

Azure Administrator, Azure DevOps, Kubernetes (CKA/CKAD), or similar certifications.
Experience with microservices architecture and cloud-native technologies.
Understanding of security best practices and compliance requirements.

Nice to Have

Service Mesh (Istio/Kiali)
Kafka, Redis, MongoDB, PostgreSQL
Azure APIM, Application Gateway, WAF
Disaster Recovery and High Availability architecture

Key Metrics

Platform Availability (99.9%+)
MTTR (Mean Time to Recovery)
Incident Reduction
Deployment Success Rate
Infrastructure Automation Coverage
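
As a rough guide to what two of these metrics mean in practice, the sketch below converts an availability target into a downtime allowance and computes MTTR from a list of incident recovery times. The 30-day window and the sample incident durations are assumptions made for illustration; the posting does not state a measurement window.

```python
from datetime import timedelta


def allowed_downtime(availability: float, window: timedelta) -> timedelta:
    """Downtime permitted within the window at the given availability (0 to 1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return window * (1.0 - availability)


def mttr(recovery_times: list[timedelta]) -> timedelta:
    """Mean time to recovery: the average of the per-incident recovery durations."""
    if not recovery_times:
        raise ValueError("at least one incident is required")
    return sum(recovery_times, timedelta()) / len(recovery_times)


if __name__ == "__main__":
    # Assumed 30-day window; 99.9% leaves about 43.2 minutes of downtime.
    budget = allowed_downtime(0.999, timedelta(days=30))
    print(f"Allowed downtime: {budget.total_seconds() / 60:.1f} minutes")

    # Hypothetical incidents with recovery times of 30, 60, and 90 minutes.
    incidents = [timedelta(minutes=m) for m in (30, 60, 90)]
    print(f"MTTR: {mttr(incidents)}")
```

At 99.9% availability the allowance works out to about 43 minutes per 30 days, which a single long incident can consume on its own.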

Original job, Site Reliability Engineering (SRE), posted on GrabJobs.
