Site Reliability Engineer (SRE) / Observability Engineer

Company: N Human Resources

Job Type: Full Time

Hyderabad, India

Job Description - Site Reliability Engineer (SRE) / Observability Engineer

About the opportunity

We are hiring on behalf of a well-established global IT consulting and implementation firm with offices across North America, Europe, and India (HITEC City, Hyderabad). The organisation delivers technology solutions across Cloud, DevOps, SAP, and AI for enterprise clients globally and has a strong people-first, learning-oriented culture.

Role overview

We are looking for a Site Reliability Engineer with a strong Observability specialisation to drive service reliability, reduce operational toil, and build best-in-class monitoring and alerting infrastructure. The ideal candidate brings deep Grafana expertise and will take ownership of SLO/SLA definition, distributed system visibility, and the shift from reactive to proactive operations.

Key responsibilities

• Define, track, and report on Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets across platform services

• Build, maintain, and optimise observability infrastructure using Grafana, Prometheus, Loki, Tempo, and related open-source tooling

• Develop dashboards and alerting rules that provide actionable, low-noise insights for engineering and operations teams

• Lead blameless post-incident reviews (PIRs) and drive systemic reliability improvements from their findings

• Partner with engineering teams to instrument applications with distributed tracing, structured logging, and custom metrics

• Reduce operational toil through automation: scripting runbooks, auto-remediation workflows, and self-healing infrastructure

• Define on-call practices, escalation policies, and runbooks; contribute to a sustainable on-call culture

• Evaluate and implement new observability tooling as the stack evolves (e.g., OpenTelemetry, Jaeger, VictoriaMetrics)

Required skills & experience

• 8+ years of combined SRE / DevOps / Platform Engineering experience

• Strong hands-on expertise with Grafana: dashboards, alerting, data sources

• Proficiency in Prometheus: PromQL, exporters, Alertmanager

• Experience with log aggregation using Loki, ELK stack, or equivalent

• Solid understanding of distributed systems principles, microservices architecture, and container orchestration (Kubernetes)

• Proficiency in Python, Go, or Bash for automation and tooling

• Strong analytical thinking for root cause analysis and capacity planning

Good to have

• Hands-on experience with OpenTelemetry instrumentation

• Exposure to Grafana OnCall, Grafana Incident, or PagerDuty for incident management

• Familiarity with eBPF-based observability tools (Cilium, Parca)

• Azure or AWS certifications

What's on offer

• End-to-end ownership of observability, not just maintaining dashboards

• Hybrid work flexibility from HITEC City, Hyderabad

• Exposure to global-scale distributed systems for international clients

• Certification reimbursement and structured learning pathways

Location: Hyderabad (Hybrid)

Experience: 8+ years

Employment type: Full-time

Specialisation: Observability – Grafana, Prometheus, Loki stack
