Observability Engineering Lead, Prometheus, Grafana, 2 days onsite.
The role:
We're looking for a highly skilled Observability Engineering Lead to drive the uplift, resilience, and effectiveness of our monitoring ecosystem. You'll play a pivotal role in shaping how we detect, diagnose, and prevent issues across our critical applications, partnering with engineering teams to deliver world‑class insights through metrics, dashboards, alerts, and automation.
This is a hands‑on technical leadership position where you'll influence standards, modernise tooling, and enhance our visibility across complex distributed systems.
Collaborate with Application Stewards and SREs to validate critical assets in scope for monitoring verification and uplift.
Work with EMAS to analyse Prometheus scrape coverage, exporter deployment, and Grafana dashboard availability for critical services.
Identify and implement improvements across monitoring configurations, alert quality, data models, dashboards, KPIs, SLIs, and SLOs.
Review roles and responsibilities across observability functions and recommend enhancements aligned to Operational Resilience standards.
Contribute to delivering automated, end‑to‑end business flow visibility, surfaced in Grafana through service maps, dependency visualisation, or topology integrations.
Ensure alerting configurations are reliable, actionable, and noise‑optimised, following Alertmanager best practices.
Skills required:
Deep expertise in designing, implementing, and configuring modern observability stacks, specifically Prometheus, Grafana, and associated tooling.
Prometheus
Strong instrumentation strategy (exporters, service discovery, custom metrics).
Advanced PromQL skills for complex querying and performance analysis.
Experience building recording/alerting rules and optimising metric ingestion.
Knowledge of HA architectures, federation, sharding, and long‑term storage (Thanos, Cortex, Mimir).
Grafana
Dashboard and panel design focused on performance and operator clarity.
Best‑practice alert configuration and routing.
Experience with synthetic monitoring (Grafana Synthetic Monitoring, Blackbox exporter).
Log ingestion/analysis (Loki).
Familiarity with Real User Monitoring tooling (e.g., Grafana Faro).
Ecosystem & Integrations
Strong API and automation skills for dashboard provisioning, alert management, and data ingestion.
Experience integrating the Grafana/Prometheus ecosystem with logging, tracing, and event platforms (Loki, Tempo, OpenTelemetry).
McGregor Boyall is an equal opportunity employer and does not discriminate on any grounds.