Staff SRE - Observability

Salary:

£95,000 - £130,000 GBP (London base salary range)

Company: Focused Labs

Job Type: Full Time

London, England

Job Description - Staff SRE - Observability

Who we are:

At Focused, we move quickly to deliver quality software that achieves client outcomes and meets their customers' needs. We strategically partner with our clients to leverage our expertise in design and software, while our clients bring their own domain expertise. We work with a variety of clients from different industries, collaborating to bring new products to market, modernize legacy systems, or help teams learn the skills they need to be successful.

Our values:

Listen first • We are experts in product practices but lifelong learners in the domain of our customers. We research, collaborate, and understand.

Learn why • We ask questions and talk to users to understand problem spaces, objectives, and goals, which allows us to deeply invest and drive towards the outcomes of our clients.

Love your craft • We love diving into a variety of domains and solving problems. We take pride in delivering value, communicating progress, and guiding our clients to success.

We are seeking an experienced Staff Observability Consultant with deep expertise in OpenTelemetry, experience leading clients and teams, and strong Platform Engineering capabilities to help organizations implement, optimize, and scale their observability infrastructure. This role requires a seasoned consultant who can design comprehensive telemetry strategies, implement distributed tracing solutions, establish robust monitoring practices, and work closely with clients on their observability journey.

Key Responsibilities:

OpenTelemetry & Observability

Design and implement end-to-end OpenTelemetry solutions across diverse technology stacks

Configure and deploy OpenTelemetry Collectors for efficient data collection, processing, sampling, and routing

Establish telemetry pipelines for metrics, traces, and logs across microservices architectures

Optimize collector configurations for performance, reliability, and cost-effectiveness

Platform Engineering & Infrastructure

Augment existing infrastructure with integrated observability solutions

Implement Infrastructure as Code (IaC) solutions using Terraform, Pulumi, CloudFormation, etc.

Architect and manage Kubernetes clusters with comprehensive monitoring and logging

Build CI/CD pipelines with embedded observability and automated testing

Site Reliability Engineering (SRE)

Establish and maintain Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs)

Implement error budgets, toil reduction strategies, and capacity planning

Support incident response procedures and post-mortem processes

Cloud & DevOps Engineering

Deploy and manage observability infrastructure across AWS, GCP, and Azure

Establish security, compliance, and governance frameworks for telemetry data

Automate agent evaluations in CI/CD pipelines and observability backends

Required Qualifications:

Core Observability & OpenTelemetry

3-7 years of experience in observability, monitoring, and distributed systems

Deep hands-on experience with OpenTelemetry ecosystem, including SDKs, APIs, and specifications

Proficiency with OpenTelemetry Collector configuration, processors, exporters, and receivers

Strong understanding of telemetry data models, semantic conventions, and instrumentation best practices

Platform Engineering & DevOps

7+ years of Platform Engineering or DevOps experience with focus on site reliability, observability, and incident response

Proficiency with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, CDK)

Strong experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)

Cloud & Infrastructure

Hands-on experience with major cloud providers (AWS, GCP, Azure) and their observability services

Experience with container technologies (Docker, Podman) and container registries

Knowledge of networking, security, load balancing, and distributed systems concepts

Site Reliability Engineering

Experience implementing SRE practices including error budgets and toil metrics

Proficiency in incident management, on-call procedures, and post-mortem culture

Experience with capacity planning, performance optimization, and scalability design

Programming & Automation

Proficiency in multiple programming languages preferred (Go, Python, Java, Node.js, Rust)

Strong scripting and automation skills (Bash, Python, PowerShell)

Understanding of software engineering best practices and testing methodologies

Preferred Qualifications (Exceptional Candidates)

AI & Agentic Frameworks

Understanding of Large Language Models (LLMs) and their application in DevOps

Knowledge of vector databases, embeddings, and retrieval-augmented generation (RAG)

Experience with AI/ML model deployment and monitoring in production environments

Leadership & Communication

Experience leading teams and managing client relationships and expectations

Strong technical writing and documentation skills

Ability to present complex technical concepts to diverse stakeholders

A passion for knowledge sharing

Key Competencies

Systems thinking and ability to design holistic observability solutions

Strong analytical and troubleshooting skills for complex distributed systems

Curiosity about emerging technologies, particularly AI applications in operations

Adaptability to rapidly evolving cloud-native and observability technologies

Collaborative mindset with focus on enabling developer productivity and system reliability

What Sets Exceptional Candidates Apart:

Experience with Honeycomb

Contributions to open-source observability or AI framework projects

Track record of implementing platform engineering solutions that significantly improved developer experience

Experience scaling observability infrastructure to handle high event volume

What to know before you apply:

You will be expected to work for up to four days a week in person, be it from our office in London or from client sites.

The London base salary range for this role is £95,000 - £130,000 GBP.

Original job Staff SRE - Observability posted on GrabJobs.
