Job Description - Senior Site Reliability Engineer
Description
Orgvue is a leading organizational design and planning software platform that harnesses data visualization and modelling to build more adaptable, better-performing organizations. HR, finance and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world.
Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.
We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.
Role
In this role you will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale.
This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams.
Responsibilities
Define and enforce SLOs, SLIs, and error budgets across critical services
Craft and implement a cloud infrastructure and tooling strategy
Work across the organization to level up SRE practices
Help implement robust observability (metrics, logs and traces) using our observability tooling
Guide the team in building automated, self-healing systems
Own and evolve our incident response processes, including on-call practices and post-mortem culture
Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform
Requirements
Demonstrable experience leading SRE transformations
Deep hands-on expertise with Kubernetes (EKS preferred) in production environments