Site Reliability Engineer (SRE) | AWS | Kubernetes
Fully Remote (UK) | 24/7 Shift Pattern (28-day rota including days & nights) | £ Competitive + Bonus + Excellent Benefits
Build resilient cloud platforms that support critical national services.
We're recruiting Site Reliability Engineers to join a global leader in AI-powered customer experience and cloud technology. Following the award of a major government programme, they're expanding their engineering teams to build and support highly secure, cloud-native platforms that deliver sensitive communication services.
This is an opportunity to join an organisation investing heavily in modern cloud engineering, automation and reliability. Working as part of a collaborative SRE team, you'll help ensure large-scale production environments remain secure, available and resilient, whilst continuously improving the way they're operated through automation and engineering best practice.
If you enjoy solving production challenges, improving reliability and automating away operational toil, we'd love to hear from you.
What you'll be doing
- Monitoring and maintaining highly available production platforms running in AWS
- Responding to and managing production incidents across a 24/7 service
- Investigating complex technical issues and restoring services quickly and effectively
- Developing automation to reduce manual operational tasks and improve platform resilience
- Building and improving monitoring, alerting and observability across cloud environments
- Working alongside Software, Platform, Cloud and Security Engineers to improve reliability and operational excellence
- Contributing to post-incident reviews and driving continuous service improvements
- Supporting containerised workloads using Kubernetes and Docker
What we're looking for
You'll ideally have experience in a Site Reliability Engineering, Production Engineering, Cloud Operations or NOC environment with exposure to:
- Linux systems administration
- AWS cloud infrastructure
- Kubernetes and Docker
- Production support and incident management
- Python, Bash or Go scripting
- Monitoring and observability platforms such as Grafana, Prometheus, Datadog, Splunk or CloudWatch
- Networking fundamentals including DNS, TCP/IP and load balancing
- A passion for automation, continuous improvement and operational excellence
Experience with Infrastructure as Code (Terraform), SRE principles (SLIs, SLOs), or regulated environments would be beneficial but isn't essential.
Why join?
This is far more than a traditional NOC role.
You'll be joining an engineering-led organisation where reliability, automation and continuous improvement sit at the heart of the platform. Rather than simply responding to incidents, you'll work to prevent them by improving systems, automating operational processes and helping shape the future of highly resilient cloud services.
If you're passionate about building reliable cloud platforms and enjoy solving complex technical problems in large-scale production environments, we'd love to hear from you.
Apply today or contact Dave Carlisle at Spectrum IT Recruitment for a confidential discussion.
Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy. Only candidates based in the UK and eligible to work in the UK will be considered.