Senior SRE, Site Reliability Engineer

Company: Klaviyo
Job Type: Full Time

Job Description - Senior SRE, Site Reliability Engineer

Senior Site Reliability Engineer – Site Reliability Engineering (Dublin)


Team Overview


As a Senior Site Reliability Engineer, you’ll ensure Klaviyo’s critical platforms are reliable, scalable, and sustainable while enabling rapid product development. We treat reliability as a core product feature and use software engineering to solve complex systems and operational challenges.


Our work spans security, infrastructure, and software development, requiring us to understand both systems and engineering. We build complex, foundational solutions that must be extremely reliable, secure, and performant at global scale.


Our charter is to build and operate foundational services and infrastructure, define clear reliability objectives, reduce operational toil through automation, and continuously improve systems based on real production learnings. The work is highly visible and directly impacts how Klaviyos build software and how customers experience Klaviyo every day.


How you’ll make an impact


As a Senior Site Reliability Engineer, you will build and operate the platforms, systems, and services that underpin Klaviyo’s reliability and operational excellence. You will:



  • Build and operate foundational, security-critical services with a strong emphasis on availability, scalability, latency, and fault tolerance

  • Apply software engineering principles to automate infrastructure, reduce operational toil, and improve system reliability at scale

  • Design, implement, and evolve systems using SRE best practices

  • Define and refine SLIs, SLOs, and error budgets to guide engineering decisions

  • Improve observability, alerting, and incident response to reduce mean time to detection and recovery

  • Participate in on-call rotations with a focus on sustainable operations and automated remediation

  • Perform quantitative analysis to understand system behavior, capacity constraints, and scaling limits

  • Identify systemic risks and reliability bottlenecks and drive long-term, preventative solutions

  • Collaborate closely with product, platform, and security engineers to influence architecture early and ship reliable systems

  • Mentor and pair with other engineers, helping raise the bar for reliability, operational maturity, and engineering excellence


Who you are


You are a cloud-native, platform-focused SRE who uses software to build and operate reliable production systems at scale.



  • You write and maintain production-quality code (e.g. Python, Go, or similar) to build internal platforms, automate operations, and improve system reliability

  • You have built, deployed, and operated distributed, cloud-native systems and understand failure modes such as partial outages, dependency failures, resource saturation, and cascading impact

  • You have experience operating containerized workloads and platforms (e.g. Kubernetes) in production, including deployment strategies, scaling behavior, and service networking

  • You are comfortable participating in on-call rotations and diagnosing production issues

  • You have designed and operated observability systems and know how to build actionable alerts that reflect real user and service impact

  • You apply SRE concepts such as SLIs, SLOs, error budgets, and burn-rate–based alerting to guide engineering decisions and operational response

  • You have hands-on experience with infrastructure as code and declarative configuration (e.g. Terraform, Kubernetes manifests, policy-as-code)

  • You have performed capacity planning, load testing, and performance analysis for distributed services and platforms

  • You routinely contribute to post-incident reviews and drive concrete, code-focused follow-up actions that prevent recurrence

  • You are comfortable reviewing and contributing to technical designs, platform APIs, operational runbooks, and system documentation

  • You’ve already experimented with AI in work or personal projects, and you’re excited to dive in and learn fast. You’re hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.



Nice to have



  • Experience supporting security-critical platforms or building internal security tooling

  • Familiarity with identity, access management, secrets management, or policy enforcement systems

  • Experience operating systems at scale in cloud environments (AWS preferred)

  • Background in resilience testing, fault injection, or chaos engineering

  • A strong understanding of algorithms and data structures at scale


Tech Stack


Klaviyo’s platform is primarily built with Python and React and runs on AWS. Engineers join us from a wide range of technical backgrounds and are supported in learning our stack.


Core technologies include:



  • Python / Django / FastAPI

  • MySQL / Redis / Memcached

  • RabbitMQ / Celery / Apache Kafka / Apache Pulsar

  • AWS / Terraform / Kubernetes



Location & Work Model


This role is based in Dublin, Ireland and follows a hybrid working model. Klaviyo supports work authorization and relocation for this position.


At Klaviyo, we enjoy tackling meaningful engineering challenges and value people who take ownership, learn continuously, and collaborate openly. We are committed to building inclusive teams and encourage applications from candidates of all backgrounds.


Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at https://klaviyo.tech


 

Original job Senior SRE, Site Reliability Engineer posted on GrabJobs.

About the Company

Klaviyo

Klaviyo is the CRM for consumer brands, combining email marketing and SMS with an embedded CDP to unify data for personalized, scalable customer engagement.
