Senior Manager of Engineering, Production Infrastructure

Company: Klaviyo

Job Type: Full Time

Boston, MA

Job Description - Senior Manager of Engineering, Production Infrastructure

Klaviyo powers growth for thousands of businesses, and our R&D teams build on shared platform primitives. As the Senior Manager, Production Infrastructure, you’ll lead the teams behind our paved roads—compute runtimes, service networking/ingress, and observability—so product engineers can move fast on a stable, cost‑disciplined foundation. You’ll publish opinionated defaults ("golden paths"), install SLO discipline, and make reliability and developer experience measurable across the company.

This is a hands‑on leadership role: you’ll stay close to architecture and operations, review designs and PRs, jump into incidents when needed, and prototype reference solutions that set the standard.

How You’ll Make a Difference

Own and evolve platform primitives in scope (compute runtimes, service networking/ingress, observability) with clear APIs, SLOs, runbooks, and support tiers.

Lead by example technically: drive design reviews, review PRs, and author reference implementations, starter repos, and Terraform/Helm modules that demonstrate the golden path.

Deliver golden paths and self‑service scaffolding; reduce time‑to‑first‑service and lead time for changes.

Raise the bar on reliability: incident response (blameless), alert hygiene, capacity planning, and on‑call health.

Be production‑close: participate in critical incident response and postmortems; trace issues across Kubernetes, service mesh, and data paths; convert learnings into durable fixes, guardrails, and policy‑as‑code.

Standardize observability end‑to‑end: expand OpenTelemetry adoption, define log/trace schemas, and make SLOs and error budgets first‑class in dashboards and alerts.

Evolve our Kubernetes and networking layers: plan cluster upgrades, right‑size node/Pod configs, harden ingress/gateway policies, and advance mTLS/service identity and traffic shaping.

Advance CI/CD and GitOps: ensure fast, safe deploys with progressive delivery, automatic rollbacks, and pre‑prod environments that mirror prod; enforce guardrails via policy‑as‑code.

Stand up a concise scorecard (SLO coverage, incident frequency/severity, lead time, MTTR, developer platform NPS, cost‑to‑serve) and drive consistent trend improvements.

Partner with Security, Data Platform, and Product to clarify ownership boundaries and enable safe, fast delivery.

Improve cost‑to‑serve via quotas, right‑sizing, and showback in partnership with Finance.

Transform workflows by putting AI at the center, building smarter systems and ways of working from the ground up; pilot AI‑assisted runbooks and incident summarization to shorten resolution time.

Who You Are

7–10+ years in infra/SRE/platform with 3–5+ years leading teams (including managers or staff/lead ICs).

Demonstrated SRE practices (SLI/SLO design, incident mgmt, capacity planning) and experience with Kubernetes/container orchestration, service networking, IaC, and modern observability.

Technically credible and hands‑on: comfortable reading and discussing code (e.g., Go, Python, or Java), reviewing PRs, and writing small prototypes/tooling when it accelerates the team.

Fluent with Kubernetes internals (scheduling, autoscaling, resource management) and service networking (e.g., Envoy/Istio/Linkerd, API gateways).

You operate the full observability stack (metrics, logs, traces, profiling) and instrument SLIs/SLOs using OpenTelemetry‑friendly patterns.

You automate by default: Terraform (or Pulumi), Helm/Kustomize, GitOps, CI/CD; you prefer guardrails and policy‑as‑code over manual gates.

You write crisp docs/diagrams and define platform contracts that hold up under scale.

You drive measurable developer velocity and reliability improvements and communicate progress with clarity.

You build inclusive, high‑trust teams and partner tightly across Security/Product/Finance.

You’ve already experimented with AI in work or personal projects and are eager to deepen your fluency responsibly.

Nice to Haves

Platforms "as a product" (DX metrics, roadmaps), event‑driven architectures, and cost‑to‑serve optimization in high‑growth SaaS.

Experience contributing to platform code or tooling (e.g., base images, CLI/scaffolding, controllers/operators, admission/policy), multi‑cluster or multi‑region operations, and progressive delivery.

We use Covey as part of our hiring and/or promotional process. For jobs or candidates in NYC, certain features may qualify it as an AEDT. As part of the evaluation process we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 3, 2025.

Please see the independent bias audit report covering our use of Covey here
