Site Reliability Engineer (SRE)

Salary: $142,000 - $214,700 yearly

Company: Monstro Pc

Job Type: Full Time

New York, United States


Job Description - Site Reliability Engineer (SRE)

About Monstro

Monstro is the operating system for governed financial intelligence. We build governance and intelligence infrastructure that enables artificial intelligence to operate safely, explainably, and at institutional scale.

We exist because the level of financial guidance historically available to a small group should be accessible to many more people. By combining AI with deep institutional infrastructure, we help financial institutions deliver more personalized, responsible, and life-changing financial support to millions of individuals.

We’re building mission-critical systems in a highly regulated domain, and we care deeply about doing it right. If you’re motivated by meaningful problems, high standards, and shaping infrastructure that improves financial outcomes, you’ll feel at home here.

About the Role

Monstro is building a secure, multi-tenant platform on Google Cloud, and we’re hiring a Site Reliability Engineer to own the reliability and observability of that platform end-to-end.

This is a hands-on role for someone who wants to do real SRE work, not a rebrand of L1 support. You’ll write the dashboards, define the SLOs, build the automation that kills toil, and take your turn on the on-call rotation that proves it all works. When something breaks at 2 AM, you’re the person who keeps it running; when nothing’s breaking, you’re the person making sure the next break is smaller, shorter, or doesn’t happen at all.

What You’ll Do

Observability and reliability engineering

Define and maintain SLOs and SLIs for our tier-1 services: API gateway, application services, identity, and edge availability

Build canonical dashboards and alerts in Google Cloud Monitoring, backed by structured logs and BigQuery log analytics

Tune alert routing so every page is actionable, and kill the rest

Instrument services for distributed tracing and structured logging; push back on services that ship without it

Own error budgets and use them to prioritize reliability work over feature work when a budget is burned through

Reduce toil: automate the top recurring page from the previous quarter

Maintain runbooks so every page maps to one within a cycle of its first occurrence

On-call rotation and incident response

First responder for production alerts across monitoring, API gateway, edge defense, and CI

Triage severity, run the incident bridge, drive mitigation (revision rollback, traffic shift, scaling, edge block, credential rotation)

Own internal and external incident comms during your shift

Drive postmortems to closure with action items tracked as audit evidence

Write clean handoffs at the end of each shift

Our stack

Google Cloud Platform across multiple environments

Apigee X for API management

Cloud Run, GKE Autopilot, Cloud SQL

Identity Platform for customer identity

Cloud Armor, Cloud IDS, Security Command Center for edge and posture

BigQuery-backed log analytics from an org-level log sink

OpenTofu / Terraform for everything; GitHub Actions for CI/CD

Linear for work tracking

What You Bring

Required:

Solid production experience on GCP (or comparable AWS/Azure depth with willingness to ramp on GCP fast)

Comfortable on-call: you’ve run incidents, written postmortems, and shipped the action items

Strong observability fundamentals: SLOs, log-based metrics, alert hygiene, dashboard discipline

Working knowledge of Kubernetes, API gateways, identity systems, and at least one IaC tool

Scripting / coding fluency (Python, Go, Bash) for automation and tooling

Good written communication: handoffs, postmortems, and runbooks are part of the job

Bias toward fixing the system, not the symptoms

Nice to Have:

Apigee or another enterprise API gateway in production

BigQuery for log analytics or audit

Experience standing up observability from scratch, not just maintaining inherited dashboards

SOC 2 or similar compliance environments

Why Monstro?

Ownership & Impact: Shape the future of AI-powered finance by building a category-defining product used by consumers and institutions around the world.

Experienced Team: Join a team with leadership that has a track record of scaling companies from early stage to major exits.

Principles-Driven Culture: Work in a culture that values speed, ownership, and impact: what most companies achieve in 90 days, we do in 45.

Comprehensive Compensation Package: Competitive salary, equity, and robust benefits, including paid health, vision, dental, and disability coverage.
- Compensation Range (New York City):
- Compensation Range (Denver Metro):

*The posted range reflects the base salary for this role across the market ranges for each location. Final compensation will depend on a variety of factors, including experience, skills, internal leveling, and market conditions, and will be offered within the stated range in accordance with applicable pay transparency laws.

Base Compensation Range (New York City): $142,000 - $214,700

A Note on Interviewing: We sometimes use AI note-takers to help us transcribe interview notes so we can be more present in your interview. If you’d like to opt out of automatic transcription, please note this in the free-text field of your application; otherwise, we’ll take your application as confirmation that you’re happy for us to use note-takers (whether added to video calls or running in the background).

**Please note: This role has a start date at the end of July.**

Ready to Build With Us?

If you’re excited to contribute to a high-bar team building something meaningful, we’d love to hear from you!

Original job Site Reliability Engineer (SRE) posted on GrabJobs ©.
