Job Description - Principal Site Reliability Engineer
Own Reliability at Scale
- Lead the design, implementation, and evolution of reliability, availability, and resiliency strategies for large-scale distributed systems written primarily in Java.
- Apply deep experience operating complex, distributed systems to guide architectural decisions, reliability strategies, and long-term system evolution.
- Identify systemic risks in application architecture, data flows, and infrastructure, and drive architectural improvements that measurably improve availability, performance, and scalability.
- Set and evolve reliability standards, best practices, and operational principles across R&D.
- Apply advanced software engineering practices to eliminate manual work, reduce operational load, and improve system observability.
- Design and build internal platforms, automation, and tooling that support Java-based services and their operational needs.
- Contribute to longer-term reliability and infrastructure strategy aligned with business growth.

Due to ITAR requirements, this role is open only to US citizens or green card holders.
Candidates must be able to commute to the Seaport office in Boston 2-3 days a week.
- 7+ years of experience in software engineering, site reliability engineering, or systems engineering roles.
- Extremely strong proficiency with the Java programming language and its ecosystem, including building, debugging, and operating production Java services.
- Deep experience operating complex, distributed systems in production environments.
- Strong software engineering background, with a track record of delivering high-quality, maintainable code.
- Ability to reason about failure modes across application, data, and infrastructure layers.
- Demonstrated ability to lead complex initiatives that span teams and organizational boundaries.
- Comfortable making high-impact technical decisions in ambiguous environments.
- Strong communicator who can influence design and operational decisions across a wide range of stakeholders.
- Experience operating or supporting systems using technologies such as MongoDB, ZooKeeper, and RabbitMQ.
- Background in performance tuning and scalability optimization of Java services.
- Experience setting or influencing engineering standards at the organization level.
- Prior involvement in evolving SRE or platform practices in a growing engineering organization.
- Experience designing, operating, or scaling systems in cloud environments such as AWS (preferred), including familiarity with core services, networking models, and reliability features.