Lead Engineer - Reliability Engineering

Company: StoneX

Job Type: Full-time

Bengaluru, India

Job Description - Lead Engineer - Reliability Engineering

Overview

Connecting clients to markets – and talent to opportunity.

With 5,400+ employees and over 80,000 institutional, commercial, and payments clients, we operate from more than 80 offices spread across six continents. As a Fortune 100, Nasdaq-listed provider, we connect clients to the global markets – focusing on innovation, human connection, and providing world-class products and services to all types of investors.

Whether you want to build a career connecting our retail clients to potential trading opportunities or immerse yourself in the world of institutional investing, StoneX Group's four business segments offer endless potential for progression and growth.

Engage in a wide variety of business-critical activities that keep our company running efficiently. From strategic marketing and financial management to human resources and operational oversight, you'll have the opportunity to optimize processes and implement game-changing policies.

As a Lead Engineer in Reliability Engineering, you will help define and drive the next stage of reliability maturity across our platforms and services. This is a senior hands-on engineering role for someone who has already spent several years building and operating Site Reliability Engineering practices in a large organization, and who understands what good looks like in production at scale.

You will partner closely with our Platform Engineering Observability team to improve reliability standards, operational practices, service ownership models, and engineering guardrails. Over time, you will help grow reliability capabilities across the wider engineering organization by mentoring engineers, shaping ways of working, and building practical and measurable reliability practices. The team is actively expanding end-to-end observability coverage across key business applications. This role will help ensure that telemetry, service health, reliability standards, and operational practices are implemented consistently and effectively as adoption grows.

This is an individual contributor role with no direct people management responsibilities. Success in this role will come through technical leadership, hands-on engineering contribution, mentorship, and influence across teams.

Responsibilities

Define and drive reliability engineering standards, practices, and the enterprise reliability maturity model across platforms and services, including service tiering and adoption metrics
Partner with engineering, platform, infrastructure, and product teams to improve service reliability, resilience, operability, and supportability
Establish and mature reliability practices such as SLOs, SLIs, error budgets, alert quality, toil reduction, production readiness reviews, and service ownership expectations
Build and improve operational processes for change safety, release confidence, capacity planning, resilience testing, and disaster recovery across multi-cloud and hybrid environments
Use observability platforms such as Datadog or similar tools to improve visibility, actionable alerting, dashboards, and service health reporting
Drive end-to-end observability adoption for critical applications, ensuring consistent implementation of metrics, logs, traces, service maps, dashboards, and actionable alerting
Partner with application, platform, and infrastructure teams to improve instrumentation quality, service ownership, and operational readiness as key applications are onboarded into the observability ecosystem
Define and standardize observability architecture and telemetry standards, including service dependency mapping, service health indicators, alert quality, and operational response workflows
Drive automation of operational tasks and embed reliability guardrails into platform engineering workflows, including CI/CD pipelines and internal developer platforms
Apply observability standards to golden path templates, workflows, scorecards, and dashboards within the internal developer platform so operational best practices are embedded holistically throughout the SDLC
Identify reliability risks including architectural weaknesses, service fragility, and third-party or provider dependencies, and partner with teams to address them
Define and track meaningful reliability metrics and operational KPIs that help engineering teams improve service outcomes over time
Act as a senior hands-on engineer who guides technical direction while contributing directly to design, implementation, and operational improvement work
Mentor and coach engineers across the team, helping them develop stronger reliability and operational engineering skills

Qualifications

A track record of building, improving, or scaling reliability engineering or SRE practices in a large organization
7+ years of experience in SRE, production engineering, platform engineering, infrastructure engineering, or a closely related role
Several years of hands-on experience supporting production systems at scale, including incident response, problem management, availability improvement, and operational excellence
Strong practical experience defining and implementing SLOs, SLIs, error budgets, service health models, and reliability-focused engineering practices
Strong experience with observability platforms such as Datadog or similar platforms, including metrics, logs, tracing, alerting, dashboards, and service level reporting
Experience driving or supporting end-to-end observability adoption across application teams, including instrumentation, telemetry standards, dashboards, alerting, and service level reporting
Experience driving improvements in incident management, post-incident reviews, on-call effectiveness, and operational maturity
Experience automating operational processes using tools such as Terraform, scripting languages, CI/CD pipelines, and cloud-native platforms
Experience working with Kubernetes, Linux, Git, and modern cloud or platform infrastructure
Strong systems thinking and the ability to balance reliability, latency, engineering velocity, risk, and cost
Strong communication, collaboration, and influencing skills, with the ability to work across multiple teams and levels of seniority
Demonstrated ability to mentor engineers and help raise the reliability maturity of a broader team
A practical mindset: someone who can define strong engineering practices and also contribute directly in a hands-on way

What makes you stand out:

You have helped build or formalize reliability engineering or SRE practices in a complex organization
You know what good looks like for service ownership, production readiness, alerting quality, incident response, and operational accountability
You have helped onboard critical applications into an end-to-end observability model, improving visibility across metrics, logs, traces, service dependencies, and operational response
You have successfully reduced toil, improved service reliability, and created measurable operational improvements across teams
You are able to influence engineering culture, not just tooling or process
You have coached less experienced engineers and helped teams grow into stronger operational ownership
You are comfortable introducing structure and standards without creating unnecessary bureaucracy
You can work across observability, platform engineering, and application teams to create practical, adoptable reliability practices

Education / Certification Requirements:

Bachelor’s degree in computer science, engineering, or a related field, or equivalent practical experience
Relevant certifications are a plus, but practical experience building and operating reliable systems at scale is more important
Commitment to continual professional and technical development

Working environment:

Hybrid, four days in the office.
Occasional travel required for team collaboration meetings and conferences.

Original job Lead Engineer - Reliability Engineering posted on GrabJobs ©.