Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS

Company : Morgan Stanley

Job Description - Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS

About Morgan Stanley

Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services. The Firm's employees serve clients worldwide including corporations, governments, and individuals from more than 1,200 offices in 43 countries.

As a market leader, the talent and passion of our people are critical to our success. Together, we share a common set of values rooted in integrity, excellence, and a strong team ethic. Morgan Stanley can provide a superior foundation for building a professional career - a place for people to learn, achieve, and grow. A philosophy that balances personal lifestyles, perspectives and needs is an important part of our culture.

Overview
Join Morgan Stanley's Application Services Infrastructure team to keep a set of business-critical infrastructure applications reliable for technologists across the firm. Our platforms help teams schedule, coordinate, and monitor their production workloads.

You'll combine deep Linux troubleshooting with automation and reliability engineering: improving monitoring, reducing toil, leading upgrades, and driving root-cause fixes that prevent repeat incidents.

What you'll do
- Own production reliability for multiple infrastructure applications: incident response, triage, and sustained follow-through to resolution.
- Drive stability work: improve alerting quality, monitoring coverage, and operational tooling to reduce noise and speed recovery.
- Lead or execute production changes (upgrades, hygiene fixes, reconfiguration) with strong change-management and rollback planning.
- Perform in-depth RCAs and prevent recurrence of incidents and escalations through long-term fixes, automation, and better runbooks.
- Build self-service workflows and high-quality documentation to improve user experience and reduce time-to-production.
- Partner with product engineers and infrastructure teams to identify systemic issues and deliver cross-team solutions.

On-call & schedule
- After onboarding, you'll join a rotating on-call roster with periodic weekend coverage (~1 weekend/month).
- L3 support focuses on high-impact incidents where documentation is incomplete; success requires calm, structured troubleshooting in distributed systems.
- Occasional off-hours work may be needed for planned changes and incident follow-up (we aim to minimize this through automation and process).

Required experience
- At least 7 years of production support / reliability experience for applications on Linux/UNIX.
- Strong command-line troubleshooting skills: logs, processes, networking, and dependency health in distributed systems.
- Ability to write production-ready automation in bash/shell plus one language (Python preferred; Go/Ruby/Perl/C/others welcome).
- Strong written communication for technical documentation and incident/RCA write-ups.
- Working understanding of distributed architecture (load balancers, app servers, databases, messaging).
- Experience with AI-assisted development and operational automation.

Preferred experience
- Cloud-native deployment/support and/or containers (Docker/Podman).
- Observability tooling (Grafana, Splunk, or similar), log forwarding/agents, and alert tuning.
- Linux administration and performance troubleshooting.
- Any database experience (SQL/NoSQL).
- Experience with workflow/scheduling platforms (AutoSys, Apache Airflow) or coordination systems (Apache ZooKeeper).

What you can expect from Morgan Stanley

At Morgan Stanley, we raise, manage and allocate capital for our clients - helping them reach their goals. We do it in a way that's differentiated - and we've done that for 90 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs; they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Original job Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS posted on GrabJobs.
