
Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS

Job Type: Full Time

Job Description - Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS

About Morgan Stanley

Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services. The Firm's employees serve clients worldwide including corporations, governments, and individuals from more than 1,200 offices in 43 countries.

As a market leader, the talent and passion of our people are critical to our success. Together, we share a common set of values rooted in integrity, excellence, and a strong team ethic. Morgan Stanley can provide a superior foundation for building a professional career - a place for people to learn, to achieve and grow. A philosophy that balances personal lifestyles, perspectives and needs is an important part of our culture.

Overview
Join Morgan Stanley's Application Services Infrastructure team to keep a set of business-critical infrastructure applications reliable for technologists across the firm. Our platforms help teams schedule, coordinate, and monitor their production workloads.

You'll combine deep Linux troubleshooting with automation and reliability engineering: improving monitoring, reducing toil, leading upgrades, and driving root-cause fixes that prevent repeat incidents.

What you'll do
- Own production reliability for multiple infrastructure applications: incident response, triage, and sustained follow-through to resolution.
- Drive stability work: improve alerting quality, monitoring coverage, and operational tooling to reduce noise and speed recovery.
- Lead or execute production changes (upgrades, hygiene fixes, reconfiguration) with strong change-management and rollback planning.
- Perform in-depth RCAs and prevent recurrence of incidents and escalations through long-term fixes, automation, and better runbooks.
- Build self-service workflows and high-quality documentation to improve user experience and reduce time-to-production.
- Partner with product engineers and infrastructure teams to identify systemic issues and deliver cross-team solutions.

On-call & schedule
- After onboarding, you'll join a rotating on-call roster with periodic weekend coverage (~1 weekend/month).
- L3 support focuses on high-impact incidents where documentation is incomplete; success requires calm, structured troubleshooting in distributed systems.
- Occasional off-hours work may be needed for planned changes and incident follow-up (we aim to minimize this through automation and process).

Required experience
- At least 7 years of production support / reliability experience for applications on Linux/UNIX.
- Strong command-line troubleshooting skills: logs, processes, networking, and dependency health in distributed systems.
- Ability to write production-ready automation in bash/shell plus one language (Python preferred; Go/Ruby/Perl/C/others welcome).
- Strong written communication for technical documentation and incident/RCA write-ups.
- Working understanding of distributed architecture (load balancers, app servers, databases, messaging).
- Experience with AI-assisted development and operational automation.

Preferred experience
- Cloud-native deployment/support and/or containers (Docker/Podman).
- Observability tooling (Grafana, Splunk, or similar), log forwarding/agents, and alert tuning.
- Linux administration and performance troubleshooting.
- Any database experience (SQL/NoSQL).
- Experience with workflow/scheduling platforms (AutoSys, Apache Airflow) or coordination systems (Apache ZooKeeper).

WHAT YOU CAN EXPECT FROM MORGAN STANLEY:

At Morgan Stanley, we raise, manage and allocate capital for our clients - helping them reach their goals. We do it in a way that's differentiated - and we've done that for 90 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs, they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.

Original job posting: Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS, posted on GrabJobs.