Lead Service Reliability Engineer

Company: Amadeus
Job Type: Full Time

Job Description - Lead Service Reliability Engineer

Job Title

Lead Service Reliability Engineer

Purpose of the role

The Lead Service Reliability Engineer for Stratos will be responsible for ensuring the reliability, performance, and scalability of our mission-critical platforms. In this role, you will safeguard operational excellence in the products under Stratos, influence reliability strategies, play an integral part in production incident response, and help improve operational metrics.

The role requires deep or broad expertise in our environment architecture to drive efficiency improvements. It involves recommending solutions and best practices, shaping departmental strategy, and converting strategic objectives into actionable plans for the area. Additionally, the role includes setting clear targets for the team and monitoring progress to ensure alignment with goals. Collaboration is key, as you will work closely with teams such as Development and Amadeus Production Support to make configuration changes or design and develop code that meets target SLOs. You will identify opportunities to optimize costs while maintaining stability, which may include leading toil-reduction initiatives, managing capacity planning and tuning, updating SOPs, and developing code for performance improvements. This is a hybrid role requiring on-site presence 2–3 days per week.

In this role you'll:

- Define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets in partnership with engineering and product leads
- Collaborate with Operations and Development teams to drive service reliability, availability, and scalability
- Influence architecture and deployment standards to align with SRE principles
- Drive and participate in toil-reduction projects to minimize, if not eliminate, recurring manual activities performed by the team
- Champion observability, automation, and infrastructure-as-code practices to reduce manual intervention and improve system health
- Establish a feedback loop with development teams so they have visibility into how stable and reliable their services are in client environments
- Drive production incident response and lead root cause analysis and continuous improvement
- Design and develop operational improvements with development teams, working closely with them to prioritize these improvements
- Provide input on process improvements to Change, Release, and Incident Management
- Create and implement support playbooks that support staff can use as part of emergency response to production issues

About the ideal candidate

- Knowledgeable and experienced in using Azure resources such as Storage, Network, Functions, Logic Apps, App Services, and AKS
- Strong technical expertise in Azure DevOps, including developing in Git and working with GitOps repositories and build/release pipelines
- Hands-on experience developing Azure PowerShell scripts, Azure Runbooks, or other infrastructure automation tools
- Knowledgeable in cloud platforms and AI technologies
- Experienced with monitoring and logging tools (Grafana, Dynatrace, Splunk)
- Proven ability to adapt to emerging cloud technologies and industry-leading DevOps tools such as Terraform, Docker containers, and Kubernetes
- Knowledgeable in cloud implementation of Navitaire products across different cloud infrastructure models
- Understands production environments and processes and how they can be further optimized through Azure features and other cloud technologies and services
- Proven ability to drive problem-solving efforts through effective issue analysis
- Able to lead efforts to implement infrastructure changes that increase environment stability and support scalability
- Able to drive collaboration with different Navitaire teams in enforcing environment standards and policies
- Works effectively in a team environment and contributes to building the capabilities of team members
- Proficient in C#
- Proven ability to work in a dynamic, fast-paced, and multicultural environment
- Willing to work shifting schedules in a hybrid setup

Diversity & Inclusion

Amadeus aspires to be a leader in Diversity and Inclusion in the tech industry, enabling every employee to reach their full potential by fostering a culture of belonging and fair treatment, attracting the best talent from all backgrounds, and serving as a role model for an inclusive employee experience.

Amadeus is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, race, ethnicity, sexual orientation, age, beliefs, disability or any other characteristics protected by law.  

Original job posting: Lead Service Reliability Engineer, posted on GrabJobs.