Lead Service Reliability Engineer

Company: Amadeus

Job Type: Full Time

Taguig, Metro Manila

Job Description - Lead Service Reliability Engineer

Job Title

Lead Service Reliability Engineer

Purpose of the role

The Lead Service Reliability Engineer for Stratos will be responsible for ensuring the reliability, performance and scalability of our mission-critical platforms. In this role, you will safeguard operational excellence across the products under Stratos, influence reliability strategies, play an integral part in production incident response, and help improve operational metrics.

The role requires deep or broad expertise in our environment architecture to drive efficiency improvements. It involves recommending solutions and best practices, shaping departmental strategy, and converting strategic objectives into actionable plans for the area. Additionally, the role includes setting clear targets for the team and monitoring progress to ensure alignment with goals. Collaboration is key, as you will work closely with teams such as Development and Amadeus Production Support to make configuration changes or design and develop code that meets target SLOs. You will identify opportunities to optimize costs while maintaining stability, which may include leading toil-reduction initiatives, managing capacity planning and tuning, updating SOPs, and developing code for performance improvements. This is a hybrid role requiring on-site presence 2–3 days per week.

In this role you'll:

- Define and track Service Level Indicators (SLIs), Objectives (SLOs), and Error Budgets in partnership with engineering and product leads

- Collaborate with Operations and Development teams to drive service reliability, availability, and scalability
- Influence architecture and deployment standards to align with SRE principles

- Drive and participate in toil-reduction projects to minimize, if not eliminate, recurring manual activities performed by the team
- Champion observability, automation, and infrastructure-as-code practices to reduce manual intervention and improve system health

- Establish a feedback loop with development teams so they have visibility into how stable and reliable their services are in client environments

- Drive production incident response and lead root cause analysis and continuous improvement

- Design and develop operational improvement items with development teams, working closely with them to prioritize these improvements

- Provide input on process improvements to Change, Release, and Incident Management
- Create and implement support playbooks that responders can use during emergency response to production issues

About the ideal candidate

- Knowledgeable and experienced in using Azure resources such as Storage, Network, Functions, Logic Apps, App Services, and AKS

- Strong technical expertise in Azure DevOps, including developing in Git and working with GitOps repositories and build/release pipelines
- Hands-on experience developing Azure PowerShell scripts, Azure Runbooks, or other infrastructure automation tools
- Knowledgeable in cloud platforms and AI technologies
- Experienced with monitoring and logging tools (Grafana, Dynatrace, Splunk)
- Proven ability to adapt to emerging cloud technologies and industry-leading DevOps tools such as Terraform, Docker containers, and Kubernetes

- Knowledgeable in cloud implementation of Navitaire products across different cloud infrastructure models
- Understands production environments and processes and how they can be further optimized through Azure features and other cloud technologies and services
- Proven ability to drive problem-solving efforts through effective issue analysis
- Able to lead efforts to implement infrastructure changes that increase environment stability and support scalability
- Able to drive collaboration with Navitaire teams to enforce environment standards and policies
- Works effectively in a team environment and contributes to building team members' capabilities
- Proficient in C#
- Proven ability to work in a dynamic, fast-paced and multicultural environment
- Willing to work shifting schedules in a hybrid setup

Diversity & Inclusion

Amadeus aspires to be a leader in Diversity and Inclusion in the tech industry, enabling every employee to reach their full potential by fostering a culture of belonging and fair treatment, attracting the best talent from all backgrounds, and serving as a role model for an inclusive employee experience.

Amadeus is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, race, ethnicity, sexual orientation, age, beliefs, disability or any other characteristics protected by law. 

Original job Lead Service Reliability Engineer posted on GrabJobs ©.
