Site Reliability Engineering (SRE)

Company: Ant Group
Employment type: Full-time

Job Description - Site Reliability Engineering (SRE)


Key Responsibilities
• Ensuring Payment System Stability and High Availability: Lead technical initiatives to strengthen the reliability of our payment systems. This includes designing
and implementing monitoring tools, logging frameworks, dashboards, diagnostic utilities, and disaster recovery plans.
• Incident Handling and Emergency Response: Conduct routine drills, develop contingency strategies, and participate in on-call rotations to ensure rapid response
and resolution of production issues across regions.
• Analyze and Optimize Production Issues: Investigate and analyze real-world production cases, such as performance bottlenecks or system inefficiencies, to derive
actionable insights and establish technical best practices. Contribute to the evolution of a highly available and resilient payment architecture.
• Design and Implement Infrastructure Solutions: Architect and set up new Internet Data Centers (IDCs) to meet scalability and performance requirements. Develop
and execute comprehensive data protection plans that adhere to industry standards and compliance requirements, ensuring data integrity and security.
 
Technical Requirements
• Solid knowledge of computer science, including familiarity with the principles of operating systems (Unix/Linux), computer storage, computer networking, and
related areas.
• Proficient in at least one programming language, such as Java, Python, or Shell, with experience developing operations and maintenance tools.
• Strong ability to resolve system problems, good communication skills, and a sense of ownership.
• Experience operating Google Cloud Platform (GCP) or Oracle Cloud Infrastructure (OCI), OLAP platforms (such as DPDI, Flink, AntSpark), OceanBase (OB), or Ant Trust-Native Service (ATS) is a plus.


Original job Site Reliability Engineering (SRE) posted on GrabJobs ©. To flag any issues with this job please use the Report Job button on GrabJobs.