Job Description - Site Reliability Engineer (4024)
Description
Enabling safe and rewarding digital lives for genuine people, everywhere
We make it our mission to ensure more genuine people have digital access to opportunities, and businesses have access to more genuine people. Our technology draws on diverse and reliable data to create a single point of truth for identity and address verification.
With over 30 years of experience behind us, our team and technology are focused on enabling safe and rewarding digital lives for everyone. Regardless of age, location or background, genuine people everywhere should be able to digitally prove who they are and where they live.
About the team and role
Global Fraud Solutions
The team provides decision support solutions to address business objectives in risk prevention and fraud detection. We deliver software solutions and offer client support using our expertise and a client-focused approach.
Site Reliability Engineer
As a Site Reliability Engineer, you will build and operate the reliability, observability, and operational excellence infrastructure underpinning the GFS managed fraud detection platforms. You will work across deployment pipelines, cloud infrastructure, monitoring, and incident management, ensuring GBG can meet high-availability SLAs for the banking and fintech customers who depend on real-time fraud detection at scale.
What you will do
Design and operate the SRE practice for Managed offerings, including on-call processes, SLA frameworks, incident response playbooks, and post-incident review (PIR) processes.
Build and maintain observability infrastructure: centralised logging (correlation IDs), metrics dashboards, distributed tracing, and alerting for the Predator/Instinct platform stack.
Define and track SLOs (Service Level Objectives) and error budgets for real-time transaction processing pipelines, targeting high TPS and low round-trip latency.
Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
Implement and maintain CI/CD pipelines for GFS solutions (e.g. Jenkins).
Work with Engineering teams to ensure security and compliance readiness for Managed services — including PCI DSS, ISO 27001, SOC 1/2/3, PDPA/GDPR — in close coordination with InfoSec teams.
Drive platform resilience improvements: high availability, auto-scaling, disaster recovery, backup/restore procedures, and chaos engineering practices.
Manage secrets, certificate rotation, identity/access controls (OAuth/RBAC), and vulnerability management for the hosted environment.
Support performance testing methodology and baseline establishment for our products.
Contribute to the Architecture Review Committee (ARC) with SRE and operational perspectives on technology choices.
Collaborate with engineering squads to embed reliability and DevSecOps practices across the SDLC.
Skills we’re looking for
Minimum 5 years of solid hands-on experience in a Site Reliability, Platform Engineering, or DevOps role, ideally supporting mission-critical real-time processing systems in banking, payments, or fintech.
Strong proficiency with cloud platforms (AWS preferred; Azure/GCP acceptable) including networking, compute, storage, and managed services.
Deep expertise with containerisation and orchestration: Docker, Kubernetes (EKS/AKS/GKE), Helm, and associated tooling.
Infrastructure as Code experience: Terraform (required), and familiarity with Ansible or Pulumi.
CI/CD pipeline design and management: GitHub Actions, Jenkins, ArgoCD, or equivalent.
Experience with security and compliance frameworks applicable to hosted financial services: PCI DSS, ISO 27001, SOC 1/2/3, GDPR/PDPA.
Familiarity with database reliability practices for SQL Server, PostgreSQL, and Oracle — including replication, read replicas, and failover.
Working knowledge of secrets management (HashiCorp Vault, AWS Secrets Manager) and zero-trust identity principles.
Experience supporting real-time streaming or event-driven architectures (Kafka, RisingWave, or similar) in production environments.
Scripting and automation proficiency: Python, Bash, or Go for operational tooling.
Strong sense of operational ownership and accountability — comfortable being on-call and driving incidents to resolution.
Excellent communication skills — able to produce clear incident reports, runbooks, and architecture documentation for both technical and executive audiences.
Proactive mindset: identifies reliability risks before they become incidents and champions a culture of blameless post-mortems.
Collaborative and effective when working with software engineers, product managers, and InfoSec teams.
Continuous improvement orientation — always looking to reduce toil, automate repetitive tasks, and improve platform resilience.
Flexible and adaptable — able to support a globally distributed product with customers across multiple time zones.
To find out more
As an equal opportunity employer, we are dedicated to creating a diverse and inclusive workplace where everyone feels valued and empowered. Please inform your GBG Talent Attraction Partner if you require any reasonable adjustments to the interview process.
To chat to the Talent Attraction team and find out more about our benefits and why we’re a great place to work, drop an email to [email protected] and we’ll be in touch. You can also find out more about careers at GBG and check out our current opportunities at gbgplc.com/careers.