4 months. Preference is for candidates willing to convert to full-time employment; the conversion decision will be based on performance.
Role Description: Defining and measuring reliability goals (SLIs, SLOs, and error budgets) for user journeys.
Designing for and implementing observability (ELK, Thanos, Grafana, Alertmanager / PagerDuty, Dynatrace).
Defining, testing, and running an incident management process. Capacity planning. Change and release management, including CI/CD. Toil management.
Primary Skill: Site Reliability Engineering
Secondary Skills: Elastic Stack (ELK), Service Level Monitoring, Dynatrace Administration, Error Budgets, Kubernetes, Telemetry, Terraform
Copyright © 2024 Grabjobs Pte.Ltd. All Rights Reserved.