Job Description - Site Reliability Engineer (Kubernetes, Terraform, Multi-Cloud/China Cloud)
Contract Type:
Permanent
Location:
Singapore, Singapore
Date Published:
28-Apr-2026
Salary:
$156,000.00 - $156,000.00 Annual
Company introduction
A fast-growing cloud and observability platform is looking to expand its Site Reliability Engineering team in Singapore. The organisation operates a highly distributed, cloud-native platform across multiple regions globally, supporting mission-critical environments with a strong focus on uptime, automation, and scalability.
You will be joining a lean regional team working closely with a larger engineering function based in China, supporting a modern infrastructure stack built on Kubernetes and multi-cloud architecture.
Job responsibilities
Reporting to the Chief Technology Officer, your role involves:
Owning the reliability, availability, and performance of a globally distributed cloud platform
Designing, building, and maintaining Kubernetes-based infrastructure across multiple cloud environments
Managing infrastructure through code using Terraform, ensuring scalability and consistency
Supporting and optimising multi-cloud environments (AWS and other cloud providers) across regions
Monitoring system health, troubleshooting incidents, and leading production issue resolution
Participating in the on-call rotation and supporting high-availability environments
Working closely with engineering teams across regions to improve system resilience and automation
Performing deep troubleshooting across infrastructure, application, and database layers (including SQL where required)
Job requirements
As a successful candidate, you will have:
Strong hands-on experience with Kubernetes in production environments
Proven experience using Terraform for infrastructure as code in complex environments
Exposure to multi-cloud environments (mostly AWS; experience with Alibaba Cloud, Tencent Cloud, or Huawei Cloud is highly advantageous)
Experience with Cloudflare is also highly advantageous
Solid understanding of cloud infrastructure, networking, and distributed systems
Experience in handling production incidents and working in high-availability environments
Proficiency in scripting (Python or Go preferred)
Ability to troubleshoot across systems, including database-level debugging using SQL
Strong communication skills in both English and Chinese, to work with regional teams
Willingness to do shift work or join an on-call rotation when required
Why you should join them
You will get the opportunity to join a highly technical environment where SRE plays a critical role in keeping a global, multi-cloud platform reliable and scalable. This is a high-ownership role with direct impact on uptime, performance, and platform resilience, while giving you exposure to modern cloud-native technologies, Kubernetes, Terraform, and distributed infrastructure across multiple regions. You will also work closely with teams across Singapore and China, making it a strong fit for engineers who enjoy solving complex infrastructure challenges in a fast-moving, international setup.