System Reliability Engineer (Data Centre)

Job Type: Full Time

Job Description - System Reliability Engineer (Data Centre)

You will be part of a dynamic team responsible for ensuring the reliability, availability, and performance of our data centre's IT operations. As a System Reliability Engineer (Data Centre), you will oversee the day-to-day IT operations within the data centre, working closely with various teams to ensure seamless IT service delivery. While knowledge of data centre power and cooling infrastructure is beneficial, the primary focus of this role is on IT operations. You will collaborate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure as needed. You must have a good understanding of cloud infrastructure technologies, architecture, and site reliability engineering (SRE) principles. 

Responsibilities


  • Oversee and manage IT operations within the data centre, including day-to-day monitoring, incident management, and problem management

  • Lead the end-to-end incident management lifecycle, which encompasses immediate troubleshooting, root cause identification, and resolution implementation to restore services, followed by comprehensive post-incident analysis

  • Develop and maintain documentation on IT infrastructure, operations, and procedures within the data centre

  • Perform capacity planning to ensure IT infrastructure is scalable for future demands

  • Collaborate and coordinate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure

  • Design and implement a robust observability platform alongside network monitoring tools for performance monitoring and real-time alerting of IT devices and networks

  • Implement and manage remote management tools for out-of-band access and control of IT devices and servers

  • Define, implement, and track SRE metrics, including service level objectives (SLOs), service level indicators (SLIs), and error budgets, to improve data centre IT reliability

Requirements (Minimum Qualifications)


  • Background in Computer Science, Computer or Electrical Engineering, Information Technology or a related field

  • Good technical knowledge in IT infrastructure, including servers, storage, networking, and cloud technologies

  • Proficient in IT management software and tools

  • 2 years of working experience in IT operations is preferred

  • Fresh graduates are welcome to apply


 
As CSIT is an agency under the Ministry of Defence (Singapore), only Singapore Citizens will be considered.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans.
Original job posting: System Reliability Engineer (Data Centre), posted on GrabJobs.
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.