Responsibilities
- Design and enhance Kubernetes provider platforms and supporting infrastructure to improve scalability, reliability, and developer experience.
- Automate and simplify Kubernetes cluster lifecycle management, upgrades, and observability workflows.
- Implement monitoring and alerting systems using tools such as Prometheus, Grafana, or Elastic Observability to meet service-level objectives (SLOs).
- Collaborate with security teams to integrate and enforce security controls and compliance requirements within the container platform.
- Work with application teams to improve platform usability, streamline onboarding, and reduce operational toil.
- Respond to incidents and perform post-incident reviews, driving continuous improvement and operational excellence.
- Contribute to the reliability engineering culture, fostering shared responsibility for system availability and performance.
Requirements (Minimum Qualifications)
- Background in Computer Engineering, Computer Science, or a related field.
- Strong programming, scripting, or tooling experience (e.g. Git, Terraform, JavaScript, Python, or Bash).
- Good understanding of Linux systems, containers, and networking fundamentals.
- 1-3 years of hands-on experience operating or managing Kubernetes clusters in production environments.
- Familiarity with CI/CD pipelines, infrastructure-as-code, and configuration management (e.g. Terraform, Ansible, Helm).
- Experience implementing observability and monitoring in large-scale systems.
Good-to-haves
- Knowledge of Kubernetes security concepts such as RBAC, admission controllers, and policy enforcement.
- Experience with GitOps workflows and deployment tools (e.g. Argo CD, GitLab Runner).
- Understanding of service mesh technologies (e.g. Istio).
- Exposure to reliability engineering practices, including SLOs, error budgets, and capacity planning.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and hybrid infrastructure architectures.
- Knowledge of networking protocols (HTTP, TCP, DNS) and troubleshooting tools.
- Passion for open-source technologies.
Why join us?
At CSIT, you will:
- Build and operate infrastructure that supports Singapore's national security missions.
- Work with talented engineers who take pride in operational excellence, collaboration, and innovation.
- Be empowered to experiment, improve, and scale modern technologies securely.
- Have opportunities to deepen your expertise in Kubernetes, SRE practices, and secure platform engineering at scale.