$6,000 - 10,000 monthly
Builds and maintains the infrastructure and tooling that keeps machine learning systems reliable in production, from designing CI/CD pipelines and model deployment workflows to monitoring performance and managing model lifecycle at scale. The role works closely with ML, backend, and platform teams, contributes to automation frameworks and observability standards, and helps ensure AI models move seamlessly from experimentation to production. Requires 5+ years of MLOps or DevOps engineering experience, with a track record of operating robust ML infrastructure in production-grade environments.
Make the path from model experiment to reliable production as fast, automated, and observable as possible.
· Bachelor’s degree or higher in Computer Science, Engineering, or a related field, or equivalent practical experience.
· Design and maintain CI/CD pipelines for ML models and services.
· Build and operate model deployment, serving, and rollback workflows.
· Implement monitoring, observability, and alerting for models in production.
· Manage the model lifecycle with MLflow: experiment tracking, versioning, registries, and reproducibility.
· Automate infrastructure and partner with ML and platform teams on standards.
· 5+ years of MLOps or DevOps engineering experience.
· Strong with containers and orchestration (Docker, Kubernetes).
· Experience with CI/CD (GitLab CI) and infrastructure-as-code (Terraform or similar).
· Hands-on with a major cloud platform (Azure, AWS, or GCP).
· Hands-on experience with MLflow for experiment tracking and model registry.
· Familiarity with ML frameworks (PyTorch), workflow orchestration (Kubeflow or similar), and monitoring stacks.
· Experience serving LLMs or large models in production.
· Knowledge of feature stores and data pipeline tooling.
· Cost and latency optimisation for model serving.
The platform depends on AI models moving reliably from experimentation into governed, production use across multiple products. Without strong MLOps ownership, models stall at prototype stage or degrade silently once deployed. This role keeps model performance, operational reliability, and continuous improvement aligned across the project, ensuring shared model infrastructure scales consistently rather than fragmenting team by team.
Copyright © 2026 Grabjobs Pte. Ltd. All Rights Reserved.