Machine Learning (Ops) Engineer

Salary: $10,000 - 14,000 monthly

Job Type: Full Time


Job Description - Machine Learning (Ops) Engineer

Our client's ML Platform team enables 100+ ML scientists and engineers to train, deploy, and monitor models that serve 10M+ QPS across recommendation, search, ads, and GenAI products. The platform powers e-commerce and content experiences similar to TikTok Shop, with a focus on reliability, speed, and developer velocity.

They treat ML infrastructure as a product and operate at the scale of major social-commerce platforms.

The Role

We are hiring an MLOps Engineer to build and scale the core ML platform used by all ML teams. You will own systems for training, serving, experimentation, and monitoring. Your work directly impacts how fast those teams can ship new models to production and how reliably the models serve millions of users.

What You’ll Do

  • Model Serving: Build and operate low-latency, high-throughput online inference services for deep learning and LLM models. Optimize with vLLM, Triton, TensorRT, GPU scheduling, and autoscaling
  • Training Infrastructure: Scale distributed training on GPU clusters using Kubernetes, Ray, DeepSpeed, or Megatron. Improve job scheduling, checkpointing, and resource utilization
  • ML Platform Products: Develop internal tools for the full ML lifecycle: feature store, model registry, experiment tracking, workflow orchestration, and CI/CD for ML
  • GenAI Infra: Build infrastructure for LLM fine-tuning, RAG evaluation, vector database management, and cost/latency monitoring for GenAI workloads
  • Data & Feature Platform: Maintain real-time and batch feature pipelines. Ensure data quality, lineage, and SLAs for Spark, Flink, and Kafka jobs
  • Observability: Implement monitoring, alerting, and debugging tools for model performance, data drift, training failures, and online serving
  • Developer Experience: Reduce friction for ML teams. Provide SDKs, CLI tools, and documentation. Run internal office hours and gather requirements
  • Reliability: Own SLOs for critical ML services. Lead incident response and postmortems. Drive capacity planning and cost optimization

Minimum Qualifications

  • Education: BS/MS in Computer Science, Engineering, or related field
  • Experience: Software engineering, DevOps, or ML engineering, with 3+ years building ML infrastructure or platform services
  • Programming: Strong proficiency in Python, Go, or Java. Solid understanding of software design, testing, and distributed systems
  • Cloud & Containers: Production experience with Kubernetes, Docker, and AWS/GCP/Azure. Familiar with Terraform or infrastructure-as-code
  • ML Systems: Understanding of ML workflows. Experience with at least one of: model serving, distributed training, feature stores, or workflow orchestrators like Airflow/Kubeflow
  • Data Systems: Experience with Spark, Kafka, or similar large-scale data tools
  • Problem Solving: Ability to debug complex systems across ML, data, and infra layers

Preferred Qualifications

  • Built ML platforms supporting 50+ ML engineers or 100+ models in production
  • Deep expertise in GPU inference optimization: batching, quantization, CUDA, vLLM, Triton Inference Server
  • Experience with LLM infra: fine-tuning pipelines, vector DBs like Milvus/Weaviate, prompt/version management
  • Knowledge of ML framework internals: PyTorch, TensorFlow, JAX
  • Experience with Ray, Kubeflow, MLflow, Feast, or Tecton
  • Background in high-QPS online services, SRE, or performance engineering
  • Contributions to open-source ML infra projects
Original job Machine Learning (Ops) Engineer posted on GrabJobs ©. To flag any issues with this job please use the Report Job button on GrabJobs.

About the Company

NEWBRIDGE ALLIANCE PTE. LTD.

Newbridge is a global management consulting and executive search firm. We are the largest privately-held executive search firm and the third-largest executive search and talent strategy firm in Asia. The firm offers services in Executive Search, Board Consulting, and Leadership Strate...

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.