AI Platform Engineer

Salary: $8,300 - 10,300 monthly

Job Type: Full Time

Job Description - AI Platform Engineer

About the Role

We are seeking a passionate AI Platform Engineer to build and own, end to end, the infrastructure layer that every AI use case in Kuok Group runs on: the LLM gateway, the deployment platform, CI/CD pipelines, model serving, observability, cost controls, and the eval pipeline infrastructure. This role reports to the Principal AI Architect.

This is a T-shaped role: broad cloud and DevOps foundations, with deep specialism in LLM infrastructure. The ideal candidate is as comfortable provisioning environments and managing release pipelines as configuring a model gateway, wiring up LangSmith traces, and building an eval harness.

Working closely with the Head, AI Platform on architecture direction and with the LLM Ops / MLOps Engineer on the observability and eval layer, this person will be the backbone of the platform that Applied AI Engineers depend on to ship confidently and at pace.

Key Responsibilities

Deployment Platform & CI/CD

  • Design, build, and maintain CI/CD pipelines for all AI use cases — from code commit through staging to production, with automated release gates and rollback capability
  • Own environment provisioning and infra-as-code (Terraform or equivalent) — staging, UAT, and production environments should be reproducible, version-controlled, and auditable
  • Manage the deployment platform end to end: release scheduling, environment promotion, incident response, and post-deployment validation
  • Champion good deployment hygiene: automated pipelines, version-controlled configuration, and documented environment differences as standard practice

LLM Gateway & Model Serving

  • Build and operate the LLM gateway layer (LiteLLM or equivalent) — API access controls, rate limiting, model routing, and failover across Azure-backed endpoints
  • Manage model serving configuration: endpoint management, load balancing, latency SLOs, and model switching without disrupting live use cases
  • Own secrets and access management for all model API credentials and service accounts across environments
  • Maintain a prompt and model version registry so that every production use case can be traced to a specific model version and prompt configuration

Observability, Cost & Controls

  • Instrument all deployed use cases with LLM observability tooling (LangSmith or equivalent): traces, latency, token counts, and error rates as standard
  • Build and maintain cost telemetry dashboards: per-use-case token consumption, compute spend, and alerting on cost anomalies
  • Implement and maintain token budget controls and rate limits across BUs — keeping cost visible and predictable is a shared responsibility that starts at the platform layer
  • Own general platform monitoring and reliability: uptime, alerting, on-call runbooks, and incident response for platform-layer issues

Eval Pipeline Infrastructure

  • Build the infrastructure layer for LLM evaluation pipelines — test harnesses, regression runners, and LLM-as-judge scaffolding used by Applied AI Engineers per use case
  • Work with the LLM Ops / MLOps Engineer on eval pipeline design
  • Ensure eval pipeline runs are logged, versioned, and traceable — eval results should be reproducible
  • Support evals as a consistent deployment gate — working with the team to ensure every use case has a passing eval run on the current model version before moving to production

Standards & Collaboration

  • Maintain platform documentation — architecture diagrams, runbooks, environment specs, and onboarding guides — so institutional knowledge is shared and accessible across the team
  • Work within the Head, AI Platform's engineering standards: all platform changes go through code review before deployment
  • Support the QA / Dev Engineers (Applied AI cluster) on integration and regression testing where it touches the platform layer
  • Proactively surface platform-layer risks and capacity constraints to the Head, AI Platform

Requirements

Must-Have

  • Solid cloud and DevOps engineering foundations: you have built and operated CI/CD pipelines, managed environments with IaC, and handled production deployments and rollbacks on at least one major cloud platform (Azure, AWS, or GCP); you are comfortable working across Linux and Windows Server, and familiar with core networking concepts such as VPC/VNET, DNS, firewalls, and load balancers
  • Hands-on experience with LLM infrastructure: you have configured and operated a model gateway or API proxy layer, managed multi-model routing, and dealt with rate limits and failover in a live environment
  • LLM observability experience — you have instrumented production AI systems with tracing and monitoring tooling and used the data to diagnose issues
  • Cost telemetry and token controls — you understand how LLM API costs are structured and have built or operated dashboards and controls to keep spend visible and bounded
  • Strong Python skills and comfort with the full LLM deployment tooling ecosystem; equally at home in application code and infrastructure configuration
  • Strong appreciation for documentation and configuration management — environments as code, clear runbooks, and written context that helps the team move faster together

Strong Advantage

  • Experience with eval pipeline infrastructure: test harness design, regression frameworks, LLM-as-judge scaffolding, or automated output quality checks
  • Security and access management experience in an AI context: IAM, RBAC, secrets management, API credential rotation, encryption at rest and in transit, and least-privilege access design for model-serving environments
  • Familiarity with MLOps practices: model versioning, A/B traffic splitting, canary deployments for model updates
  • Experience supporting engineering teams as a platform provider — you understand that your internal customers are the engineers shipping use cases, and you design for their velocity as well as for reliability
  • Exposure to enterprise multi-tenant environments: managing shared infrastructure across multiple teams or business units with different access and cost boundaries; familiarity with virtualisation platforms (VMware, Hyper-V, or Nutanix) is a plus

About the Company

KUOK (SINGAPORE) LIMITED

The Kuok Group began business as Kuok Brothers Limited in 1949 in Johor Bahru, Malaysia, trading rice, sugar and wheat flour. In 1953, Kuok (Singapore) Limited ("KSL") was established as business activities expanded. Through its subsidiaries and associate...


Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.