Sr Software Engineer, Data & AI Platform

Company: Dolby
Job Type: Full Time

Job Description - Sr Software Engineer, Data & AI Platform

We're building a unified data and AI/ML model development and training platform, similar in ambition to those at other media technology companies, to accelerate our entire AI/ML lifecycle: data preparation and feature computation, experimentation, large-scale distributed training, evaluation, deployment, and governance and observability. As a Senior Software Engineer, you will design and implement core platform services, exposed through high-quality APIs and SDKs, for a platform that seamlessly integrates data systems, compute orchestration, and AI/ML tooling. A major focus is creating horizontal, reusable components (feature pipelines, embedding services, training/evaluation frameworks, SDKs, and model management APIs) to simplify the end-to-end AI/ML lifecycle. You'll partner closely with AI/ML researchers and engineers, data engineers, and product teams to deliver a paved-path developer experience: from a laptop run to multi-node, GPU-accelerated training; experiment tracking and lineage; model packaging with evaluation hooks; and continuous delivery to production.

Design and build platform primitives (Python SDKs, platform APIs, and templates) that enable reproducible experiments, configuration-as-code workflows, model lineage, and artifact tracking, supporting seamless promotion from research to production.
Create developer tools (CLIs, UIs, dashboards, visualization layers) that improve the development experience and simplify platform operation and multi-stage workflows.
Implement and scale distributed training systems (multi-node GPU workloads) on top of a Kubernetes and cloud-based orchestration foundation.
Build large-scale evaluation frameworks for offline tests, shadow deployments, and A/B experimentation.
Implement model/dataset versioning, approvals, lineage tracking, retention, and compliance hooks.
Partner with AI/ML research, platform engineering/MLOps and infrastructure, and data engineering teams to generalize workflows into reusable frameworks.
Partner with platform engineering/MLOps and infrastructure to define observability stacks for metrics, drift indicators, performance regressions, training/inference health signals, production reliability (SLIs/SLOs), monitoring, and incident response.

Desired Background:

BS in Computer Science, Mathematics, Engineering, or an equivalent technical field. Master's preferred.
Proven track record building large-scale distributed systems and integrated data and AI/ML platforms (e.g., training, serving, workflow orchestration, data pipelines).
Expert-level proficiency in Python and one of Go/Java/C++, and in building production-grade services, APIs, and SDKs.
Extensive hands-on experience with Kubernetes (EKS, GKE, self-hosted, etc.), including autoscaling and job scheduling frameworks, GPU infrastructure, and AI/ML-related AWS/GCP managed services (Vertex AI, SageMaker, etc.).
Deep expertise with the AI/ML ecosystem and tooling such as PyTorch, TensorFlow, Ray, Hugging Face, and experiment/feature/model stores (MLflow, W&B, Feast, etc.).
Proven ability to scale AI/ML workloads and pipelines: pipeline SDKs, feature/model CI/CD, automated evaluation, safe rollouts, and monitoring.
Strong developer-experience mindset: ability to translate research/engineering friction into elegant APIs, templates, and tools that reduce time-to-first-successful remote run and raise platform adoption.
Previous experience with Databricks.
Knowledge of multimodal AI/ML (audio, video, text) data preparation, feature extraction, model development, training, and evaluation workflows.
Experience with LLM/foundation model sizing/estimation, training requirements, pipelines, and deployment.
Knowledge of LLM/foundation model sizing/estimation, training requirements, evaluation workflows, orchestration, and deployment patterns.
Experience designing feature stores or embedding services tightly integrated with training pipelines.
Original job Sr Software Engineer, Data & AI Platform posted on GrabJobs.