Job Description - Sr Software Engineer, Data & AI Platform
We're building a unified data and AI/ML model development and training platform, similar in ambition to the platforms at other media technology companies, to accelerate our entire AI/ML lifecycle: data preparation, feature computation, experimentation, large-scale distributed training, evaluation, deployment, and governance and observability.

As a Senior Software Engineer, you will design and implement core platform services, exposed through high-quality APIs and SDKs, for a platform that seamlessly integrates data systems, compute orchestration, and AI/ML tooling. A major focus is creating horizontal, reusable components (feature pipelines, embedding services, training/evaluation frameworks, SDKs, and model management APIs) that simplify the end-to-end AI/ML lifecycle. You'll partner closely with AI/ML researchers and engineers, data engineers, and product teams to deliver a paved-path developer experience: from a laptop run to multi-node, GPU-accelerated training; experiment tracking and lineage; model packaging with evaluation hooks; and continuous delivery to production.

Responsibilities:
Design and build platform primitives (Python SDKs, platform APIs, and templates) that enable reproducible experiments, configuration-as-code workflows, model lineage, and artifact tracking, and that support seamless promotion from research to production.
Create developer tools (CLIs, UIs, dashboards, and visualization layers) that improve the developer experience and simplify platform operation and multi-stage workflows.
Implement and scale distributed training systems (multi-node GPU workloads) on top of a Kubernetes and cloud-based orchestration foundation.
Build large-scale evaluation frameworks for offline tests, shadow deployments, and A/B experimentation.
Implement model/dataset versioning, approvals, lineage tracking, retention, and compliance hooks.
Partner with AI/ML research, platform engineering/MLOps and infrastructure, and data engineering teams to generalize workflows into reusable frameworks.
Partner with platform engineering/MLOps and infrastructure to define observability stacks for metrics, drift indicators, performance regressions, training/inference health signals, production reliability (SLIs/SLOs), monitoring, and incident response.

Desired Background:
BS in Computer Science, Mathematics, Engineering, or an equivalent technical field. Master's preferred.
Proven track record building large-scale distributed systems and integrated data and AI/ML platforms (e.g., training, serving, workflow orchestration, data pipelines).
Expert-level proficiency in Python and one of Go/Java/C++, and in building production-grade services, APIs, and SDKs.
Extensive hands-on experience with Kubernetes (EKS, GKE, self-hosted, etc.), including autoscaling and job scheduling frameworks, GPU infrastructure, and AI/ML-related AWS/GCP managed services (Vertex AI, SageMaker, etc.).
Deep expertise with the AI/ML ecosystem and tooling such as PyTorch, TensorFlow, Ray, experiment/feature/model stores (MLflow, W&B, Feast, etc.), and Hugging Face.
Proven ability to scale AI/ML workloads and pipelines: pipeline SDKs, feature/model CI/CD, automated evaluation, safe rollouts, and monitoring.
Strong developer-experience mindset: the ability to translate research/engineering friction into elegant APIs, templates, and tools that reduce time-to-first-successful remote run and raise platform adoption.
Previous experience with Databricks.
Knowledge of multimodal AI/ML (audio, video, text) data preparation, feature extraction, model development, training, and evaluation workflows.
Experience with, or knowledge of, LLM/foundation model sizing/estimation, training requirements, pipelines, evaluation workflows, orchestration, and deployment patterns.
Experience designing feature stores or embedding services tightly integrated with training pipelines.