Job Description - Senior Manager, Data & AI Platform
Set a multi-year platform roadmap that integrates data pipelines, AI/ML workflows, and compute orchestration layers across cloud and on-prem infrastructure, ensuring a seamless experience from dataset creation through model training, evaluation, and deployment.
Build a unified platform that abstracts complex underlying engines (Spark, Ray, Kubernetes, Airflow/Flyte/Metaflow, etc.) to provide secure, governed, cohesive workflows.
Deliver robust APIs, SDKs, and developer-friendly tools and interfaces (CLIs, GUIs, dashboards, etc.) that enable AI/ML researchers and engineers to build repeatable, modular, composable end-to-end workflows and pipelines, from dataset selection and preparation through model training, evaluation, tuning, and deployment.
Ensure that every stage of the AI/ML lifecycle (dataset selection, feature engineering, training, evaluation, tuning, and AI/ML artifact management) can be orchestrated through consistent, versioned workflows with full lineage and reproducibility.
Provide guardrails and governance for data and AI/ML models while enabling researchers to move from exploratory work to production-grade pipelines with minimal friction, using common evaluation frameworks and automated checks.
Partner with platform engineering/MLOps and infrastructure teams to define observability stacks covering metrics, drift indicators, performance regressions, training/inference health signals, production reliability (SLIs/SLOs), monitoring, and incident response.
Hire, mentor, and develop architects and senior engineers across data infrastructure and developer experience.

Desired Background
BS in Computer Science, Mathematics, Engineering, or an equivalent technical field; Master's highly preferred.
Proven platform leadership delivering integrated data and AI/ML platforms used at enterprise scale, with deep experience integrating data processing, orchestration, and AI/ML frameworks and tools into consistent, standardized workflows and pipelines.
Strong API/SDK design experience enabling AI/ML researchers and engineers to build reliable, reusable, and scalable pipelines.
Deep knowledge of and extensive experience with GPU infrastructure, distributed systems, orchestration engines, AI/ML toolchains, query engines, and pipeline technologies, e.g., Ray, Spark, Metaflow/Flyte/Airflow/Argo/Kubeflow, and Kubernetes.
Deep expertise across AI/ML ecosystem tooling such as PyTorch/TensorFlow/Hugging Face; the MLOps stack, including experiment tracking, feature stores, and model registries/lineage (MLflow, Weights & Biases, Feast, etc.); and AI/ML-related AWS/GCP managed services (Vertex AI, SageMaker, etc.).
Excellent communication and partnership skills, with the ability to translate complex system designs into intuitive developer-facing abstractions.
Experience designing feature stores or embedding services tightly integrated with training pipelines.