ML Infrastructure / MLOps Engineer
Experience: 3–7 years | MLOps/ML platform + Kubernetes
Location: Office – Coimbatore/Bengaluru
About Aivar Innovations
Aivar is an AI-first technology partner where cutting-edge technology meets industry expertise to supercharge your projects. Our AI-augmented teams accelerate development, reduce time-to-market, and deliver exceptional code quality. We bring together the best minds in tech to craft scalable, repeatable solutions that drive real momentum for your business.
Technical Focus
Own the JARK-Stack integration on EKS: Ray + KubeRay for distributed compute, Kubeflow Pipelines for workflow orchestration, MLflow for experiment tracking, JupyterHub for development, and advanced job schedulers (Kueue, Volcano, Argo) for batch training. Act as the bridge between data scientists and the platform.
Functional Expectations
- Deploy and optimize Ray + KubeRay for distributed data processing and model training across GPU clusters
- Build Kubeflow Pipelines for reproducible ML workflows: data prep, training, evaluation, and deployment, with lineage tracking
- Configure MLflow for centralized experiment tracking and model registry across teams
- Implement advanced job scheduling: queue management, priority, preemption, and gang scheduling via Kueue/Volcano
- Build model CI/CD: automated training, evaluation, validation, and canary/blue-green deployment to inference endpoints
- Create self-service tooling for data scientists: cluster provisioning, GPU allocation, experiment templates
- Monitor ML workload performance: GPU utilization, training throughput, data pipeline efficiency
Must-Have Technical Skills
- ML infrastructure / MLOps / ML platform engineering (3+ years)
- Kubernetes (EKS preferred): deployments, PVs, RBAC, resource management
- At least two of: Ray/KubeRay, Kubeflow, MLflow, Airflow, Argo Workflows
- Distributed training: PyTorch DDP, Horovod, DeepSpeed, or Ray Train
- Model serving: KServe, Seldon, or custom FastAPI serving
- GPU scheduling and resource management on Kubernetes
- Strong Python engineering: tools and automation, not just notebooks
Core Tech Stack
Ray/KubeRay, Kubeflow Pipelines, MLflow, JupyterHub, Argo Workflows, Kueue/Volcano, PyTorch/DeepSpeed, KServe, Helm, AWS (EKS, S3, EFS, ECR), Prometheus/Grafana
Benefits
Why You’ll Love Working at Aivar
- Learn from Experts: Work directly with former AWS leaders and AI pioneers.
- Direct Ownership: Lead high-impact "greenfield" projects from concept to global launch.
- Modern Tech: Master the latest Generative AI frameworks and cloud-native architectures.
- Real-World Impact: Build mission-critical systems used by major global enterprises.
- Rapid Growth: Scale your career quickly in a high-speed
Diversity and Inclusion
Aivar Innovations is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, gender identity, sexual orientation, religion, disability, age, marital status, caste, or any other protected characteristic, and we are committed to building a diverse, inclusive, and respectful workplace for everyone.