ML Ops Engineer

Job Type: Full Time

Job Description - ML Ops Engineer

 


Our Company


We’re Hitachi Digital Services, a global digital solutions and transformation business with a bold vision of our world’s potential. We’re people-centric and here to power good. Every day, we future-proof urban spaces, conserve natural resources, protect rainforests, and save lives. This is a world where innovation, technology, and deep expertise come together to take our company and customers from what’s now to what’s next. We make it happen through the power of acceleration.


Imagine the sheer breadth of talent it takes to bring a better tomorrow closer to today. We don’t expect you to ‘fit’ every requirement – your life experience, character, perspective, and passion for achieving great things in the world are equally as important to us.


The team


We are looking for an MLOps L2 Support Engineer to provide 24/7 production support for machine learning (ML) and data pipelines. The role requires on-call support, including weekends, to ensure high availability and reliability of ML workflows. The candidate will work with Dataiku, AWS, CI/CD pipelines, and containerized deployments to maintain and troubleshoot ML models in production.


The role


Key Responsibilities:


Incident Management & Support:



  • Provide L2 support for MLOps production environments, ensuring uptime and reliability.

  • Troubleshoot ML pipelines, data processing jobs, and API issues.

  • Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch.

  • Perform root cause analysis (RCA) and resolve incidents within SLAs.

  • Escalate unresolved issues to L3 engineering teams when needed.


Dataiku Platform Management:



  • Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance.

  • Monitor and support Dataiku plugins, APIs, and automation scenarios.

  • Collaborate with Data Scientists and Data Engineers to debug ML model deployments.

  • Perform version control and CI/CD integration for Dataiku projects.


Deployment & Automation:



  • Support CI/CD pipelines for ML model deployment (Bamboo, Bitbucket, etc.).

  • Deploy ML models and data pipelines using Docker, Kubernetes, or Dataiku Flow.

  • Automate monitoring and alerting for ML model drift, data quality, and performance.


Cloud & Infrastructure Support:



  • Monitor AWS-based ML workloads (SageMaker, Lambda, ECS, S3, RDS).

  • Manage storage and compute resources for ML workflows.

  • Support database connections, data ingestion, and ETL pipelines (SQL, Spark, Kafka).


Security & Compliance:



  • Ensure secure access control for ML models and data pipelines.

  • Support audit, compliance, and governance for Dataiku and MLOps workflows.

  • Respond to security incidents related to ML models and data access.


What you’ll bring


  • Experience: 10+ years in MLOps, Data Engineering, or Production Support.

  • Dataiku DSS: Strong experience with Dataiku workflows, scenarios, plugins, and APIs.

  • Cloud Platforms: Hands-on experience with AWS ML services (SageMaker, Lambda, S3, RDS, ECS, IAM).

  • CI/CD & Automation: Familiarity with GitHub Actions, Jenkins, or Terraform.

  • Scripting & Debugging: Proficiency in Python, Bash, and SQL for automation and debugging.

  • Monitoring & Logging: Experience with Prometheus, Grafana, CloudWatch, or ELK Stack.

  • Incident Response: Ability to handle on-call support, weekend shifts, and SLA-based issue resolution.


Preferred Qualifications:


  • Containerization: Experience with Docker, Kubernetes, or OpenShift.

  • ML Model Deployment: Familiarity with TensorFlow Serving, MLflow, or Dataiku Model API.

  • Data Engineering: Experience with Spark, Databricks, Kafka, or Snowflake.

  • ITIL/DevOps Certifications: ITIL Foundation, AWS ML certifications, Dataiku certification.


Work Schedule & On-Call Requirements:



  • Rotational on-call support (including weekends and nights).

  • Shift-based monitoring for ML workflows and Dataiku jobs.

  • Flexible work schedule to handle production incidents and critical ML model failures.


About us


We’re a global team of innovators. Together, we harness engineering excellence and passion to co-create meaningful solutions to complex challenges. We turn organizations into data-driven leaders that can make a positive impact on their industries and society. If you believe that innovation can bring a better tomorrow closer to today, this is the place for you.




 

Original job ML Ops Engineer posted on GrabJobs.
