What you will do:
- Lead the design and implementation of training pipelines, automated data workflows, and integration tooling that scale with research demand.
- Build systems for large-scale data collection, preprocessing, and curation to support robust experimentation.
- Create tools that streamline the experiment lifecycle, reduce turnaround time, and smooth the path from research to production.
- Collaborate closely with ML researchers to remove technical blockers and improve developer experience.
- Support model serving pipelines and integrate ML components with broader platform systems.
What you will need:
- 3+ years of experience building production-grade machine learning systems, data infrastructure, or research platforms.
- Deep hands-on expertise with Python and at least one systems language (e.g., C++, Go, Rust, Java).
- Experience working with PyTorch or TensorFlow in production or research environments.
- Proven track record with ML training pipelines, data workflows, and integration tooling.
- Familiarity with model deployment and inference optimization (MLOps patterns).
Nice-to-haves:
- Experience with GPU-accelerated computing, distributed training systems, and data versioning or experiment tracking tools.
- Exposure to Docker/Kubernetes.
- Contributions to open-source ML projects.