Responsibilities
- Build, curate, and maintain agentic workflows along with datasets for training and evaluation.
- Run systematic LLM evaluations, track regressions, and ensure models meet quality bars.
- Define and implement LLM performance metrics (e.g., correctness, latency, hallucination control, safety).
- Work closely with Product and customer-facing teams to experiment with prompting techniques, fine-tuning datasets, retrieval strategies, and model configurations, ensuring agent behaviour aligns with user expectations.
- Develop internal tooling to speed up evaluation, annotation, and iteration cycles.
- Build automated pipelines for regression checks and model monitoring.
- Create mechanisms that turn real user interactions into actionable model improvements.
- Ensure agents behave consistently across large-scale production scenarios.
- Debug complex system behaviours spanning prompts, tools, APIs, and model responses.
- Understand real-world use cases and failure modes, and translate customer insights into model needs, data requirements, and product improvements.
- Tune latency, turn-taking, and conversational naturalness for voice AI systems.
- Write clean, reliable code in Python or JS for model pipelines, tools, and integrations.
- Think in systems: from data ingestion to model outputs to user-facing behaviour.
Success Looks Like
- Strong evaluation coverage with clear, actionable metrics.
- Faster iteration cycles due to improved internal tools and workflows.
- Measurable improvement in agent accuracy, consistency, and safety.
- Reliable agent performance in production: fewer escalations and fewer regressions.
- Clear alignment between customer needs and agent capabilities.
- Smooth collaboration across engineering, product, and customer teams.
What We’re Looking For
- 2–5 years of experience in software engineering, with strong fundamentals in building reliable, scalable systems.
- 1–2 years of experience working on AI-backed products (LLM engineering, evaluations, prompting, or agent development).
- Hands-on experience with LLMs, prompt engineering, evaluation frameworks, or dataset creation.
- Strong understanding of how LLMs work: prompting, fine-tuning, evaluation, guardrails, and system design.
- Solid software engineering skills and comfort with APIs and cloud environments.
- Experience building internal tools or automation pipelines.
- Ability to think from the customer backward: empathetic, detail-oriented, and outcome-driven.
- Excellent communicator who collaborates well across cross-functional teams.
- Fast learner who loves turning cutting-edge research into practical, dependable systems.
- Bonus: experience with voice AI, speech technologies, or real-time agent systems.
Why This Role Matters
- In healthcare, accuracy and trust are everything. Patients, providers, and payers rely on our AI agents to handle critical interactions. Deployment Engineers make this possible by turning AI from a raw model into something practical, reliable, and human-friendly.
Why 100ms.ai
- You'll be part of a small team at a fast-growing engineering-first startup.
- You'll work with engineers across the globe with experience in video at places like Facebook and Hotstar.
- You can grow as an individual contributor or as a team leader, with the freedom to set your own goals.