We build the evaluation datasets and RL environments that make AI reliable in domains where mistakes are expensive: finance, healthcare, and legal. Our team designs expert-curated training data, calibrated rubrics, and verifiable task environments for AI labs and startups pushing the frontier of what models can do in regulated industries.
We're a small, lean, London-based team that moves fast and takes the work seriously. Everyone contributes directly. Initiative is rewarded, and ownership is the default. If you want to shape how frontier AI learns to operate in the real world, we'd like to hear from you.
As a Member of Technical Staff on our Applied AI team, you will build the tasks and environments that AI labs use to train and evaluate their agents in finance, healthcare, and legal.
Day to day, that means constructing RL environments around spreadsheets, documents, and professional workflows; writing verification logic and reward functions; and working with domain experts to scope what a correct answer actually looks like in an LBO model or a clinical note. Some days it's engineering, some days it's closer to research. The common thread is that you're producing the ground truth that frontier models get measured against.
Build RL environments across finance, healthcare, and legal domains
Assist in designing tasks with golden answers, calibrated rubrics, and programmatic reward signals
Write verification logic and reward functions that can distinguish good model outputs from bad ones
Work directly with domain experts (investment analysts, physicians, attorneys) to translate complex professional workflows into structured tasks
Prototype new approaches to evaluation, verification, and synthetic data generation
Practical experience building with LLMs: prompting, evaluation, and agentic harnesses. You've built things that actually run, not just notebooks.
High agency and technically sharp. You don't wait for permission, specs, or a roadmap. You see what needs doing, figure out how, and get it done.
Comfortable working across very different contexts. The job moves between engineering, evaluation design, and deep collaboration with domain experts, often in the same day.
You ship and iterate. Small team, no room for work that sits in review. Bias toward getting something working, learning from it, and improving it.
You own problems end to end, from scoping with a domain expert through to a working environment. If you prefer clearly partitioned tickets, this probably isn't the right fit.
You already use LLMs as part of how you build, not just as the thing you're building for.
Domain knowledge in finance, healthcare, or legal
Familiarity with RL concepts, model training, and post-training workflows
Cloud infrastructure experience (AWS or GCP)
Previous startup experience, especially as an early engineer
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.