You will like this opportunity if you're passionate about using empirical research to make AI systems safer in practice. You enjoy the challenge of translating theoretical AI risks into concrete detection mechanisms. You thrive on rapid iteration and learning from data. You want your research to directly impact real-world AI safety.
Research & Development
- Systematically collect and catalog coding agent failure modes from real-world instances, our internal deployments, public examples, research literature, and theoretical predictions
- Design and conduct experiments to test monitor effectiveness across different failure modes and agent behaviors
- Build and maintain evaluation frameworks to measure progress on monitoring capabilities
- Build and maintain high-quality datasets to train and test monitors on
- Iterate on monitoring approaches based on empirical results, balancing detection accuracy with computational efficiency
- Stay current with research on AI safety, agent failures, and detection methodologies
- Stay current with research into coding security and safety vulnerabilities
Monitor Design & Optimization
- Develop and maintain a comprehensive library of monitoring prompts tailored to specific failure modes (e.g., security vulnerabilities, goal misalignment, deceptive behaviors)
- Experiment with different reasoning strategies and output formats to improve monitor reliability
- Design and test hierarchical monitoring architectures and ensemble approaches
- Optimize log pre-processing pipelines to extract relevant signals while minimizing latency and computational costs
- Implement and evaluate different scaffolding approaches for monitors, including chain-of-thought reasoning, structured outputs, and multi-step verification
Fine-tuning & Red-teaming
- Fine-tune open-source models to create efficient monitors for high-volume production environments
- Design and build agentic monitoring systems that autonomously investigate logs to identify both known and novel failure modes
- Build automated red-teaming pipelines that attack monitors at scale
- Design iterative adversarial games in which a red team continuously attacks the monitors and a blue team continuously defends them
JOB REQUIREMENTS
- 2+ years of experience conducting empirical research with large language models or AI systems
- Strong experience with AI coding agents. For example, having extensively used and compared frontier coding agents, or having designed / developed coding agents
- Experience with LLM-as-a-judge setups or AI monitoring more broadly
- Experience designing and running experiments (e.g. on prompting, scaffolding, agent design, fine-tuning, or RL), analyzing results, and iterating based on empirical findings
- Strong Python programming skills
- Demonstrated ability to work independently on open-ended research problems
- Familiarity with AI safety concepts, particularly agent-related risks
- Familiarity with computer security, e.g. security testing and secure system design
- Experience fine-tuning language models or working with smaller open-source models
- Previous work building developer tools or monitoring systems
- Publications or contributions to AI safety or ML research
- Experience with production log systems or production log analysis
WHAT YOU'LL ACCOMPLISH IN YOUR FIRST YEAR
- Build a comprehensive failure mode database: Systematically collect and categorize 100+ distinct AI agent failure modes across safety and security dimensions, creating the foundation for our monitoring library.
- Develop and validate monitoring approaches: Create and empirically test monitoring prompts and strategies for key failure categories, establishing clear metrics for monitor performance and building evaluation frameworks to track progress.
- Optimize the monitoring pipeline: Improve log preprocessing and monitor scaffolding to achieve measurable improvements in detection accuracy, false positive rates, and computational efficiency.
- Advance monitoring capabilities: Work on advanced approaches such as fine-tuned monitors or agentic investigation systems.
REPRESENTATIVE PROJECTS
- Hierarchical monitoring for coding agent security: Design a multi-layer monitoring system for detecting security vulnerabilities introduced by coding agents. Start by cataloging common security failure modes (e.g., hardcoded credentials, SQL injection vulnerabilities, insecure API calls). Build specialized monitors for each category, then create a hierarchical system where fast, efficient first-pass monitors flag potentially problematic code for deeper investigation by more sophisticated monitors.
- Design a backtesting strategy for Watcher: Validate the system on synthetic test cases and real agent outputs, iterating to optimize the tradeoff between detection rates and false positives while maintaining low latency for most monitoring decisions.
- Fine-tune an open-source model to be a great monitor: Take an open-source model and fine-tune it on our large dataset of coding agent failures with high-quality ground truth labels. Test different fine-tuning techniques and measure generalization to our held-out data. Compare against prompted baselines on accuracy, cost, and latency. The goal is for fine-tuned models to lift the Pareto frontier of monitors.
BENEFITS
- This role offers a market-competitive salary, equity, and competitive benefits.
- Salary: 100k - 200k GBP (~135k - 270k USD)
- Flexible work hours and schedule
- Unlimited vacation
- Unlimited sick leave
- Up to 6 months of paid parental leave
- Comprehensive health, dental and vision insurance
- Retirement savings with competitive employer matching (e.g. 401(k) for US employees)
- Lunch, dinner, and snacks are provided for all employees on workdays
- Paid work trips, including staff retreats, business trips, and relevant conferences
- A yearly $1,000 (USD) professional development budget
LOGISTICS
- Time Allocation: Full-time
- Location: This is an in-person role working out of our London or San Francisco office.
- Visa sponsorship: We sponsor visas in both the UK and US. Sponsorship isn't guaranteed for every role or candidate, but if we make you an offer, we'll work with you to find the right visa route.
The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks. At Apollo Research, we’re primarily concerned with risks from Loss of Control, i.e. risks coming from the model itself rather than e.g. humans misusing the AI. We’re particularly concerned with deceptive alignment / scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight. We work on the detection of scheming (e.g., building evaluations), the science of scheming (e.g., model organisms), and scheming mitigations (e.g., anti-scheming and control). We work closely with multiple frontier AI companies, e.g. to test their models before deployment or to collaborate on scheming mitigations. At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.
We're now also developing tools and products (see Watcher) that make it easier to prevent harms from widely deployed AI systems.
Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.