$200,000 - 320,000 yearly
Remote in United States, Canada or Latin America • $200K-320K + Equity
Role Summary
Own the quality of OB-1's benchmark suite. Execute tasks with the AI agent, analyze results, identify broken or gamed benchmarks, and curate hundreds of tasks for production. You need deep technical judgment to instantly recognize poor task design.
Core Responsibilities
Task Execution & Analysis (40%): Run OB-1 against tasks. Analyze results. Understand why it succeeds or fails.
Task Design Review (40%): Judge if tasks are well-designed, solvable, and test real capability. Spot what's trivial or can be gamed. Refine as needed.
Curation & Scaling (20%): Filter task batches for quality. Build a repeatable curation process as volume scales to 500+ tasks.
Required Expertise
Expert-level understanding of at least two of these domains: ML systems, C++ performance optimization, Verilog/chip design
IOI/IMO-level competitive programming background (or similar)
5+ years building production systems
1+ year of professional experience with Python and one of Rust or C++
Experience with TypeScript is a plus
High bar for quality with ability to articulate why tasks are good or bad
• Competitive compensation: $200,000 - $320,000 base salary plus significant equity
• Opportunity to work on cutting-edge AI technology with real-world impact
• Collaborative environment with a world-class team of engineers and researchers
• Access to state-of-the-art computing resources and AI models
• The chance to shape the future of how software is built
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.