Software Engineer - RL Environments

Salary :

$200,000 - 500,000 yearly

Company : AfterQuery

Job Type : Full Time

San Francisco, United States

Number of Applicants

000+

Apply Now

Let AI Supercharge Your Job Hunt!

JobCopilot scans 500,000+ company career sites daily to find jobs for you

Never miss an opportunity Save hours by auto-filling applications forms Land more interviews with tailored applications

Activate JobCopilot

Job Description - Software Engineer - RL Environments

About AfterQuery

AfterQuery builds the training data and evaluation infrastructure that frontier AI labs use to make their models better. We work with the world's leading labs to design high signal datasets and run rigorous evaluations that go beyond static benchmarks. We are a small, early team (post Series A) where individual contributors have a direct impact on how the next generation of models learn and improve.

The Role

As a SWE (Environments), you will design the datasets and evaluation rubrics that directly influence how frontier models learn. You'll work hands-on with research teams at top AI labs, experimenting with data collection strategies, diagnosing model failure modes, and developing the metrics that determine whether a model is actually improving. You'll go from hypothesis to live experiment quickly, and your output will feed directly into model training runs at scale.

Day to day, you will design data slices that expose meaningful failure modes across domains like finance, code, and enterprise workflows. You will build and refine reward signals for RLHF and RLVR pipelines. You will develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on alignment and capability. You will partner with lab research teams to translate their training objectives into concrete data and evaluation specifications.

What You'll Do

Design data slides and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
Model annotator behavior and run experiments to improve different model capabilities
Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
Create and manage both real world & synthetic data pipelines
Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications

What We're Looking For

1-4 YOE
Major plus if they've worked for/interned for any RL environment companies in the past or any AI safety or benchmarking orgs like METR, Artificial Analysis, etc..
Genuine obsession with how data structure, selection, and quality drive model behavior
Ability to design lightweight experiments, move fast, and extract actionable insights from messy results
Former founders and early engineers at early stage startups are a plus. We don't filter on pedigree. We want people who can demonstrate they work hard, learn fast, and care deeply about getting the details right.

Compensation Structure:

$200k base + profit share (around 150% of base) + competitive equity

Original job Software Engineer - RL Environments posted on GrabJobs ©. To flag any issues with this job please use the Report Job button on GrabJobs.

Apply Now

Auto-Apply to Similar Jobs

Share Job

Get your Resume Reviewed for Free

Automate Job Applications for Similar Jobs

Auto-Apply to Software Engineer Jobs with your AI JobCopilot

Auto-Apply with AI

Similar Software Engineer Jobs in the US

Get your Resume Reviewed for Free

Email address

Why are you reporting this job?

I think it’s a discriminatory or offensive

I think it’s fraudulent or a scam

I think it’s trying to sell something unrelated to the job / it’s asking for money

I think it contains incorrect or broken information

Other

All Job Ads are subject to GrabJobs’s Terms of Service. We allow users to flag postings that may be in violation of those terms. Job Ads may also be flagged by GrabJobs moderation team. However, no moderation system is perfect, and flagging a posting does not ensure that it will be removed.

Setup your job alert:

Frequency

By activating job alerts, I agree to GrabJobs Terms & Privacy Policy. I can unsubscribe to job alerts anytime. Skip

Software Engineer - RL Environments

Job Description - Software Engineer - RL Environments

Similar Software Engineer Jobs in the US

Mobile Apps