Job Description - Human Data Evals Lead

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Human Data Evals Lead based in Brazil.

This role sits at the core of frontier AI data operations, owning how high-quality evaluation datasets and benchmarks are designed, validated, and delivered to leading AI labs. You will be responsible for translating ambiguous evaluation needs into structured, high-signal data proposals and production-ready sample packages that demonstrate model performance with rigor and clarity. The work blends technical judgment, quality design, and commercial awareness, requiring close collaboration with subject-matter experts and research stakeholders. You will shape how “frontier-grade” quality is defined and enforced, ensuring every dataset meets the standards expected by advanced model developers. Acting as a key interface with AI lab partners, you will help convert pilots into scaled production engagements. This is a high-ownership role at the intersection of AI evaluation, data quality, and applied research operations.

Accountabilities:

Own the design, development, and delivery of high-quality AI evaluation data initiatives, from initial proposals through pilot execution and production readiness.

Develop data proposals and sample packages based on lab requests, benchmarks, and evaluation targets, translating them into structured, high-signal datasets.

Design frontier-grade evaluation samples across reasoning, coding, agents, tool use, and multimodal tasks, ensuring measurable model discrimination and headroom.

Define and enforce rigorous quality control frameworks, including expert verification, calibration layers, rubrics, and deterministic validation approaches.

Recruit, onboard, and manage subject-matter experts across technical domains, ensuring consistent output quality aligned with benchmark standards.

Own pilot engagements end-to-end, including scoping, staffing, SOW definition, QC execution, and final delivery to AI lab partners.

Act as a key point of contact for lab stakeholders, aligning expectations and surfacing technical requirements in collaboration with internal leadership.

Continuously refine evaluation methodologies and sample design standards to improve signal quality and benchmark reliability.

Requirements:

You are an experienced operator in AI evaluation or technical delivery, with strong expertise in building structured, high-quality data systems for model assessment.

5+ years of experience in technical program management, data operations, quality engineering, or ML evaluation roles.

Proven experience working with AI labs or enterprise ML teams, delivering datasets, benchmarks, or evaluation frameworks.

Strong understanding of LLM evaluation concepts such as benchmarks, rubrics, pass rates, headroom, and model discrimination.

Hands-on experience designing or managing QC processes and ensuring high-quality annotated or evaluated datasets.

Demonstrated ability to recruit, manage, and calibrate subject-matter experts or external contributor pools.

Strong problem-solving skills in ambiguous environments with evolving requirements and fast iteration cycles.

Excellent English communication skills; Spanish is a plus.

Benefits:

Competitive compensation aligned with senior-level AI and data roles

Remote-first setup with flexibility across LATAM and US time zones

Opportunity to work directly with leading AI labs and frontier model development teams

High-ownership role with significant influence over evaluation standards and methodologies

Collaboration with top-tier subject-matter experts across technical domains

Exposure to cutting-edge AI benchmarking and evaluation practices

Fast-paced, research-driven environment with strong learning potential

Opportunity to shape how frontier model quality is measured and improved

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.


We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.

