AI Evaluation Engineer (Knowledge & Research)

Job Type: Full Time
Remote / Work from Home

Job Description - AI Evaluation Engineer (Knowledge & Research)

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer with a strong research background to design and evaluate complex, multi-agent tasks used to benchmark next-generation AI systems.

In this role, you will work at the intersection of research, data structuring, and AI evaluation, building high-quality tasks that require deep document understanding, structured reasoning, and multi-step synthesis. You will create datasets and evaluation frameworks that test whether AI agents can truly read, reason, and extract knowledge from large-scale unstructured data.

This is a high-precision, detail-oriented role requiring strong analytical thinking, structured problem decomposition, and the ability to translate research content into measurable evaluation tasks.

Commitment required: 8 hours per day, with 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Duration of contract: 5 weeks+

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: Take-home assessment (60 min)

Responsibilities

  • Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
  • Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
  • Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
  • Design LLM judge prompts that evaluate agent output field-by-field against the oracle
  • Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)

Requirements

  • 5+ years of experience in research (academic or industry) in a scientific, technical, or analytical domain
  • Strong ability to read, analyze, and extract structured information from unstructured documents
  • Experience designing or working with structured data formats (JSON, schemas, validation)
  • Proficiency in Python scripting (data processing, validation, or evaluation scripts)
  • Experience with AI evaluation, coding benchmarks, or structured reasoning tasks (e.g., SWE-bench, Terminal-bench, or similar)
  • Experience working with Docker (building images, debugging containers)
  • Strong attention to detail, especially when defining exact, verifiable outputs
  • Ability to design complex, multi-step problem-solving workflows
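
For candidates unfamiliar with the terminology, the sketch below illustrates what a structured ground-truth oracle (JSON) and a field-by-field check against it might look like. It is only an illustration under stated assumptions: the field names, values, and the `compare_fields` helper are hypothetical and do not come from the employer, and a deterministic comparison stands in here for the LLM judge named in the responsibilities.

```python
import json

# Hypothetical oracle: field names and values are illustrative only.
ORACLE_JSON = """
{
  "sample_size": 128,
  "primary_method": "randomized controlled trial",
  "reported_limitations": ["small cohort", "single site"]
}
"""


def normalize(value):
    """Lowercase and trim strings; sort lists so element order does not matter."""
    if isinstance(value, str):
        return value.strip().lower()
    if isinstance(value, list):
        return sorted(normalize(v) for v in value)
    return value


def compare_fields(oracle: dict, answer: dict) -> dict:
    """Return a per-field verdict: 'match', 'mismatch', or 'missing'."""
    report = {}
    for field, expected in oracle.items():
        if field not in answer:
            report[field] = "missing"
        elif normalize(answer[field]) == normalize(expected):
            report[field] = "match"
        else:
            report[field] = "mismatch"
    return report


oracle = json.loads(ORACLE_JSON)

# Hypothetical agent output to be checked against the oracle.
agent_answer = {
    "sample_size": 128,
    "primary_method": "Randomized Controlled Trial ",
    "reported_limitations": ["single site"],
}

report = compare_fields(oracle, agent_answer)
print(json.dumps(report, indent=2))
```

Specific, verifiable fields such as an exact sample size are what make an oracle useful: an agent that did not read the source material is unlikely to produce them by guessing.
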

Original job posting: AI Evaluation Engineer (Knowledge & Research), posted on GrabJobs.

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.