Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Human Data Evals Lead at Jobgether?

The Human Data Evals Lead position at Jobgether is a full-time or part-time opportunity in the Admin/Clerical/Secretarial field.

Where is this Human Data Evals Lead job located?

United States, Other / Non-US

What type of employment is offered for this Human Data Evals Lead role?

Full-time or part-time position

What is the expected salary for this Human Data Evals Lead job?

Compensation will be discussed during the hiring process.

Human Data Evals Lead

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Human Data Evals Lead based in the United States.

This role sits at the core of frontier AI data operations, owning how high-quality evaluation datasets and benchmarks are designed, validated, and delivered to leading AI labs. You will be responsible for translating ambiguous evaluation needs into structured, high-signal data proposals and production-ready sample packages that demonstrate model performance with rigor and clarity. The work blends technical judgment, quality design, and commercial awareness, requiring close collaboration with subject-matter experts and research stakeholders. You will shape how “frontier-grade” quality is defined and enforced, ensuring every dataset meets the standards expected by advanced model developers. Acting as a key interface with AI lab partners, you will help convert pilots into scaled production engagements. This is a high-ownership role at the intersection of AI evaluation, data quality, and applied research operations.

Accountabilities:

Own the design, development, and delivery of high-quality AI evaluation data initiatives, from initial proposals through pilot execution and production readiness.

Develop data proposals and sample packages based on lab requests, benchmarks, and evaluation targets, translating them into structured, high-signal datasets.
Design frontier-grade evaluation samples across reasoning, coding, agents, tool use, and multimodal tasks, ensuring measurable model discrimination and headroom.
Define and enforce rigorous quality control frameworks, including expert verification, calibration layers, rubrics, and deterministic validation approaches.
Recruit, onboard, and manage subject-matter experts across technical domains, ensuring consistent output quality aligned with benchmark standards.
Own pilot engagements end-to-end, including scoping, staffing, SOW definition, QC execution, and final delivery to AI lab partners.
Act as a key point of contact for lab stakeholders, aligning expectations and surfacing technical requirements in collaboration with internal leadership.
Continuously refine evaluation methodologies and sample design standards to improve signal quality and benchmark reliability.

Requirements:

You are an experienced operator in AI evaluation or technical delivery, with strong expertise in building structured, high-quality data systems for model assessment.

5+ years of experience in technical program management, data operations, quality engineering, or ML evaluation roles.
Proven experience working with AI labs or enterprise ML teams, delivering datasets, benchmarks, or evaluation frameworks.
Strong understanding of LLM evaluation concepts such as benchmarks, rubrics, pass rates, headroom, and model discrimination.
Hands-on experience designing or managing QC processes and ensuring high-quality annotated or evaluated datasets.
Demonstrated ability to recruit, manage, and calibrate subject-matter experts or external contributor pools.
Strong problem-solving skills in ambiguous environments with evolving requirements and fast iteration cycles.
Excellent English communication skills; Spanish is a plus.

Benefits:

Competitive compensation aligned with senior-level AI and data roles
Remote-first setup with flexibility across LATAM and US time zones
Opportunity to work directly with leading AI labs and frontier model development teams
High-ownership role with significant influence over evaluation standards and methodologies
Collaboration with top-tier subject-matter experts across technical domains
Exposure to cutting-edge AI benchmarking and evaluation practices
Fast-paced, research-driven environment with strong learning potential
Opportunity to shape how frontier model quality is measured and improved

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
