Data Scientist - AI Evaluation at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Data Scientist - AI Evaluation in the United States.
This role is focused on ensuring the reliability, accuracy, and real-world performance of AI systems that power consumer-facing experiences. The Data Scientist will develop metrics, evaluation frameworks, and experiments to measure how AI models perform across retrieval, ranking, recommendations, and outcomes. Working closely with ML engineers and product teams, this role transforms ambiguous product questions into measurable hypotheses, identifies failure modes, and drives continuous improvement. Success requires strong analytical skills, a deep understanding of AI evaluation, and the ability to translate complex technical insights into actionable product outcomes. The position offers an opportunity to shape how AI effectiveness is measured and trusted across the organization while collaborating in a fast-paced, innovation-driven environment.
Responsibilities:
- Define, implement, and maintain metrics and scoring frameworks to evaluate AI agent performance across the full shopping experience
- Design and run experiments to measure model improvements, regressions, and user impact
- Build and maintain evaluation datasets, benchmarks, and automated evaluation pipelines
- Translate product and engineering questions into clear, structured hypotheses and measurable analyses
- Identify edge cases, failure modes, and gaps in AI model performance, recommending actionable improvements
- Create dashboards and reporting that make AI system performance visible, trusted, and actionable
- Collaborate closely with ML engineers, product managers, and other stakeholders to guide iteration and validate model changes
Requirements:
- 4–6+ years of experience in data science, AI/ML evaluation, applied AI, or related roles
- Deep expertise in evaluating AI/ML systems such as ranking, recommendation engines, or LLMs
- Strong experience in experimentation methodologies (A/B testing, causal inference, offline and live evaluations)
- Background in consumer products, user-facing systems, or e-commerce/marketplace platforms
- Ability to translate ambiguous, complex problems into structured analyses and actionable metrics
- Strong product mindset, with a focus on real user outcomes and measurable impact
- Excellent communication skills with the ability to influence across engineering and product teams
- Proficiency with data analysis, statistical modeling, and evaluation frameworks
Benefits:
- Competitive base salary range of $225,000–$280,000 USD, depending on experience and location
- Equity through stock options
- Comprehensive healthcare coverage (medical, dental, vision)
- 401(k) retirement plan
- Flexible PTO and company holidays
- Fully remote work within the United States
- Periodic company offsites and team gatherings