AI Engineer, Evaluation in India at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Engineer, Evaluation in India.
This role sits at the core of AI model quality and reliability, focusing on building the systems that evaluate and validate machine learning performance at scale. You will design and implement robust evaluation frameworks that ensure AI models meet strict accuracy, safety, and consistency standards across production environments. Working closely with ML engineers and data teams, you will define benchmarks, automate testing pipelines, and develop dashboards that provide clear visibility into model behavior. Your work will directly influence how AI systems are trusted, deployed, and improved across enterprise use cases. This is a highly analytical and engineering-driven environment where precision, experimentation, and iteration are essential. The role offers strong exposure to large-scale AI systems and real-world model evaluation challenges.
In this role, you will be responsible for building and maintaining the infrastructure that evaluates AI model performance and ensures continuous improvement across releases.
- Design and implement automated evaluation pipelines to assess AI model quality, accuracy, and reliability.
- Develop task-specific benchmarks, test suites, and comparison frameworks for model iteration analysis.
- Build dashboards and reporting systems to track metrics such as accuracy, safety, and consistency, and to surface regressions across releases.
- Create automated regression testing frameworks integrated into CI/CD pipelines for every model update.
- Analyze evaluation outputs to identify model failure modes and communicate insights to ML engineering teams.
- Maintain evaluation datasets including versioning, validation, and coverage analysis for robust testing.
- Support A/B testing frameworks used for production model validation and experimentation.
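To make the responsibilities above concrete, the core of an automated evaluation pipeline often reduces to scoring a candidate model against a baseline and gating the release on regressions. A minimal sketch of that idea is below; the metric, data, and tolerance are hypothetical placeholders, not the partner company's actual framework:

```python
# Minimal sketch of an automated evaluation gate, as might run in CI.
# The example data and the 1% tolerance are illustrative assumptions.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer."""
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

def regression_gate(candidate_score, baseline_score, tolerance=0.01):
    """Pass only if the candidate does not regress past the tolerance."""
    return candidate_score >= baseline_score - tolerance

references = ["4", "Paris", "blue"]
candidate  = ["4", "Paris", "red"]
baseline   = ["4", "London", "red"]

cand_acc = exact_match_accuracy(candidate, references)  # 2/3
base_acc = exact_match_accuracy(baseline, references)   # 1/3
assert regression_gate(cand_acc, base_acc)  # candidate improves, so it passes
```

In a real CI/CD integration, a gate like this would run on every model update, with versioned evaluation datasets and results exported to dashboards.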
This position requires strong software engineering skills combined with a solid understanding of machine learning evaluation principles and data-driven analysis.
- 3+ years of software engineering experience, preferably in data or ML-focused environments.
- Strong proficiency in Python for data processing, automation, and tooling development.
- Experience with statistical analysis and data visualization techniques.
- Understanding of ML evaluation metrics (precision, recall, F1 score) as well as human evaluation methods.
- Experience building automated test frameworks and working with CI/CD pipelines.
- Familiarity with dashboarding and monitoring tools for performance tracking.
- Strong analytical thinking, problem-solving ability, and attention to detail.
- Bachelor’s degree in Computer Science, Statistics, or a related field, or equivalent practical experience.
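The metrics named in the requirements (precision, recall, F1) reduce to a few lines of code. A from-scratch sketch for the binary case is shown below; in practice a library such as scikit-learn would typically be used:

```python
# Binary-classification precision, recall, and F1 from first principles.

def precision_recall_f1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# precision = 2/3, recall = 2/3, F1 = 2/3
```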
In return, the role offers:
- Competitive compensation aligned with experience and skills
- Opportunity to work on large-scale AI systems impacting enterprise customers globally
- Exposure to cutting-edge machine learning evaluation and AI quality frameworks
- Collaborative engineering culture focused on experimentation and innovation
- Career growth in advanced AI, ML systems, and evaluation infrastructure
- Inclusive and supportive work environment promoting equal opportunity
- Flexible and modern work practices depending on team structure and project needs