AI Benchmark Engineer | Native Language Specialist at Jobgether – India
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Benchmark Engineer | Native Language Specialist in India.
This role sits at the intersection of software engineering, language expertise, and AI evaluation, focusing on building rigorous benchmarks that test the multilingual capabilities of large language models. You will design and develop terminal-based tasks that reveal how AI systems handle non-English inputs, encoding challenges, and locale-specific behaviors in real-world coding environments. The work is highly experimental and research-oriented, requiring both technical depth and linguistic precision. You will create realistic multilingual datasets, identify model failure points, and help define robust evaluation standards. Collaboration happens in a distributed, quality-focused environment with strong emphasis on accuracy, reproducibility, and structured validation. Your contributions will directly shape how next-generation AI systems are measured and improved globally.
Key Responsibilities
- Design and engineer high-quality Terminal-Bench tasks to evaluate multilingual performance of AI coding agents in realistic environments.
- Create and maintain multilingual datasets and file-based assets in your native language, preserving linguistic integrity rather than relying on simplified translations.
- Identify AI failure points in non-English prompts and workflows, and design challenges that rigorously test robustness.
- Develop reference implementations and deterministic verifier scripts to ensure reliable and reproducible evaluation outputs.
- Calibrate task difficulty levels (Easy to Very Hard) based on model performance analysis and execution logs across different AI tiers.
- Participate in structured multi-layer quality assurance processes, including creation review, calibration validation, and audit checks.
- Ensure benchmark fairness, grammatical accuracy, and technical integrity through both manual review and automated validation systems.
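As a rough illustration of the deterministic verifier scripts described above (the function name and comparison policy here are illustrative assumptions, not part of the posting), a multilingual-safe verifier might normalize Unicode before comparing an agent's output against a reference, so that visually identical strings built from different code-point sequences do not cause flaky failures:

```python
import unicodedata


def verify_output(expected: str, actual: str) -> bool:
    """Deterministically compare expected vs. actual task output.

    Normalizes both sides to NFC so that precomposed and combining-accent
    forms of the same text (e.g. "\u00e9" vs. "e" + U+0301) compare equal,
    and strips leading/trailing whitespace so trailing newlines from
    shell pipelines do not affect the verdict.
    """
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", s.strip())

    return norm(expected) == norm(actual)


# In a real Terminal-Bench-style task, a verifier script would typically
# read the two files and exit 0 on match, non-zero otherwise, e.g.:
#   sys.exit(0 if verify_output(want, got) else 1)
```

This sketch deliberately normalizes rather than comparing raw bytes; whether a given benchmark task should treat NFC and NFD outputs as equivalent is itself a design decision the task author has to make explicit.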
Requirements
- 5+ years of professional experience in software engineering or a related technical field.
- Background with leading technology companies, or a strong academic foundation from top-tier engineering institutions.
- Native or near-native fluency in a language other than English, with deep understanding of grammar, structure, and contextual usage.
- Strong proficiency in Python, shell scripting, and data processing workflows.
- Hands-on experience with CLI/terminal-based development environments and familiarity with coding agents or AI-assisted tools.
- Strong understanding of multilingual computing challenges, including Unicode handling, encoding/decoding, and locale-specific behaviors.
- Knowledge of text-processing edge cases such as bidirectional scripts, collation rules, non-Gregorian calendar formats, and rendering constraints.
- High English proficiency for collaboration, documentation, and technical communication.
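A few of the multilingual computing pitfalls listed above can be demonstrated directly with Python's standard library (the specific example strings are my own, chosen only to illustrate the behaviors):

```python
import unicodedata

# Visually identical strings can differ at the code-point level,
# which breaks naive string equality in tests and grading scripts.
word = "é"  # single precomposed code point U+00E9
decomposed = unicodedata.normalize("NFD", word)  # "e" + combining U+0301

print(word == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == word)  # True

# Character count and byte count diverge for non-Latin scripts,
# so length checks must say which unit they mean.
hindi = "नमस्ते"
print(len(hindi))                  # 6 code points
print(len(hindi.encode("utf-8")))  # 18 bytes

# Case mapping is script-sensitive: German "ß" casefolds to "ss",
# which .lower() alone does not do.
print("Straße".casefold() == "strasse")  # True
print("Straße".lower() == "strasse")     # False
```

Each of these behaviors is a potential failure point for an AI coding agent handling non-English input, which is exactly the kind of gap the benchmarks in this role are meant to expose.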
What We Offer
- Fully remote freelance engagement with flexible working hours and complete autonomy.
- Competitive compensation with fast and reliable payment processing.
- Opportunity to contribute to cutting-edge AI research and multilingual evaluation systems.
- Access to globally distributed projects across AI, language technology, and software engineering domains.
- Collaboration with a diverse international community of linguists, engineers, and AI researchers.
- Continuous exposure to advanced AI systems, enhancing technical and linguistic expertise.
- Streamlined onboarding and project participation process tailored to expert contributors.