Lead AI QA Engineer in Bengaluru, Karnātaka at Kobie Marketing
Job Description
About the team and what we will build together
We’re looking for a Lead AI QA Engineer with 6+ years of experience who thrives on designing test strategies and evaluation harnesses for production-grade, agentic AI systems and who also brings ETL experience. You have strong Python skills, hands-on experience testing LLM-powered features (prompt regression, tool/function-call validation, RAG correctness, and structured-output schema checks), and working knowledge of evaluation frameworks such as RAGAS, DeepEval, LangSmith or Langfuse. You are comfortable writing solid SQL, automating tests with PyTest, exercising APIs through Postman or REST clients, and shipping test pipelines using Git, Docker and CI tooling like Jenkins or GitHub Actions.
Kobie runs some of the largest loyalty programs in the world. We are building an internal agent platform on Dataiku that automates analyst workflows, surfaces insights from program data in Snowflake, and gives our teams an LLM-native way to work with complex loyalty logic. As a Lead AI QA Engineer on the India Tech Hub team, you will play a key role in protecting that platform: designing golden datasets, running LLM-as-judge and regression suites, and owning the quality bar for what goes to production. This is not a manual-only role: you will automate, instrument and monitor, build QA and automation strategies and roadmaps, and partner closely with our U.S. AI & Innovation team and cross-functional partners across Engineering, Data, AI and Product.
Design and build evaluation harnesses for agentic systems in Python: golden datasets, LLM-as-judge graders, multi-turn regression suites and trace-based assertions. In addition, develop a framework to verify AI-generated output.
Author automated test suites for prompts, tools, structured outputs (Pydantic / JSON schema), retrieval pipelines (ETL experience is relevant here) and end-to-end agent workflows
Validate guardrails around tool execution: auth scoping, input/output validation, PII and prompt-injection protections, and hallucination mitigation
Wire evaluations into CI using Dataiku Evaluations, GitHub Actions or Jenkins so every change is graded against quality, safety and cost SLOs before it ships
Build observability into testing by instrumenting traces with LangSmith, Langfuse, MLflow or OpenTelemetry and triaging production drift back into the eval harness
Own quality end-to-end — define release criteria, run pre-prod and shadow tests, and partner with engineering to root-cause and fix regressions quickly
Partner with data engineers on Snowflake-backed retrieval testing patterns (Cortex Analyst and Cortex Search Services) and with platform teams on observability, security and cost
Help shape internal QA standards for AI & Data engineering as the stack evolves, contributing to design reviews and sharing knowledge across the India and U.S. teams
Participate in a collaborative DevOps environment, working closely with developers, AI engineers, data engineers, DBAs and product partners across environments
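To give a concrete feel for the responsibilities above, here is a minimal sketch of a golden-dataset regression check on structured agent output, using only the Python standard library. Everything in it is hypothetical and for illustration only: the cases in `GOLDEN_CASES`, the tool names, and `stub_agent` are assumptions and do not describe Kobie's actual platform or test suite.

```python
import json
from typing import Callable

# Hypothetical golden dataset: each case pairs a prompt with the fields
# the agent's structured (JSON) output is expected to contain.
GOLDEN_CASES = [
    {
        "prompt": "How many points does member 42 have?",
        "expected": {"tool": "get_points_balance", "member_id": 42},
    },
    {
        "prompt": "List tiers for program ACME",
        "expected": {"tool": "list_tiers", "program": "ACME"},
    },
]


def stub_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call; returns raw JSON text."""
    if "points" in prompt:
        return json.dumps({"tool": "get_points_balance", "member_id": 42})
    return json.dumps({"tool": "list_tiers", "program": "ACME"})


def check_case(agent: Callable[[str], str], case: dict) -> list[str]:
    """Return failure messages for one golden case (empty list means pass)."""
    try:
        output = json.loads(agent(case["prompt"]))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(output, dict):
        return ["output is not a JSON object"]
    failures = []
    for key, expected in case["expected"].items():
        if key not in output:
            failures.append(f"missing key: {key}")
        elif output[key] != expected:
            failures.append(f"{key}: expected {expected!r}, got {output[key]!r}")
    return failures


def run_suite(agent: Callable[[str], str], cases: list[dict]) -> dict[str, list[str]]:
    """Map each failing prompt to its failure messages."""
    results = {}
    for case in cases:
        failures = check_case(agent, case)
        if failures:
            results[case["prompt"]] = failures
    return results
```

In practice a suite like this would typically be parametrized under PyTest and run in CI, with an empty result from `run_suite` acting as the release gate; that wiring is left out here to keep the sketch short.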
In your first 90 days
By the end of your first 90 days, you will have stood up at least one production-grade evaluation harness (golden dataset, LLM-as-judge graders and regression suite) wired into CI for an internal agent. You will have automated trace-based assertions running against staging traffic, a quality scorecard for at least one shipped agent, and a well-reasoned opinion about what our next testing investment should be.
3+ years of professional QA / SDET experience, with production experience automating tests for backend services or data pipelines
1+ years of hands-on experience testing LLM or AI features in production: prompt regression, tool / function-call validation, structured outputs and RAG correctness
Working knowledge of evaluation frameworks such as RAGAS, DeepEval, LangSmith, Langfuse or comparable LLM-as-judge tooling
Strong Python and PyTest skills; solid SQL skills and comfort with at least one cloud platform (AWS, Azure or GCP)
Fluency with Git, Docker, REST APIs and at least one CI tool (GitHub Actions, Jenkins, GitLab CI or CircleCI)
Solid understanding of data security and responsible AI practices, particularly in PCI-compliant or regulated environments
Proven ability to work independently and within a team, managing priorities across concurrent projects and time zones
Strong written and verbal communication skills; able to work effectively with both technical and non-technical stakeholders
A bachelor’s degree is not required — equivalent practical experience (including bootcamps, self-taught work, career changes or non-CS technical degrees) counts
Bonus Skills:
Hands-on experience with Dataiku DSS (Python / SQL recipes, scenarios, code environments, the dataiku and dataikuapi clients) or Dataiku Evaluations
Experience with Dataiku LLM Mesh, Knowledge Banks, Prompt Studio, or Visual / Code Agents
Experience with Snowflake, Snowpark, or Snowflake Cortex (Search, Analyst, Agents)
Experience with red-teaming, prompt-injection testing or adversarial test generation for LLMs
Familiarity with multi-agent patterns: supervisor / router, subagent / handoff, reflection, human-in-the-loop
Experience with performance and load testing tools such as Locust, JMeter or k6
ISTQB, AI Testing or comparable QA certification
Experience in loyalty, martech, adtech or a comparable data-rich B2B domain