Sr. AI Data Engineer in United States at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr. AI Data Engineer based in the United States.
This role operates at the intersection of data engineering and machine learning systems, building the foundational pipelines that power next-generation generative AI models. You will design and scale complex, AI-augmented data workflows that process billions of images and integrate model-driven enrichment at every stage. The position requires deep expertise in distributed systems, data pipelines, and ML inference orchestration in high-scale environments. You will work on systems that combine traditional SQL-based transformations with real-time model invocations, ensuring quality, reliability, and performance. A key focus of the role is enabling high-quality training datasets for image generation models, directly influencing model performance across multiple dimensions. You will collaborate closely with ML researchers and engineers in a fast-paced, research-driven environment. This is a highly technical and impactful role shaping the future of generative AI infrastructure.
Responsibilities
- Design and maintain large-scale, AI-augmented data pipelines that combine SQL transformations with ML model invocations for data cleaning, labeling, and enrichment.
- Own end-to-end remote inference orchestration, including batching, asynchronous execution, retry logic, failure handling, and performance optimization.
- Build and manage scalable embedding pipelines, including vector generation, storage, indexing, and similarity search infrastructure.
- Curate and govern large-scale training datasets for image generation models using model-driven signals such as classifiers, aesthetic scoring, and content filters.
- Develop automated annotation systems using LLMs and vision models, including evaluation frameworks to measure annotation quality and model performance.
- Contribute to shared engineering frameworks and reusable tooling for AI-driven data workflows and pipeline orchestration.
- Ensure pipeline reliability, compliance, and data quality across billions of records in distributed production systems.
- Collaborate with ML researchers and engineers to improve dataset quality, evaluation metrics, and generative model performance.
Requirements
- Bachelor’s degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
- 5+ years of experience in data engineering, ML engineering, or hybrid roles involving data pipelines and model inference systems.
- Strong expertise in SQL, data pipeline orchestration tools (e.g., Airflow, Dataswarm), and large-scale distributed systems.
- Hands-on experience integrating ML models into production pipelines, including inference APIs, batching, and failure handling.
- Experience with AI-assisted development tools (e.g., Copilot, Cursor, Codex) to accelerate engineering workflows.
- Strong programming and debugging skills with a focus on scalable data systems and production reliability.
- Experience with embeddings, vector databases, or similarity search systems (e.g., FAISS, Milvus) is highly desirable.
- Familiarity with content understanding models such as classifiers, OCR, object detection, and NSFW filtering.
- Exposure to LLM-based workflows for data annotation, cleaning, or evaluation is strongly preferred.
- Knowledge of generative AI concepts such as diffusion models, CLIP scores, and image quality evaluation metrics is a plus.
- Strong communication and collaboration skills in cross-functional technical environments.
Benefits
- Competitive annual compensation ranging from $105,000 to $110,000.
- Opportunity to work on cutting-edge generative AI infrastructure at massive scale.
- Exposure to advanced ML systems, embeddings, and large-scale model orchestration pipelines.
- Collaborative environment working closely with research and engineering teams.
- Onsite role with no remote option; collaboration takes place in person in a high-performance engineering environment.
- Eligibility for standard contractor or temp employee benefits (medical, dental, vision, 401(k), holidays) depending on employment classification and hours.
- Opportunity to contribute directly to the development of next-generation image generation models.