Sr. AI Data Engineer in the United States at Jobgether

Job Function: Information Technology
Jobgether
United States

Job Description

Sr. AI Data Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr. AI Data Engineer based in the United States.

This role operates at the intersection of data engineering and machine learning systems, building the foundational pipelines that power next-generation generative AI models. You will design and scale complex, AI-augmented data workflows that process billions of images and integrate model-driven enrichment at every stage. The position requires deep expertise in distributed systems, data pipelines, and ML inference orchestration in high-scale environments. You will work on systems that combine traditional SQL-based transformations with real-time model invocations, ensuring quality, reliability, and performance. A key focus of the role is enabling high-quality training datasets for image generation models, directly influencing model performance across multiple dimensions. You will collaborate closely with ML researchers and engineers in a fast-paced, research-driven environment. This is a highly technical and impactful role shaping the future of generative AI infrastructure.

Accountabilities:
  • Design and maintain large-scale, AI-augmented data pipelines that combine SQL transformations with ML model invocations for data cleaning, labeling, and enrichment.
  • Own end-to-end remote inference orchestration, including batching, asynchronous execution, retry logic, failure handling, and performance optimization.
  • Build and manage scalable embedding pipelines, including vector generation, storage, indexing, and similarity search infrastructure.
  • Curate and govern large-scale training datasets for image generation models using model-driven signals such as classifiers, aesthetic scoring, and content filters.
  • Develop automated annotation systems using LLMs and vision models, including evaluation frameworks to measure annotation quality and model performance.
  • Contribute to shared engineering frameworks and reusable tooling for AI-driven data workflows and pipeline orchestration.
  • Ensure pipeline reliability, compliance, and data quality across billions of records in distributed production systems.
  • Collaborate with ML researchers and engineers to improve dataset quality, evaluation metrics, and generative model performance.
Requirements:
  • Bachelor’s degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
  • 5+ years of experience in data engineering, ML engineering, or hybrid roles involving data pipelines and model inference systems.
  • Strong expertise in SQL, data pipeline orchestration tools (e.g., Airflow, Dataswarm), and large-scale distributed systems.
  • Hands-on experience integrating ML models into production pipelines, including inference APIs, batching, and failure handling.
  • Experience with AI-assisted development tools (e.g., Copilot, Cursor, Codex) to accelerate engineering workflows.
  • Strong programming and debugging skills with a focus on scalable data systems and production reliability.
  • Experience with embeddings, vector databases, or similarity search systems (e.g., FAISS, Milvus) is highly desirable.
  • Familiarity with content understanding models such as classifiers, OCR, object detection, and NSFW filtering.
  • Exposure to LLM-based workflows for data annotation, cleaning, or evaluation is strongly preferred.
  • Knowledge of generative AI concepts such as diffusion models, CLIP scores, and image quality evaluation metrics is a plus.
  • Strong communication and collaboration skills in cross-functional technical environments.
Benefits:
  • Competitive annual compensation ranging from $105,000 – $110,000.
  • Opportunity to work on cutting-edge generative AI infrastructure at massive scale.
  • Exposure to advanced ML systems, embeddings, and large-scale model orchestration pipelines.
  • Collaborative environment working closely with research and engineering teams.
  • Onsite role (remote work is not available), collaborating in person within a high-performance engineering environment.
  • Eligibility for standard contractor or temporary-employee benefits (medical, dental, vision, 401(k), holidays), depending on employment classification and hours.
  • Opportunity to contribute directly to the development of next-generation image generation models.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

United States
