AI Data Infrastructure Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Data Infrastructure Engineer in the United States.
This role focuses on designing, building, and operating large-scale data systems that power modern AI training and evaluation workflows. You will work on complex, high-throughput data infrastructures that support multimodal datasets and ensure high-quality data delivery for machine learning pipelines. The position combines deep data engineering expertise with a strong understanding of AI system requirements, including scalability, reliability, and performance optimization. You will contribute to building ingestion, transformation, validation, and dataset management systems that directly influence model quality and training efficiency. Working in a highly technical environment, you will collaborate with ML engineers and researchers to align data architecture with evolving AI needs. This is a hands-on, impactful role ideal for engineers passionate about large-scale systems and cutting-edge AI infrastructure.
Responsibilities
- Design, build, and maintain large-scale data pipelines supporting AI training, evaluation, and continuous model improvement workflows.
- Develop ingestion and processing systems for multimodal datasets including text, image, audio, video, and structured data.
- Implement data cleaning, deduplication, validation, and quality assurance processes at petabyte scale.
- Build dataset versioning, lineage tracking, and reproducibility systems to ensure reliable AI training environments.
- Optimize high-throughput data delivery systems to maximize compute and GPU utilization.
- Collaborate with ML researchers and engineers to support dataset construction, evaluation pipelines, and AI model development needs.
- Design scalable storage architectures and implement observability tools for data quality, performance, and pipeline health.
- Ensure data governance, privacy compliance, and secure handling of sensitive datasets across systems.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 6+ years of experience in data engineering, preferably supporting machine learning or AI systems.
- Strong proficiency in Python and at least one systems or JVM-based language (e.g., Java, Scala, Go).
- Hands-on experience with distributed data processing frameworks such as Spark, Beam, or Ray.
- Experience operating large-scale (ideally petabyte-level) data infrastructure.
- Strong understanding of distributed systems, data modeling, storage formats, and pipeline architecture.
- Experience with dataset versioning, lineage tracking, and ML reproducibility workflows.
- Strong software engineering practices including testing, CI/CD, and system design.
- Excellent communication skills and ability to work cross-functionally with technical teams.
- Experience with multimodal datasets, privacy-aware systems, or AI training pipelines is a plus.
Benefits
- Competitive salary aligned with experience and expertise (W2 employment).
- Full-time, long-term remote position within the United States.
- Comprehensive benefits package (health, dental, vision, and wellness support).
- 401(k) retirement savings plan and financial security programs.
- Paid time off, holidays, and work-life balance support.
- Opportunity to work on cutting-edge AI infrastructure and large-scale data systems.
- Professional growth in advanced AI, distributed systems, and data engineering domains.