Principal Software Engineer, ML Infrastructure at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Software Engineer, ML Infrastructure in the United States.
This role offers the opportunity to lead the design and development of high-performance, large-scale infrastructure that supports advanced machine learning and autonomous systems. You will set the technical vision for critical platforms that process massive datasets, orchestrate compute resources, and enable hundreds of engineers to build next-generation models. Working in a fast-paced, collaborative environment, you will influence architectural decisions, mentor peers, and contribute hands-on to complex, distributed systems. Your work will directly impact the reliability, scalability, and performance of mission-critical software, providing a foundation for cutting-edge AI applications. This position blends technical leadership with deep engineering expertise, allowing you to solve challenging problems at scale.
Responsibilities
- Architect, design, and lead the development of scalable, distributed ML infrastructure and services on Kubernetes
- Set the technical roadmap and make high-impact architectural decisions that accelerate AI and autonomous system development
- Mentor and guide engineers, fostering technical growth through design reviews, pair programming, and direct feedback
- Build and deploy complex infrastructure components, from high-throughput data processing pipelines to core compute orchestration
- Ensure reliability, scalability, and maintainability of ML platforms in production environments
- Collaborate with cross-functional teams to integrate infrastructure with ML workflows, enhancing overall system efficiency
Requirements
- 7+ years of professional software engineering experience
- BS, MS, or PhD in Computer Science or a related technical field
- Extensive hands-on experience with Kubernetes and designing/operating large-scale distributed systems
- Strong proficiency in Python, Go, or similar programming languages
- Experience with major cloud platforms (AWS, GCP, Azure) and deploying production-grade services
- Proven track record of leading complex, cross-functional projects from concept to production
- Excellent problem-solving, analytical, and communication skills for collaborative engineering environments
Benefits
- Competitive base salary range: $200,000 - $275,000 USD
- Potential for additional compensation such as bonuses or company equity
- Comprehensive health, dental, and vision coverage, including life and pet insurance
- 401(k) plan with company match and Health Savings Account (HSA) options
- Paid vacation, sick leave, and company holidays
- Professional development support including training and certifications
- Collaborative, inclusive, and mission-driven work environment