Manager, Next-Gen AI Cluster Validation at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Manager, Next-Gen AI Cluster Validation in the United States.
This role offers the opportunity to lead the development and validation of next-generation AI supercomputing systems at scale. You will manage a high-performing technical team responsible for integrating compute, networking, storage, and software systems into large-scale AI and HPC clusters. The position involves collaborating closely with internal teams, external partners, and customers to ensure successful deployment and performance of cutting-edge systems. You will design and implement tools, processes, and documentation to support cluster development, automation, and performance engineering. This role blends strategic leadership with hands-on execution in a fast-paced, remote-friendly environment. It is ideal for a technical leader passionate about AI, HPC, and supercomputing innovation.
Responsibilities:
- Lead a distributed engineering team designing and validating next-generation AI and HPC clusters
- Integrate new compute, networking, storage, and software systems for high-performance applications
- Develop platforms for system automation, software development, and performance optimization
- Build tools and documentation to support large-scale supercomputing system deployment and operations
- Collaborate with internal teams on cluster architecture, integration, and at-scale bring-up
- Partner with external collaborators and customers to support validation of clusters based on reference architectures
- Ensure the team delivers high-quality, scalable, and reliable AI computing solutions
Requirements:
- BS in Applied Science or Engineering; advanced degrees preferred
- 8+ years of experience in high-performance computing, AI, or machine learning environments
- 3+ years of experience in technical leadership roles managing engineering teams
- Proficiency in software development and system automation with languages and tools such as Go, Python, or Ansible
- Proven ability to lead distributed, high-performing teams and foster collaboration
- Strong problem-solving skills and creative thinking in complex technical environments
- Comfortable working in a remote-friendly environment across multiple locations
- Excellent teamwork, communication, and collaboration skills
- Familiarity with AI/ML workloads, cluster architectures, or HPC systems is strongly preferred
Benefits:
- Competitive base salary range: $224,000 – $356,500, plus equity opportunities
- Comprehensive healthcare, dental, and vision coverage
- Flexible paid time off and parental leave programs
- Retirement savings and matching plans
- Professional development and learning opportunities
- Remote-friendly work environment with global collaboration
- Exposure to cutting-edge AI and HPC technologies and high-impact projects