Senior Software Engineer, AI Compute in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Software Engineer, AI Compute in the United States.
This role sits at the core of a high-impact AI infrastructure organization responsible for building and operating large-scale GPU and Kubernetes-based platforms that power advanced machine learning workloads. You will play a key role in shaping the systems that enable efficient, secure, and scalable AI compute across the organization. Working at the intersection of infrastructure, platform engineering, and machine learning, you will help design and evolve the foundation that supports next-generation AI experiences. The position involves deep technical ownership of distributed systems and close collaboration with ML and infrastructure teams. You will influence architectural decisions, improve developer experience, and drive reliability and cost efficiency across GPU fleets. This is a highly visible role where your work directly impacts global-scale AI capabilities and platform performance.
Responsibilities
- Provide technical leadership across high-impact infrastructure and AI compute initiatives
- Design, build, test, deploy, and maintain a Kubernetes-based GPU platform at scale
- Improve reliability, scalability, security, and cost efficiency of GPU and cloud infrastructure systems
- Drive execution of a multi-year strategy for AI compute platform evolution and modernization
- Collaborate across ML, infrastructure, and platform teams to align goals and technical direction
- Lead and influence distributed engineering teams through coaching and technical guidance
- Manage project priorities, timelines, and deliverables across complex, multi-team initiatives
- Enhance developer experience for ML engineers through platform improvements and tooling
Requirements
- BS, MS, or PhD in Computer Science or related field, or equivalent practical experience
- 5+ years of experience in infrastructure, platform engineering, or distributed systems roles
- 2+ years of hands-on experience with public cloud platforms (AWS, GCP, or Azure)
- Strong Kubernetes experience in production-scale environments (required)
- Proven ability to lead technical direction and deliver large-scale, cross-team projects
- Strong understanding of system reliability, scalability, and performance optimization
- Experience with ML infrastructure (LLMs, tuning, optimization) is preferred
- Excellent communication and leadership skills, with the ability to influence technical stakeholders
- Strong focus on developer experience, operational excellence, and system efficiency
Benefits
- Competitive base salary ranging from $191,000 to $225,000 USD
- Eligibility for bonus, equity, and employee travel credits
- Remote-eligible role within the United States (with location eligibility requirements)
- Comprehensive health, wellness, and employee support benefits
- Strong focus on flexibility and work-life balance
- Opportunity to work on large-scale AI and GPU infrastructure systems
- High-impact role with significant technical ownership and visibility
- Inclusive and diverse workplace culture supporting belonging and innovation