Staff Machine Learning Engineer, AI Serving in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Machine Learning Engineer, AI Serving in the United States.
This role sits at the core of a large-scale machine learning infrastructure organization that powers real-time recommendations, content discovery, and generative AI systems. You will design and evolve high-performance inference systems that serve millions of queries per second under strict latency and reliability requirements. The position combines deep systems engineering with advanced ML deployment, spanning GPU-based model serving, Kubernetes orchestration, and distributed cloud infrastructure. You will play a key role in shaping how large models and LLMs are served efficiently in production. Working in a highly collaborative and technically advanced team, you will influence platform architecture that directly impacts user experience, ranking systems, and AI-driven features. This is a high-impact engineering role where scalability, performance, and reliability are central to success.
Responsibilities
- Lead the design, development, and maintenance of a large-scale ML inference platform supporting low-latency, high-throughput model serving for search, ranking, and generative AI workloads.
- Architect and implement GPU-based serving systems capable of handling millions of queries per second with strong reliability and performance guarantees.
- Build and optimize end-to-end inference pipelines, including routing, caching, batching, and feature processing systems.
- Develop and maintain model export frameworks to convert trained models into optimized formats for efficient GPU inference.
- Design and improve observability systems for real-time monitoring of model performance, system health, and feature behavior.
- Lead efforts in benchmarking, performance tuning, and scalability improvements across multi-cluster cloud environments.
- Collaborate with cross-functional ML, infrastructure, and product teams to support production deployment of large-scale ML and LLM systems.
Requirements
- 7+ years of experience in Machine Learning Engineering, AI Platform Engineering, or large-scale distributed systems development.
- Strong experience operating and scaling Kubernetes-based infrastructure in production environments.
- Deep knowledge of ML serving systems, inference pipelines, and production-grade AI deployment.
- Strong programming skills in Python and/or Go, with experience in building scalable backend or ML systems.
- Hands-on experience with modern ML/AI frameworks and tooling such as PyTorch, Triton, vLLM, or similar technologies.
- Experience with cloud platforms (AWS, GCP) and infrastructure tooling such as Terraform or equivalent.
- Strong understanding of observability, monitoring, and performance tuning for real-time systems.
- Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
- Strong ownership mindset with a focus on scalability, reliability, and developer experience.
Benefits
- Competitive compensation package with base salary, equity (RSUs), and potential performance-based incentives.
- Comprehensive healthcare coverage including medical, dental, and vision insurance.
- Retirement plan with employer matching contributions.
- Flexible remote-first work environment.
- Generous paid time off, including vacation, holidays, and volunteer days.
- Paid parental leave and family support programs.
- Mental health support, coaching, and wellness resources.
- Learning and development support for professional growth.
- Additional benefits covering workspace support, caregiving, and family planning.