Staff Machine Learning Engineer, AI Serving in United States at Jobgether

Job Function: Information Technology
Jobgether
United States

Job Description

Staff Machine Learning Engineer, AI Serving

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Machine Learning Engineer, AI Serving in the United States.

This role sits at the core of a large-scale machine learning infrastructure organization focused on powering real-time recommendations, content discovery, and generative AI systems at massive scale. You will be responsible for designing and evolving high-performance inference systems that support millions of queries per second with strict latency and reliability requirements. The position combines deep systems engineering with advanced ML deployment, spanning GPU-based model serving, Kubernetes orchestration, and distributed cloud infrastructure. You will play a key role in shaping how large models and LLMs are served efficiently in production environments. Working in a highly collaborative and technically advanced team, you will influence platform architecture that directly impacts user experience, ranking systems, and AI-driven features. This is a high-impact engineering role where scalability, performance, and reliability are central to success.

Accountabilities:
  • Lead the design, development, and maintenance of a large-scale ML inference platform supporting low-latency, high-throughput model serving for search, ranking, and generative AI workloads.
  • Architect and implement GPU-based serving systems capable of handling millions of queries per second with strong reliability and performance guarantees.
  • Build and optimize end-to-end inference pipelines, including routing, caching, batching, and feature processing systems.
  • Develop and maintain model export frameworks to convert trained models into optimized formats for efficient GPU inference.
  • Design and improve observability systems for real-time monitoring of model performance, system health, and feature behavior.
  • Lead efforts in benchmarking, performance tuning, and scalability improvements across multi-cluster cloud environments.
  • Collaborate with cross-functional ML, infrastructure, and product teams to support production deployment of large-scale ML and LLM systems.
Requirements:
  • 7+ years of experience in Machine Learning Engineering, AI Platform Engineering, or large-scale distributed systems development.
  • Strong experience operating and scaling Kubernetes-based infrastructure in production environments.
  • Deep knowledge of ML serving systems, inference pipelines, and production-grade AI deployment.
  • Strong programming skills in Python and/or Go, with experience in building scalable backend or ML systems.
  • Hands-on experience with modern ML/AI frameworks and tooling such as PyTorch, Triton, vLLM, or similar technologies.
  • Experience with cloud platforms (AWS, GCP) and infrastructure tooling such as Terraform or equivalent.
  • Strong understanding of observability, monitoring, and performance tuning for real-time systems.
  • Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
  • Strong ownership mindset with a focus on scalability, reliability, and developer experience.
Benefits:
  • Competitive compensation package with base salary, equity (RSUs), and potential performance-based incentives.
  • Comprehensive healthcare coverage including medical, dental, and vision insurance.
  • Retirement plan with employer matching contributions.
  • Flexible remote-first work environment.
  • Generous paid time off, including vacation, holidays, and volunteer days.
  • Paid parental leave and family support programs.
  • Mental health support, coaching, and wellness resources.
  • Learning and development support for professional growth.
  • Additional benefits covering workspace support, caregiving, and family planning.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.