Staff Machine Learning Engineer, AI Serving in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Machine Learning Engineer, AI Serving in the United States.
This role sits at the core of a large-scale machine learning infrastructure organization that powers real-time recommendations, content discovery, and generative AI systems. You will design and evolve high-performance inference systems that serve millions of queries per second under strict latency and reliability requirements. The position combines deep systems engineering with advanced ML deployment, spanning GPU-based model serving, Kubernetes orchestration, and distributed cloud infrastructure. You will play a key role in shaping how large models and LLMs are served efficiently in production. Working in a highly collaborative and technically advanced team, you will influence platform architecture that directly impacts user experience, ranking systems, and AI-driven features. This is a high-impact engineering role where scalability, performance, and reliability are central to success.
Responsibilities
- Lead the design, development, and maintenance of a large-scale ML inference platform supporting low-latency, high-throughput model serving for search, ranking, and generative AI workloads.
- Architect and implement GPU-based serving systems capable of handling millions of queries per second with strong reliability and performance guarantees.
- Build and optimize end-to-end inference pipelines, including routing, caching, batching, and feature processing systems.
- Develop and maintain model export frameworks to convert trained models into optimized formats for efficient GPU inference.
- Design and improve observability systems for real-time monitoring of model performance, system health, and feature behavior.
- Lead efforts in benchmarking, performance tuning, and scalability improvements across multi-cluster cloud environments.
- Collaborate with cross-functional ML, infrastructure, and product teams to support production deployment of large-scale ML and LLM systems.
Requirements
- 7+ years of experience in Machine Learning Engineering, AI Platform Engineering, or large-scale distributed systems development.
- Strong experience operating and scaling Kubernetes-based infrastructure in production environments.
- Deep knowledge of ML serving systems, inference pipelines, and production-grade AI deployment.
- Strong programming skills in Python and/or Go, with experience in building scalable backend or ML systems.
- Hands-on experience with modern ML/AI frameworks and tooling such as PyTorch, Triton, vLLM, or similar technologies.
- Experience with cloud platforms (AWS, GCP) and infrastructure tooling such as Terraform or equivalent.
- Strong understanding of observability, monitoring, and performance tuning for real-time systems.
- Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
- Strong ownership mindset with a focus on scalability, reliability, and developer experience.
Benefits
- Competitive compensation package with base salary, equity (RSUs), and potential performance-based incentives.
- Comprehensive healthcare coverage including medical, dental, and vision insurance.
- Retirement plan with employer matching contributions.
- Flexible remote-first work environment.
- Generous paid time off, including vacation, holidays, and volunteer days.
- Paid parental leave and family support programs.
- Mental health support, coaching, and wellness resources.
- Learning and development support for professional growth.
- Additional benefits covering workspace support, caregiving, and family planning.