Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What type of employment is offered for this Senior Machine Learning Engineer (Inference Platform) role?

This role is listed as either a full-time or a part-time position.

What is the expected salary for this Senior Machine Learning Engineer (Inference Platform) job?

Compensation will be discussed during the hiring process.

Senior Machine Learning Engineer (Inference Platform)

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Machine Learning Engineer (Inference Platform) in the United States.

In this role, you will take ownership of the production inference systems that power a high-scale AI-driven conversational shopping experience. You will be responsible for building, operating, and optimizing the end-to-end ML serving infrastructure, ensuring models run reliably under real-world production load. The position sits at the intersection of machine learning, distributed systems, and platform engineering, with a strong focus on performance, scalability, and cost efficiency. You will collaborate closely with ML engineers, data teams, product, and DevOps to bring models from experimentation into production seamlessly. This is a high-impact role where your architectural decisions directly shape user experience and system performance. You will work in a fast-paced, startup-like environment where ownership and technical depth are essential. The role offers significant autonomy in defining the future of the inference platform.

Accountabilities:

You will be responsible for building and scaling the core infrastructure that serves machine learning models in production, ensuring reliability, efficiency, and observability across all inference workflows.

Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as time to first token (TTFT) and inter-token latency (ITL)
Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
Develop observability, monitoring, alerting, and debugging tools for production inference systems
Optimize inference performance through batching strategies, GPU utilization, quantization, and hardware-aware system design
Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems

Requirements:

The ideal candidate brings deep experience in production ML systems, strong software engineering fundamentals, and hands-on expertise with large-scale inference infrastructure.

5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
Hands-on experience operating LLM serving frameworks such as vLLM, Text Generation Inference (TGI), TensorRT-LLM, or SGLang in real production environments
Strong Python skills and solid understanding of distributed systems and backend engineering principles
Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
Experience working in fast-paced, high-growth environments with evolving technical requirements
Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams

Benefits:

Competitive compensation aligned with experience and impact
Remote-first flexibility within the United States
Opportunity to shape core AI infrastructure powering a large-scale consumer-facing product
High ownership role with influence over architecture and technical direction
Collaborative, cross-functional engineering environment
Exposure to cutting-edge LLM and AI inference technologies
Fast-paced startup culture with strong autonomy and technical depth

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
