Model Serving Engineer in United States at Jobgether

Job Function: Design
Jobgether
United States

Job Description

Model Serving Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Model Serving Engineer in the United States.

This role focuses on building and operating high-performance, production-grade inference systems that power large-scale machine learning applications. You will design and optimize the infrastructure that serves models such as LLMs, vision systems, and recommendation engines, ensuring low latency, high throughput, and efficient GPU utilization. The position involves deep systems engineering work across request routing, autoscaling, caching, and observability to support reliable and scalable AI services. You will collaborate closely with ML engineers and product teams to ensure seamless deployment of new models and capabilities. The environment is highly technical and performance-driven, requiring strong expertise in distributed systems and real-time service optimization. This is a critical role for engineers passionate about making advanced AI models reliable, efficient, and production-ready at scale.

Accountabilities:
  • Design, build, and operate scalable model serving infrastructure for LLMs, vision models, and recommendation systems.
  • Optimize inference performance using techniques such as continuous batching, caching, request multiplexing, and GPU memory optimization.
  • Implement routing, rate limiting, and multi-tenant service policies to ensure reliability and fair resource usage across endpoints.
  • Develop autoscaling, capacity planning, and load balancing systems to maintain performance under varying workloads.
  • Build end-to-end observability systems, including metrics, logging, tracing, and performance monitoring for AI services.
  • Collaborate with ML and product teams to support model deployment, rollout strategies, and production integration.
  • Implement security, abuse detection, and API governance controls across model serving infrastructure.
  • Support incident response, debugging, and continuous reliability improvements for production AI systems.
Requirements:
  • Bachelor’s or Master’s degree in Computer Science or a related technical field.
  • 6+ years of experience in distributed systems, infrastructure engineering, or ML platform engineering.
  • Strong proficiency in Python and a systems programming language such as Go, Rust, or C++.
  • Hands-on experience with large-scale model inference frameworks (e.g., vLLM, TensorRT-LLM, or similar).
  • Strong understanding of GPU architecture, memory management, and performance optimization techniques.
  • Experience with Kubernetes, cloud infrastructure, and autoscaling systems.
  • Expertise in observability tooling for metrics, logging, and distributed tracing.
  • Strong background in performance engineering, low-latency systems, and capacity planning.
  • Excellent communication, incident response, and cross-functional collaboration skills.
  • Experience with AI serving optimization techniques such as quantization, caching, or distributed inference is a plus.
Benefits:
  • Competitive W2 compensation aligned with experience and technical expertise.
  • Fully remote, long-term position within the United States.
  • Comprehensive benefits package including medical, dental, and vision coverage.
  • 401(k) retirement savings plan and financial wellness support.
  • Paid time off, holidays, and structured work-life balance.
  • Opportunity to work on cutting-edge AI inference systems and large-scale production platforms.
  • Strong technical growth in distributed systems, GPU computing, and AI infrastructure engineering.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
