Director of Engineering - AI Inference at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Director of Engineering - AI Inference in the United States.
This role leads a high-impact AI inference engineering team, driving the design, deployment, and optimization of large-scale AI model serving systems. You will bridge research and production, architecting high-throughput, low-latency inference pipelines while mentoring a tight-knit team of developers. The position combines hands-on technical work with strategic leadership, influencing the evolution of cutting-edge AI infrastructure. You will evaluate emerging inference frameworks, optimize GPU utilization, and deliver scalable solutions for demanding AI workloads. This is an ideal opportunity for professionals passionate about pushing the boundaries of AI infrastructure and shaping a high-growth engineering organization.
Responsibilities:
- Architect, deploy, and optimize AI inference pipelines for large-scale language models, ensuring high throughput, low latency, and efficient GPU utilization.
- Lead and mentor a small team of engineers, providing guidance through code reviews, sprint planning, and career development.
- Implement and evaluate advanced KV cache management strategies and speculative decoding approaches to minimize redundant computation.
- Integrate and optimize serving engines and frameworks to maximize hardware efficiency and system performance.
- Conduct research and benchmarking of new tools and frameworks, maintaining expertise in the evolving AI inference landscape.
- Collaborate with cross-functional teams to translate research innovations into production-grade systems.
- Drive continuous improvement in infrastructure, scalability, and operational best practices for AI workloads.
Requirements:
- 8+ years of experience in backend or infrastructure engineering, with a focus on AI inference, GPU optimization, or distributed systems.
- Proven hands-on expertise in AI model serving, including LLM deployment and inference pipeline optimization.
- Deep experience with inference frameworks and tooling such as vLLM, LMCache, llm-d, and NIXL.
- Strong programming skills in Python, C++, or Rust, with practical knowledge of CUDA and GPU memory management.
- Experience with Kubernetes for scaling AI workloads and optimizing startup times.
- Familiarity with KV cache reuse, speculative decoding, and batching strategies for inference efficiency.
- Excellent leadership, communication, and mentoring skills, with experience guiding small engineering teams.
- Ability to balance hands-on engineering with strategic architectural decisions in a fast-paced, high-growth environment.
Benefits:
- Competitive base salary, with total compensation determined by experience, location, and qualifications.
- Equity opportunities in a high-growth, pre-IPO company.
- Comprehensive health benefits including medical, dental, vision, and life insurance.
- 401(k) retirement plan with employer match.
- Flexible time off, sick leave, and parental leave programs.
- Opportunities to work on cutting-edge AI infrastructure projects and shape a growing engineering team.