Director of Engineering - AI Inference at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Director of Engineering - AI Inference in the United States.
This role leads a high-impact AI inference engineering team, driving the design, deployment, and optimization of large-scale AI model serving systems. You will bridge research and production, architecting high-throughput, low-latency inference pipelines while mentoring a tight-knit team of developers. The position combines hands-on technical work with strategic leadership, influencing the evolution of cutting-edge AI infrastructure. You will evaluate emerging inference frameworks, optimize GPU utilization, and deliver scalable solutions for demanding AI workloads. This is an ideal opportunity for professionals passionate about pushing the boundaries of AI infrastructure and shaping a high-growth engineering organization.
Responsibilities:
- Architect, deploy, and optimize AI inference pipelines for large-scale language models, ensuring high throughput, low latency, and efficient GPU utilization.
- Lead and mentor a small team of engineers, providing guidance through code reviews, sprint planning, and career development.
- Implement and evaluate advanced KV cache management strategies and speculative decoding approaches to minimize redundant computation.
- Integrate and optimize serving engines and frameworks to maximize hardware efficiency and system performance.
- Conduct research and benchmarking of new tools and frameworks, maintaining expertise in the evolving AI inference landscape.
- Collaborate with cross-functional teams to translate research innovations into production-grade systems.
- Drive continuous improvement in infrastructure, scalability, and operational best practices for AI workloads.
Requirements:
- 8+ years of experience in backend or infrastructure engineering, with a focus on AI inference, GPU optimization, or distributed systems.
- Proven hands-on expertise in AI model serving, including LLM deployment and inference pipeline optimization.
- Deep experience with inference frameworks and tooling such as vLLM, LMCache, llm-d, and NIXL.
- Strong programming skills in Python, C++, or Rust, with practical knowledge of CUDA and GPU memory management.
- Experience with Kubernetes for scaling AI workloads and optimizing startup times.
- Familiarity with KV cache reuse, speculative decoding, and batching strategies for inference efficiency.
- Excellent leadership, communication, and mentoring skills, with experience guiding small engineering teams.
- Ability to balance hands-on engineering with strategic architectural decisions in a fast-paced, high-growth environment.
Benefits:
- Competitive base salary, with total compensation determined by experience, location, and qualifications.
- Equity opportunities in a high-growth, pre-IPO company.
- Comprehensive health benefits including medical, dental, vision, and life insurance.
- 401(k) retirement plan with employer match.
- Flexible time off, sick leave, and parental leave programs.
- Opportunities to work on cutting-edge AI infrastructure projects and shape a growing engineering team.