Senior Software Engineer - AI Inference at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Software Engineer - AI Inference in the United States.
This role offers an opportunity to work at the forefront of large language model inference, contributing directly to high-performance open-source serving frameworks used at scale. You will help shape how modern AI applications run efficiently on advanced GPU infrastructure by improving the performance, reliability, and scalability of inference systems. Working in a deeply technical and collaborative environment, you will focus on optimizing runtime behavior, reducing latency, and increasing throughput for production-grade AI workloads. The position combines systems engineering, low-level optimization, and open-source contribution, with direct impact on widely used AI frameworks. You will engage with a global engineering community while solving complex performance challenges across distributed GPU systems. This is an ideal role for a hands-on engineer passionate about AI infrastructure and high-performance computing.
Responsibilities:
- Contribute features, optimizations, and fixes to open-source inference frameworks such as vLLM and SGLang
- Design and improve inference runtime components including scheduling, batching, request handling, and KV-cache optimization
- Profile and optimize performance-critical paths across Python, C++, and CUDA layers
- Enhance multi-GPU inference performance through improved parallelism, communication strategies, and resource utilization
- Develop benchmarking systems and regression tests to ensure performance stability and correctness across deployments
- Investigate and resolve bottlenecks using profiling tools, GPU analysis, and data-driven performance evaluation
- Collaborate with cross-functional teams to translate production needs into scalable, upstream-ready solutions
- Participate in code reviews, architectural discussions, and open-source community contributions
Requirements:
- 5+ years of experience in production software engineering with strong systems-level expertise
- Hands-on experience with LLM inference or serving frameworks such as vLLM, SGLang, or similar systems
- Strong programming skills in Python, plus C++ and/or CUDA, with the ability to debug and optimize performance-critical code
- Experience with performance profiling tools, benchmarking, and latency/throughput optimization techniques
- Solid understanding of distributed systems, concurrency, and multi-GPU or multi-node architectures
- Strong communication skills and experience working in or contributing to open-source projects
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or equivalent experience
- Strong advantage: contributions to open-source AI, ML, or systems projects such as PyTorch, Triton, NCCL, or similar ecosystems
- Strong advantage: experience with GPU memory optimization, kernel fusion, or advanced inference techniques such as quantization or speculative decoding
- Strong analytical mindset with a focus on measurement-driven engineering
Benefits:
- Competitive base salary ranging from $152,000 to $287,500 depending on level and experience
- Equity participation in addition to base compensation
- Comprehensive health, dental, and vision insurance coverage
- Flexible work arrangements supporting work-life balance
- Paid time off, holidays, and parental leave benefits
- Professional development opportunities in advanced AI and systems engineering
- Exposure to cutting-edge AI infrastructure and large-scale GPU computing systems
- Inclusive and innovation-driven engineering culture