Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of an AI Performance Optimization Engineer at Jobgether?

The AI Performance Optimization Engineer position at Jobgether is a full-time or part-time role in the Engineering field.

Where is this AI Performance Optimization Engineer job located?

United States (fully remote)

What type of employment is offered for this AI Performance Optimization Engineer role?

Full-time or part-time position

What is the expected salary for this AI Performance Optimization Engineer job?

Compensation will be discussed during the hiring process.

AI Performance Optimization Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Performance Optimization Engineer in the United States.

This role focuses on pushing the limits of performance for large-scale AI systems, with an emphasis on maximizing throughput, minimizing latency, and reducing operational costs across training and inference workloads. You will work across the full stack of AI infrastructure, from GPU-level kernel optimization to distributed system tuning and model serving architecture. The environment is highly technical, data-driven, and collaborative, involving close partnership with ML engineers, platform teams, and product stakeholders. You will help translate complex, ambiguous performance challenges into measurable engineering improvements. The role is ideal for a hands-on expert who thrives in deep systems work and production-grade optimization. You will also contribute to shaping standards, benchmarks, and best practices across AI infrastructure teams.

Accountabilities:

You will be responsible for improving the performance, efficiency, and scalability of AI training and inference systems across distributed environments.

Profile and optimize end-to-end AI pipelines to improve throughput, latency, and cost efficiency.
Identify bottlenecks across compute, memory, networking, and data pipelines, and implement targeted optimizations.
Develop and tune advanced model optimization techniques such as quantization, sparsity, pruning, and compression.
Optimize distributed training and inference using parallelism strategies (tensor, pipeline, FSDP, ZeRO).
Improve LLM serving performance through techniques such as KV caching, batching, and speculative decoding.
Drive kernel and compiler-level optimizations using tools like Triton, XLA, TorchInductor, or TVM.
Build benchmarking frameworks, performance monitoring systems, and regression testing suites.
Collaborate with cross-functional engineering teams to integrate performance best practices into production systems.
Evaluate hardware and software technologies and guide adoption decisions based on performance trade-offs.
Document optimization strategies and contribute to internal knowledge sharing and technical leadership.

Requirements

The ideal candidate has deep expertise in AI systems, performance engineering, and large-scale distributed computing environments.

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
6+ years of experience in ML systems, performance engineering, or high-performance computing.
Strong programming skills in Python and C++, with production-level engineering experience.
Hands-on experience optimizing deep learning workloads on modern GPU architectures.
Deep understanding of distributed training, inference systems, and model parallelism techniques.
Experience with profiling tools across CPU, GPU, and distributed systems.
Strong knowledge of memory hierarchies, communication overheads, and system bottlenecks.
Familiarity with model compression and optimization techniques and their trade-offs.
Strong analytical skills with a disciplined, measurement-driven engineering approach.
Excellent communication skills and ability to collaborate across technical and non-technical teams.

Benefits

Competitive full-time compensation aligned with experience and expertise
Fully remote work model across the United States
Long-term, stable engineering engagement on high-impact AI systems
Opportunity to work on cutting-edge large-scale AI infrastructure challenges
Collaborative, engineering-driven environment with strong technical ownership
Exposure to advanced GPU systems, LLM optimization, and distributed AI frameworks
Career growth opportunities in high-performance AI systems engineering

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
