AI Performance Optimization Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Performance Optimization Engineer in the United States.
This role focuses on pushing the limits of performance for large-scale AI systems, with an emphasis on maximizing throughput, minimizing latency, and reducing operational costs across training and inference workloads. You will work across the full stack of AI infrastructure, from GPU-level kernel optimization to distributed system tuning and model serving architecture. The environment is highly technical, data-driven, and collaborative, involving close partnership with ML engineers, platform teams, and product stakeholders. You will help translate complex, ambiguous performance challenges into measurable engineering improvements. The role is ideal for a hands-on expert who thrives in deep systems work and production-grade optimization. You will also contribute to shaping standards, benchmarks, and best practices across AI infrastructure teams.
You will be responsible for improving the performance, efficiency, and scalability of AI training and inference systems across distributed environments.
- Profile and optimize end-to-end AI pipelines to improve throughput, latency, and cost efficiency.
- Identify bottlenecks across compute, memory, networking, and data pipelines, and implement targeted optimizations.
- Develop and tune advanced model optimization techniques such as quantization, sparsity, pruning, and compression.
- Optimize distributed training and inference using parallelism and sharding strategies (tensor and pipeline parallelism, FSDP, ZeRO).
- Improve LLM serving performance through techniques such as KV caching, batching, and speculative decoding.
- Drive kernel and compiler-level optimizations using tools like Triton, XLA, TorchInductor, or TVM.
- Build benchmarking frameworks, performance monitoring systems, and regression testing suites.
- Collaborate with cross-functional engineering teams to integrate performance best practices into production systems.
- Evaluate hardware and software technologies and guide adoption decisions based on performance trade-offs.
- Document optimization strategies and contribute to internal knowledge sharing and technical leadership.
The ideal candidate has deep expertise in AI systems, performance engineering, and large-scale distributed computing environments.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- 6+ years of experience in ML systems, performance engineering, or high-performance computing.
- Strong programming skills in Python and C++, with production-level engineering experience.
- Hands-on experience optimizing deep learning workloads on modern GPU architectures.
- Deep understanding of distributed training, inference systems, and model parallelism techniques.
- Experience with profiling tools across CPU, GPU, and distributed systems.
- Strong knowledge of memory hierarchies, communication overheads, and system bottlenecks.
- Familiarity with model compression and optimization techniques and their trade-offs.
- Strong analytical skills with a disciplined, measurement-driven engineering approach.
- Excellent communication skills and ability to collaborate across technical and non-technical teams.
The position offers:
- Competitive full-time compensation aligned with experience and expertise.
- Fully remote work model across the United States.
- Long-term, stable engineering engagement on high-impact AI systems.
- Opportunity to work on cutting-edge, large-scale AI infrastructure challenges.
- Collaborative, engineering-driven environment with strong technical ownership.
- Exposure to advanced GPU systems, LLM optimization, and distributed AI frameworks.
- Career growth opportunities in high-performance AI systems engineering.