ML Infrastructure Engineer in Germany at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for an ML Infrastructure Engineer in Germany.
Join a cutting-edge AI infrastructure team focused on powering the next generation of machine learning and large-scale AI workloads. This role sits at the intersection of GPU performance engineering, deep learning optimization, and cloud-scale infrastructure development: you will benchmark and optimize advanced GPU platforms that support training and inference for complex neural networks and AI systems.

Working alongside highly skilled engineering and hardware teams, you will help drive performance improvements across compute architectures, software stacks, and distributed AI environments. The position is ideal for engineers passionate about ML systems, large-scale model performance, and infrastructure innovation. With exposure to modern AI frameworks, high-performance GPU ecosystems, and international collaboration, it offers a strong platform for technical growth and meaningful impact within the AI industry.

Responsibilities:
- Benchmark and evaluate GPU platform performance for machine learning and AI workloads across various architectures, frameworks, and software environments.
- Collaborate closely with hardware and engineering teams to profile GPU performance at system and kernel levels and identify optimization opportunities.
- Analyze, debug, and optimize training and inference workloads to improve efficiency, scalability, and overall hardware utilization.
- Conduct acceptance testing for new GPU clusters to validate performance, stability, compatibility, and operational readiness for AI workloads.
- Perform experiments across multiple GPU configurations and interconnect strategies to assess system-level scalability and performance trade-offs.
- Develop internal tools, dashboards, and reporting frameworks to visualize performance metrics, bottlenecks, and infrastructure trends.
- Contribute to infrastructure best practices, internal tooling enhancements, and benchmarking methodologies for AI and ML environments.
- Support ongoing platform optimization efforts related to distributed training, inference acceleration, parallelism strategies, and hardware-aware performance tuning.
Requirements:
- Strong theoretical foundation in machine learning, deep learning architectures, and AI system optimization principles.
- Deep understanding of performance optimization techniques for large neural network training and inference, including parallelism strategies, kernel optimization, batching, and hardware acceleration.
- Extensive experience with modern deep learning frameworks and libraries such as PyTorch, JAX, Megatron-LM, TensorRT-LLM, or equivalent technologies.
- Solid expertise with GPU technologies and software stacks including CUDA, NCCL, GPU drivers, and performance-related libraries.
- Experience profiling and debugging GPU workloads using tools such as Nsight, nvprof, perf, or similar performance analysis platforms.
- Familiarity with containerized and distributed environments including Docker and Kubernetes.
- Strong programming and scripting skills, particularly in Python, along with experience in performance-oriented development workflows.
- Excellent problem-solving, analytical thinking, and communication skills with the ability to work independently in highly technical environments.
- Experience with LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM is considered a strong advantage.
- Familiarity with cloud-based ML ecosystems such as AWS, Google Cloud Platform, or Azure ML is beneficial.
- Contributions to open-source ML tooling, benchmarking frameworks, or infrastructure projects are highly valued.
Benefits:
- Competitive compensation package aligned with experience and technical expertise.
- Flexible remote work environment supporting strong work-life balance.
- Access to continuous learning, career development, and growth opportunities within the AI infrastructure space.
- Opportunity to work on impactful AI projects shaping the future of machine learning infrastructure and cloud computing.
- Collaborative and innovation-driven engineering culture with strong technical ownership and autonomy.
- International work environment with exposure to globally distributed teams and advanced AI technologies.
- Fast-paced setting focused on bold thinking, experimentation, and continuous technical evolution.
- Opportunity to contribute to high-performance AI systems used by developers and enterprises worldwide.