Senior HPC Cluster Engineer in Germany at Jobgether

Job Function: Information Technology
Jobgether
Germany, Germany

Job Description

Senior HPC Cluster Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior HPC Cluster Engineer based in Germany.

This role sits at the core of a next-generation AI cloud infrastructure environment, focused on building and optimizing large-scale high-performance computing systems. You will work on complex GPU and InfiniBand cluster architectures that power AI and HPC workloads at scale. The position involves deep system-level engineering, performance tuning, and hands-on troubleshooting across distributed infrastructure. You will contribute directly to improving the reliability, efficiency, and scalability of compute platforms used for advanced AI and data-intensive applications. Working in a highly technical engineering culture, you will collaborate with experts across systems, networking, and virtualization. This is a high-impact role where your work directly influences the performance of large-scale cloud and AI workloads.

Accountabilities:

Own the performance optimization and reliability of large-scale GPU clusters and InfiniBand networking environments supporting HPC workloads:

  • Tune and optimize GPU cluster performance and InfiniBand fabric efficiency to ensure high-throughput, low-latency computing.
  • Diagnose, troubleshoot, and resolve complex system-level issues across GPU, network, and compute layers.
  • Integrate and validate new hardware components into existing HPC infrastructure, including support for GPUs and related accelerators.
  • Work across virtualization and orchestration layers (KVM/QEMU, Kubernetes) to ensure seamless hardware utilization and deployment.
  • Develop and improve automation for monitoring, fault detection, and proactive remediation in distributed compute environments.
  • Configure, manage, and maintain GPU devices, PCIe systems, and InfiniBand networks to ensure stability and scalability.

Requirements:

We are looking for a highly experienced systems engineer with strong expertise in HPC and low-level infrastructure:

  • 5+ years of experience in system-level software engineering with a focus on performance, scalability, or infrastructure optimization.
  • 3+ years of hands-on experience with Linux systems administration, debugging, and performance tuning.
  • Strong understanding of server and hardware architecture including PCIe, NICs, GPUs, and Linux kernel-level behavior.
  • Proficiency in C, C++, Go, or Python for systems or performance-oriented development.
  • Experience working with distributed or HPC environments and solving complex infrastructure challenges.
  • Strong analytical and problem-solving skills with the ability to work on deep technical issues independently.
  • Familiarity with GPU clusters, InfiniBand networking, and large-scale compute systems is highly desirable.
  • Experience with KVM/QEMU or containerized orchestration environments is a plus.
  • Exposure to distributed computing frameworks or libraries such as MPI or NCCL is advantageous.

Benefits:

  • Competitive compensation package.
  • Career development and continuous learning opportunities in advanced AI and HPC systems.
  • Flexible working arrangements and remote-friendly culture across Europe.
  • Opportunity to work on cutting-edge AI infrastructure and large-scale distributed systems.
  • Collaborative engineering environment with high technical ownership.
  • Exposure to international teams and world-class engineering challenges.

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

Germany, Germany
