Senior HPC and AI Networking Performance Research and Analysis Engineer in Switzerland at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior HPC and AI Networking Performance Research and Analysis Engineer in Switzerland.
This role is centered on analyzing and optimizing the performance of large-scale AI and HPC workloads running on cutting-edge distributed computing infrastructures. You will focus on deep technical performance research across GPU and CPU clusters, with a strong emphasis on AI training and inference for large language models. Operating at the intersection of hardware and software, you will investigate communication patterns, RDMA networking, and collective communication frameworks to identify bottlenecks and drive system-level improvements. You will work with advanced technologies spanning GPUs, CPUs, networking hardware, and distributed systems, while leveraging profiling tools, simulators, and performance methodologies. This is a highly analytical and research-driven role, requiring close collaboration with engineering teams across hardware and software domains. Your insights will directly influence performance optimization strategies for next-generation AI supercomputing platforms.
Responsibilities:
- Profile, analyze, and benchmark large-scale AI workloads across distributed GPU and CPU clusters, with a focus on deep learning and LLM training and inference.
- Investigate communication patterns, RDMA networking behavior, and collective operations in libraries such as NCCL to identify performance bottlenecks.
- Design and implement performance analysis tools, methodologies, and simulation-based approaches to evaluate system behavior.
- Collaborate with hardware and software engineering teams to deliver actionable performance insights and optimization recommendations.
- Define performance test strategies and establish benchmarks for new technologies, systems, and AI infrastructure components.
- Analyze end-to-end system performance across GPUs, CPUs, interconnects, memory systems, and networking infrastructure.
- Contribute to root cause analysis of performance issues in large-scale distributed AI training and inference workloads.
Requirements:
- Bachelor’s degree in Computer Science, Software Engineering, or a related technical field.
- 6+ years of experience in high-performance networking environments, including RDMA, MPI, or NCCL.
- Strong background in performance analysis, benchmarking, and system optimization methodologies.
- Hands-on experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as TensorFlow or PyTorch.
- Strong understanding of RDMA networking protocols such as RoCE, and of collective communication technologies.
- Proficiency in Python, C, and Bash, with experience developing performance analysis or diagnostic tools.
- Solid understanding of Linux-based operating systems and distributed computing environments.
- Strong analytical and problem-solving skills with the ability to quickly learn complex systems.
- Excellent communication and teamwork abilities in cross-functional engineering environments.
- Nice to have: deep expertise in LLM benchmarking, congestion control algorithms, CUDA/NCCL internals, and system architecture (CPU, GPU, memory, PCIe, HCA).
Benefits:
- Highly competitive compensation package with performance-based incentives
- Opportunity to work on cutting-edge AI infrastructure and supercomputing systems
- Exposure to large-scale distributed AI workloads used in global-scale environments
- Collaboration with world-class experts in HPC, networking, and AI systems
- Flexible working arrangements across Switzerland and remote locations
- Comprehensive benefits supporting health, wellbeing, and professional growth
- Inclusive and diverse work environment with equal opportunity principles
- Strong emphasis on innovation, learning, and advanced technical development