What type of employment is offered for this Principal Deep Learning Communication Architect role?

Full-time or part-time position

What industry does this Principal Deep Learning Communication Architect position belong to?

This role spans multiple industries.

What is the expected salary for this Principal Deep Learning Communication Architect job?

The posting lists a base salary of $272,000 to $431,250 USD, depending on experience and location; the final offer is discussed during the hiring process.

Principal Deep Learning Communication Architect at Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Deep Learning Communication Architect in the United States.

This is a senior technical leadership role focused on defining the future of large-scale AI communication systems powering next-generation distributed deep learning workloads. You will shape the architecture of high-performance communication libraries that enable training and inference at unprecedented scale across massive GPU clusters. Acting at the intersection of software, hardware, and AI systems, you will influence how models with hundreds of billions to trillions of parameters efficiently communicate across advanced interconnects. The role involves deep collaboration with researchers, systems engineers, and hardware architects to co-design scalable solutions for AI infrastructure. You will also contribute to optimizing collective communication frameworks and enabling efficient execution of emerging AI workloads such as agentic and multimodal models. This position is highly strategic and technical, requiring both architectural vision and hands-on systems expertise. Your work will directly impact the performance and scalability of some of the world’s most advanced AI platforms.

Accountabilities:

Define the long-term architecture and technical roadmap for large-scale communication libraries supporting next-generation distributed AI systems
Lead the design and optimization of communication primitives and collective algorithms for high-performance GPU clusters
Drive application and communication co-design efforts across frameworks such as NCCL, NVSHMEM, UCX, UCC, and MPI-based systems
Collaborate with hardware architects to influence the design of future interconnects and GPU networking technologies
Develop analytical models and simulation tools to evaluate system performance under large-scale AI and HPC workloads
Optimize communication performance across heterogeneous interconnects including NVLink, InfiniBand, and Ethernet-based architectures
Guide the evolution of distributed training and inference systems to support trillion-parameter and agentic AI models
Provide technical leadership across cross-functional teams working on AI infrastructure, runtime systems, and GPU architecture

Requirements:

Ph.D. or M.S. in Computer Science, Electrical Engineering, or a related field with 12+ years of experience in HPC or distributed deep learning systems
Deep expertise in parallel computing strategies including data, tensor, pipeline, context, and expert parallelism, as well as ZeRO optimizations
Strong hands-on experience with communication frameworks such as NCCL, UCX, UCC, NVSHMEM, or MPI
Solid understanding of RDMA, RoCE, and InfiniBand low-level networking protocols and hardware interfaces
Advanced knowledge of high-performance inference systems such as TensorRT-LLM, vLLM, SGLang, or NVIDIA Dynamo
Strong background in GPU architecture, including memory hierarchies (HBM3e/HBM4, L2 cache) and CUDA programming
Experience working with large-scale distributed training frameworks such as Megatron-Core, DeepSpeed, or JAX/XLA is a plus
Proven track record of contributing to or leading open-source projects or publishing research in top-tier systems venues
Strong architectural thinking, communication skills, and ability to influence cross-functional technical direction

Benefits:

Competitive base salary ranging from $272,000 to $431,250 USD depending on experience and location
Eligibility for equity participation in addition to base compensation
Comprehensive health, dental, vision, and wellness benefits
Remote or hybrid flexibility depending on role requirements
Opportunity to work on cutting-edge AI infrastructure at massive global scale
Strong focus on research, innovation, and open technical collaboration
Inclusive, high-performance engineering culture with world-class technical teams
Long-term career growth in advanced AI systems and architecture leadership roles

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
