Architect - Platform Engineer in United States at Jobgether

Job Function: Engineering
Jobgether
United States

Job Description

Architect - Platform Engineer

This position is listed on behalf of a partner company, who manages all applications and next steps. Our partner is looking for an Architect - Platform Engineer based in the United States.

This is a senior-level architecture role focused on designing and scaling next-generation infrastructure for GenAI and large language model (LLM) workloads in enterprise and production environments. You will define the platform foundations that power distributed training, GPU-accelerated computing, and AI model deployment at scale. The role blends deep systems engineering expertise with modern cloud-native architecture, requiring strong fluency across Kubernetes, high-performance computing, and AI infrastructure stacks. You will collaborate with data scientists, ML engineers, and software architects to deliver robust, scalable GenAI platforms. The environment is highly innovative, fast-paced, and centered on cutting-edge AI transformation across industries. This role is ideal for a hands-on architect who thrives at the intersection of infrastructure, performance engineering, and applied AI systems.

Accountabilities:
  • Design, build, and optimize scalable infrastructure for GenAI and LLM workloads across multi-GPU and distributed computing environments.
  • Architect and manage high-performance compute platforms using Slurm clusters and container orchestration systems such as Kubernetes and OpenShift.
  • Lead GPU performance profiling, benchmarking, and optimization for distributed training and inference workloads.
  • Enable and maintain NVIDIA GPU ecosystem components including CUDA, cuDNN, NCCL, Triton, and related tooling.
  • Develop and operationalize GenAI pipelines supporting fine-tuning, RAG architectures, multi-modal systems, and LLMOps workflows.
  • Build reusable infrastructure-as-code templates using tools such as Terraform and Helm to support scalable deployments.
  • Collaborate with cross-functional engineering teams to deploy AI solutions into both research and production environments.
  • Drive automation, CI/CD practices, and platform reliability through modern DevOps and cloud engineering principles.
  • Lead technical architecture discussions with internal and client-facing stakeholders, providing scalable and production-ready solutions.
Requirements:
  • 10+ years of experience in platform engineering, infrastructure architecture, or high-performance computing environments.
  • Strong hands-on expertise with Kubernetes and/or Red Hat OpenShift in production-scale deployments.
  • Deep knowledge of GPU computing ecosystems including CUDA, cuDNN, NCCL, Nsight, and TensorRT/Triton.
  • Proven experience with Slurm-based distributed training systems and multi-GPU optimization.
  • Strong Linux systems expertise with performance tuning and infrastructure scaling experience.
  • Experience building and deploying GenAI workloads such as LLM fine-tuning, RAG pipelines, or multimodal AI systems.
  • Solid understanding of infrastructure-as-code tools including Terraform and Ansible.
  • Experience working with cloud GPU environments (AWS, Azure, GCP, OCI) or on-prem GPU clusters.
  • Strong communication and leadership skills with experience mentoring teams and driving architecture decisions.
  • Ability to work in client-facing environments and translate technical complexity into scalable solutions.
Benefits:
  • Competitive compensation aligned with senior-level platform engineering roles
  • Remote-first flexibility across the United States and Canada
  • Opportunity to work on cutting-edge GenAI and LLM infrastructure at enterprise scale
  • Exposure to leading cloud and AI ecosystems including major hyperscalers and GPU platforms
  • Career growth within a fast-scaling AI-first engineering organization
  • Hands-on work with advanced technologies such as distributed training, GPU clusters, and LLM systems
  • Collaborative, innovation-driven environment with strong emphasis on learning and technical excellence
  • Opportunity to work on high-impact AI transformation projects across multiple industries
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
