Datacenter Hardware Operations Technician Lead, Industrial Compute in United States at Jobgether

Job Function: Skilled Labor
Jobgether
United States

Job Description

Datacenter Hardware Operations Technician Lead, Industrial Compute

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Datacenter Hardware Operations Technician Lead, Industrial Compute, based in the United States.

This role sits at the core of large-scale AI infrastructure reliability, where hands-on datacenter expertise directly supports the performance of advanced compute environments powering frontier AI systems. You will act as the senior on-site technical authority for hardware operations, ensuring the stability, availability, and lifecycle health of GPU, server, and storage systems. The position combines deep technical troubleshooting with operational leadership across high-density industrial compute campuses. You will partner closely with engineering, operations, and external vendors to resolve complex hardware issues and drive long-term reliability improvements. The environment is fast-scaling, mission-critical, and deeply collaborative, requiring both precision execution and systems-level thinking. This is a highly impactful role shaping the operational backbone of next-generation AI infrastructure.

Accountabilities:

In this role, you will lead on-site hardware operations and ensure the reliability and performance of large-scale compute infrastructure supporting mission-critical workloads.

  • Serve as the senior on-site technical lead for server, GPU, storage, and rack-level hardware operations
  • Drive diagnosis, triage, and resolution of complex hardware failures impacting production systems
  • Lead root cause analysis (RCA) efforts and implement corrective and preventive actions to improve fleet reliability
  • Partner with engineering, OEM vendors, and operations teams to manage repairs, replacements, and lifecycle activities
  • Develop, refine, and standardize hardware maintenance procedures, troubleshooting runbooks, and operational best practices
  • Analyze hardware failure trends and operational telemetry to identify risks and reliability improvement opportunities
  • Support hardware onboarding, validation, and production readiness for new infrastructure deployments
  • Mentor technicians and partner teams on advanced troubleshooting and hardware reliability practices

Requirements:

This role requires extensive experience in large-scale datacenter environments, with strong technical depth in hardware systems and proven leadership in operational troubleshooting.

  • 8+ years of experience in datacenter hardware operations, sustaining engineering, or senior technician roles
  • Strong expertise in server, GPU, storage, and rack-level infrastructure in large-scale environments
  • Proven ability to diagnose complex hardware failures and lead high-priority production incident resolution
  • Experience conducting root cause analysis and driving long-term reliability improvements
  • Solid understanding of hardware reliability engineering, fleet health, and operational monitoring systems
  • Ability to collaborate across engineering, operations, and vendor ecosystems in high-pressure environments
  • Strong communication skills with experience documenting processes and influencing technical decisions
  • Familiarity with Linux systems, hardware validation workflows, and datacenter tooling is a plus

Benefits:

  • Competitive base compensation with equity and performance-based bonus eligibility
  • Comprehensive medical, dental, and vision coverage with employer contributions
  • 401(k) retirement plan with employer match
  • Generous paid time off, holidays, and company-wide recharge breaks
  • Paid parental leave, medical leave, and caregiver support programs
  • Annual learning and development stipend for professional growth
  • Wellness and mental health support resources
  • Relocation support for eligible employees and additional lifestyle benefits

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
