
Incident and Escalation Manager in United States at Jobgether

Job Function: Customer Service
Jobgether
United States


Job Description

Incident and Escalation Manager

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Incident and Escalation Manager based in the United States.

This is a high-impact operational leadership role at the center of major incident response and customer escalation management within a global, AI-driven infrastructure environment. You will act as the central coordination authority during critical service disruptions, ensuring rapid alignment across engineering, support, product, and executive stakeholders. The role requires strong technical fluency combined with exceptional decision-making under pressure, particularly in mission-critical AI and high-performance computing environments. You will be responsible for restoring service stability while maintaining clear, structured communication with customers and internal leadership teams. Beyond incident resolution, you will also shape and improve global incident and escalation management frameworks. The environment is fast-paced, highly technical, and globally distributed, requiring strong leadership across time zones. This is a strategic role with direct impact on customer trust, operational resilience, and platform reliability.

Accountabilities:
  • Lead and coordinate major incident response efforts for high-severity service disruptions impacting AI, HPC, and enterprise-scale environments.
  • Act as Incident Commander, driving structured triage, cross-functional collaboration, real-time decision-making, and service restoration activities.
  • Manage executive-level escalations, ensuring rapid resolution of critical customer issues and maintaining strong stakeholder alignment.
  • Provide clear, timely, and structured communication to executives, customers, and internal teams during major incidents.
  • Partner with engineering, support, product, and sales teams to resolve complex technical and service-related challenges.
  • Lead post-incident and escalation reviews (PIER), including root cause analysis and corrective action tracking.
  • Identify systemic issues and drive continuous improvement across incident, escalation, and problem management processes.
  • Contribute to the development of operational frameworks, governance models, and service reliability standards across global teams.

Requirements:
  • 12+ years of experience in Incident Management, Escalation Management, Problem Management, or Technical Operations in enterprise or high-tech environments.
  • Proven experience leading high-severity incidents and executive escalations in AI, HPC, or large-scale infrastructure ecosystems.
  • Strong technical understanding of complex distributed systems and ability to collaborate effectively with engineering teams under pressure.
  • Deep knowledge of ITIL frameworks, including Incident, Problem, Change, and Escalation Management practices.
  • Exceptional communication skills, with the ability to manage both technical and executive-level audiences.
  • Strong analytical mindset with experience interpreting incident data, trends, and operational metrics.
  • Ability to operate in high-pressure, customer-facing situations with strong ownership and decision-making capabilities.
  • Experience working in global, 24/7 operational environments with on-call responsibilities.
  • Proven ability to influence cross-functional teams and senior stakeholders without direct authority.
  • Nice to have: experience with AI/HPC environments, distributed storage systems (e.g., Lustre), ITIL certification, and tools such as Jira, Salesforce, Slack, or Confluence.

Benefits:
  • Remote-first role with global operational exposure.
  • Opportunity to work on cutting-edge AI and high-performance computing infrastructure.
  • High-impact position with direct visibility to executive leadership and strategic customers.
  • Competitive compensation aligned with senior-level responsibilities.
  • Strong focus on operational excellence, innovation, and continuous improvement.
  • Collaborative, cross-functional environment with global teams.
  • Exposure to mission-critical systems powering advanced AI workloads worldwide.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

