IT Incident Management Specialist in United States at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an IT Incident Management Specialist based in the United States.
This role sits at the heart of enterprise IT operations, ensuring the stability, reliability, and performance of large-scale systems that support critical services. You will be responsible for monitoring, identifying, and coordinating responses to IT incidents and system events across complex infrastructure environments. Working closely with operations, application, and engineering teams, you will help define and execute structured incident response processes that minimize downtime and improve service continuity. The position requires strong analytical thinking, hands-on troubleshooting capability, and deep familiarity with monitoring tools and IT service management practices. You will also play a key role in maintaining operational documentation and ensuring the accuracy of escalation paths, dashboards, and system workflows. This is a high-impact operational role where precision and responsiveness directly contribute to service quality and end-user experience.
Responsibilities
- Monitor enterprise systems and applications to detect, triage, and coordinate resolution of IT incidents and events.
- Develop, update, and maintain operational documentation, including SOPs, knowledge articles, escalation procedures, and playbooks.
- Support configuration and maintenance of monitoring dashboards and event management tools aligned with operational standards.
- Collaborate with application, infrastructure, and operations teams to ensure accurate incident response workflows and system visibility.
- Analyze performance trends and alerts to identify recurring issues, service degradation, and potential failures.
- Ensure critical business functions, impact assessments, and maintenance schedules are accurately reflected in monitoring systems.
- Validate and improve alerting accuracy to reduce false positives and improve operational efficiency.
- Coordinate cross-functional incident resolution activities and ensure timely communication during outages or disruptions.
Requirements
- 5+ years of experience in systems administration, IT operations, or enterprise infrastructure support.
- Strong experience with IT monitoring, incident management, and operational performance analysis in complex environments.
- Hands-on experience with APM and monitoring tools such as AppDynamics, Dynatrace, Splunk, Aternity, or SolarWinds.
- At least 2 years of experience working with Splunk for application monitoring and log analysis.
- Proven ability to create and maintain technical documentation, SOPs, and operational playbooks.
- Experience working with ITSM tools such as ServiceNow and enterprise monitoring platforms.
- Strong proficiency in Microsoft Office tools including Excel, Word, PowerPoint, and SharePoint.
- Ability to troubleshoot system and application issues across distributed environments.
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or equivalent experience.
- Eligibility to obtain and maintain Public Trust or equivalent clearance.
Benefits
- Competitive compensation package aligned with experience
- Remote work flexibility across the United States
- Opportunity to support mission-critical government-related IT systems
- Exposure to enterprise-scale monitoring and incident management tools
- Professional development in IT operations, monitoring, and reliability engineering practices
- Inclusive and equal opportunity work environment
- Stable long-term engagement in a high-impact operational setting