Staff Engineer -- DevOps / Observability Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Engineer -- DevOps / Observability Engineer in the United States.
Join a high-impact engineering team focused on strengthening the reliability, visibility, and performance of critical enterprise applications. In this role, you will lead observability initiatives, optimize monitoring ecosystems, and enhance operational excellence across cloud-based environments. Working closely with development, infrastructure, and support teams, you will drive improvements in telemetry, alerting, incident management, and system reliability. This position offers the opportunity to influence DevOps best practices, automate operational processes, and build scalable monitoring solutions that support business-critical systems. If you thrive in fast-paced environments and enjoy solving complex operational challenges, this role offers both technical depth and strategic impact.
Responsibilities:
- Analyze, optimize, and maintain application observability frameworks, including telemetry, monitoring, and performance dashboards.
- Improve monitoring visibility by refining metrics, logs, traces, and alerting mechanisms across production environments.
- Review and optimize incident management workflows, escalation policies, and alert configurations to reduce noise and improve response efficiency.
- Identify outdated monitoring assets and implement improvements aligned with current operational and business requirements.
- Support production operations through proactive monitoring, troubleshooting, incident response, and root cause analysis activities.
- Collaborate with engineering, infrastructure, and support teams to enhance system reliability, availability, and operational health.
- Integrate monitoring and observability solutions into CI/CD pipelines and cloud infrastructure environments.
- Recommend and implement industry best practices for observability, automation, reliability engineering, and operational excellence.
Requirements:
- Strong hands-on experience with New Relic, including APM, telemetry collection, dashboard creation, alerting, and observability optimization.
- Proven expertise managing PagerDuty configurations, on-call schedules, escalation policies, alert routing, and incident response workflows.
- Solid experience in DevOps, Site Reliability Engineering (SRE), or production operations environments.
- Strong knowledge of cloud platforms, particularly AWS, and familiarity with infrastructure monitoring and cloud-native operations.
- Experience working with CI/CD pipelines, automation tools, Linux systems, and scripting technologies.
- Strong troubleshooting and problem-solving skills with the ability to investigate complex production issues.
- Understanding of logging, metrics, distributed tracing, and modern observability practices.
- Excellent collaboration and communication skills, with the ability to work effectively with cross-functional teams.
- Ability to balance operational priorities while driving continuous improvement initiatives.
Benefits:
- Competitive salary package aligned with experience and expertise.
- Fully remote work opportunity within the United States.
- Exposure to large-scale digital product engineering environments and global projects.
- Opportunity to work with modern cloud, DevOps, and observability technologies.
- Collaborative, agile, and innovation-driven work culture.
- Career growth opportunities through challenging technical projects and continuous learning initiatives.
- Access to international teams and cross-functional collaboration.
- Inclusive and diverse workplace committed to professional development and equal opportunity employment.
- Flexible work environment supporting work-life balance.
- Ongoing training and opportunities to expand technical and leadership capabilities.