Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Site Observability Engineer at Jobgether?

The Site Observability Engineer position at Jobgether is a full-time or part-time opportunity in the Engineering field.

Where is this Site Observability Engineer job located?

United States, Other / Non-US

What type of employment is offered for this Site Observability Engineer role?

Full-time or part-time position

What is the expected salary for this Site Observability Engineer job?

The listing cites a competitive salary range of $100K–$150K based on experience. Exact compensation will be discussed during the hiring process.

Site Observability Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Observability Engineer based in the United States.

This role is central to ensuring engineering teams have full visibility into system health, performance, and reliability across complex distributed environments. The engineer will design and operate end-to-end observability platforms covering metrics, logs, traces, and events, enabling fast and accurate detection of issues before they impact users. The environment is highly technical, cloud-native, and deeply aligned with SRE principles, with strong emphasis on automation, scalability, and signal quality. The role involves shaping how telemetry is collected, stored, and transformed into actionable insight across the organization. It also requires close collaboration with platform, SRE, and product engineering teams to embed observability into every layer of the system. The position is ideal for someone passionate about reliability engineering, data-driven operations, and building systems that empower others to debug and improve production services.

Accountabilities:

This role is responsible for building, operating, and evolving the organization’s observability ecosystem, ensuring engineers can effectively monitor, troubleshoot, and improve distributed systems at scale.

Design and operate enterprise-grade observability platforms across metrics, logs, traces, and events
Architect and manage tools such as Prometheus, Thanos, Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog
Define and enforce SLOs, SLIs, error budgets, and observability standards across teams
Build alerting frameworks integrated with on-call systems to reduce noise and improve incident response
Develop instrumentation standards including logging formats, metric naming, and trace propagation
Manage large-scale telemetry pipelines with a focus on performance, retention, and cost optimization
Build dashboards and self-service tools to improve observability adoption across engineering teams
Improve incident response readiness through better alerting, monitoring, and post-incident analysis
Partner with SRE and platform teams to embed observability into CI/CD and deployment workflows
Mentor engineers on observability best practices, debugging techniques, and reliability engineering principles

Requirements:

The ideal candidate brings deep experience in observability, SRE practices, and distributed systems, with strong technical and communication skills to drive adoption across engineering teams.

5+ years of experience in SRE, platform engineering, or observability-focused roles
Strong hands-on expertise with Prometheus, Grafana, and at least one commercial tool (Datadog, New Relic, or Splunk)
Solid understanding of OpenTelemetry, distributed tracing, and structured logging
Proficiency in at least one programming language such as Go, Python, or Java
Experience operating high-scale, high-cardinality metrics and log pipelines
Strong knowledge of SLOs, SLIs, error budgets, and reliability engineering principles
Experience integrating observability systems with CI/CD and incident management tools
Solid understanding of Linux systems, networking, and containerized environments
Strong troubleshooting, analytical, and communication skills
Experience in building or scaling observability platforms is highly valued

Benefits:

Competitive salary range ($100K–$150K based on experience)
100% remote work within the United States
Full-time W2 employment structure (no C2C or 1099 arrangements)
Health, dental, and vision insurance options
Paid time off and company holidays
Retirement savings plan with employer contributions
Professional development and career growth opportunities
Exposure to modern cloud-native observability stacks and large-scale distributed systems
Collaborative engineering culture focused on reliability and continuous improvement

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
