Observability Engineer in United States at Jobgether

Job Function: Engineering
Jobgether
United States

Job Description

Observability Engineer

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Observability Engineer based in the United States.

This role is focused on building and scaling the observability backbone that enables engineering teams to operate complex distributed systems with confidence. You will design and run end-to-end telemetry platforms covering metrics, logs, traces, and events, ensuring high signal quality and operational reliability. The position spans both infrastructure and software engineering, combining platform architecture with hands-on implementation of monitoring, alerting, and tracing systems. You will work closely with SREs, platform engineers, and product teams to define meaningful SLOs and transform raw telemetry into actionable insights. The environment is fast-paced and engineering-driven, with a strong emphasis on automation, scalability, and developer experience. This is a high-impact role where your work directly influences system reliability, incident response efficiency, and production visibility across the organization.

Accountabilities:
  • Design, build, and operate enterprise-grade observability platforms across metrics, logs, traces, and events.
  • Architect and maintain scalable monitoring stacks using Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Datadog.
  • Define and implement SLOs, SLIs, error budgets, and alerting strategies aligned with system reliability goals.
  • Develop high-quality dashboards, alerts, and observability standards to reduce noise and improve signal accuracy.
  • Manage distributed tracing pipelines and enable teams to diagnose latency and performance issues effectively.
  • Operate large-scale time-series and log systems, optimizing for performance, retention, and cost efficiency.
  • Build self-service observability tooling, templates, and libraries to improve adoption across engineering teams.
  • Integrate observability practices into CI/CD pipelines, incident response workflows, and progressive delivery systems.
  • Improve incident response readiness through better alerting hygiene, dashboards, and postmortem tooling.
  • Maintain clear documentation, onboarding guides, and runbooks for observability systems and standards.
  • Mentor engineers on observability best practices, debugging techniques, and SRE principles.
Requirements:
  • Bachelor’s degree in Computer Science or a related technical field.
  • 5+ years of experience in SRE, platform engineering, or observability-focused roles.
  • Strong hands-on experience with Prometheus, Grafana, and at least one commercial observability tool (Datadog, New Relic, or Splunk).
  • Deep understanding of OpenTelemetry, distributed tracing, and structured logging practices.
  • Proficiency in at least one programming language (Go, Python, or Java).
  • Experience operating high-scale metrics and logging pipelines with attention to performance and cost.
  • Strong knowledge of SLOs, error budgets, and reliability engineering principles.
  • Experience integrating observability into CI/CD pipelines and incident management tools.
  • Solid understanding of Linux systems, networking fundamentals, and containerized environments.
  • Strong communication skills and ability to collaborate across engineering and operations teams.
  • Exposure to tools such as Thanos, Mimir, Cortex, Loki, or Tempo is a plus.
  • Experience with observability cost optimization or eBPF-based tooling is a strong advantage.
Benefits:
  • Competitive annual salary ranging from $100,000 to $150,000 based on experience.
  • 100% remote role within the continental United States.
  • Full-time W2 employment with a stable, long-term (multi-year) engagement.
  • Comprehensive benefits package including healthcare and standard employee benefits.
  • Opportunity to work on large-scale distributed systems and modern observability stacks.
  • Exposure to industry-leading tools and cloud-native observability technologies.
  • Strong engineering culture focused on reliability, automation, and continuous improvement.
  • Career growth opportunities in SRE, platform engineering, and cloud observability domains.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
