Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of an Observability Specialist at Jobgether?

The Observability Specialist position at Jobgether is a full-time or part-time opportunity in the Admin/Clerical/Secretarial field.

What type of employment is offered for this Observability Specialist role?

This role is offered as a full-time or part-time position.

What is the expected salary for this Observability Specialist job?

Compensation will be discussed during the hiring process.

Observability Specialist

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Observability Specialist based in India.

This role is focused on building and scaling the observability backbone for complex, distributed enterprise platforms powering mission-critical ERP and AI-driven systems. You will design and implement end-to-end visibility solutions that ensure reliability, performance, and transparency across services, agents, and infrastructure. The position plays a key role in enabling engineering and operations teams to understand system behavior through logs, metrics, and distributed traces. You will work on establishing modern observability standards using OpenTelemetry and industry-leading monitoring tools. This is a highly technical and foundational role where your work directly impacts production stability, incident resolution, and customer trust. You will collaborate closely with engineering, DevOps, and platform teams across global environments. The environment is fast-paced, engineering-led, and focused on building highly reliable enterprise-scale systems.

Accountabilities:

Design and implement scalable observability architecture using OpenTelemetry for distributed systems and AI-driven platforms.
Build and maintain metrics, logging, and tracing infrastructure using tools such as Prometheus, Grafana, Jaeger, Loki, and related stacks.
Define and enforce instrumentation standards across Java, Python, and web-based applications.
Implement distributed tracing and context propagation across microservices, MCP workflows, and ERP system integrations.
Develop dashboards, SLIs/SLOs, and alerting systems to monitor platform health, performance, and reliability.
Create custom metrics and telemetry for AI agent behavior, LLM performance, and system-level insights.
Design alerting strategies, escalation paths, and incident response workflows to reduce noise and improve reliability.
Support root cause analysis and production troubleshooting using observability data and structured diagnostics.

Requirements:

Bachelor’s degree in Computer Science or a related technical field.
5+ years of experience in SRE, observability, or platform engineering roles in distributed systems environments.
Strong hands-on expertise with OpenTelemetry, including metrics, logs, and tracing.
Experience with monitoring and visualization tools such as Prometheus, Grafana, and alerting frameworks.
Strong knowledge of distributed tracing tools such as Jaeger, Zipkin, or equivalent systems.
Experience with log aggregation tools such as the ELK stack, Loki, or similar solutions.
Proficiency in Python, Java, or Go for instrumentation and automation.
Strong understanding of SLI/SLO frameworks, alerting strategies, and incident management practices.
Familiarity with Kubernetes observability, service mesh telemetry, and cloud-native architectures is a plus.
Exposure to AI/ML observability, LLM monitoring, or enterprise ERP systems is highly valued.
Strong analytical, debugging, and communication skills with experience working in distributed global teams.

Benefits:

Opportunity to build core observability infrastructure for next-generation AI and ERP platforms.
Exposure to large-scale distributed systems and enterprise-grade production environments.
Work with modern observability and cloud-native technologies such as OpenTelemetry and the Grafana stack.
Strong career growth in platform engineering, SRE, and AI systems reliability domains.
Remote-friendly environment with collaboration across global engineering teams.
Opportunity to influence reliability standards and observability practices at scale.
Continuous learning in advanced monitoring, tracing, and AI system diagnostics.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
