Software Engineer, Platform Engineering (Observability) in India at Jobgether

Job Function: Information Technology
Jobgether
India, India

Job Description

Software Engineer, Platform Engineering (Observability)

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Software Engineer, Platform Engineering (Observability), based in India.

This role sits at the heart of a large-scale platform engineering organization responsible for ensuring visibility, reliability, and performance across complex distributed systems. You will design and operate observability solutions that give engineers real-time insight into metrics, logs, traces, and system behavior across hundreds of microservices. The work directly impacts incident detection, mitigation speed, and overall service stability in a fast-moving, cloud-native environment. You will contribute to building AI-assisted observability capabilities that help engineers detect anomalies and resolve incidents faster. The role combines deep systems engineering with platform thinking and strong collaboration across SRE, product, and infrastructure teams. You will also help shape standards and best practices that improve developer experience across the entire engineering organization.

Accountabilities

You will be responsible for designing, building, and maintaining a scalable observability platform that spans metrics, logging, tracing, and alerting across large distributed systems. You will develop tools and pipelines that improve signal quality, reduce alert noise, and enable faster incident detection and resolution.

  • Build and operate observability systems at scale, ensuring high reliability, performance, and efficiency across production environments
  • Design and enhance monitoring, alerting, and incident response workflows to improve mean time to detect (MTTD) and mean time to mitigate (MTTM)
  • Develop AI-assisted capabilities for anomaly detection, alert correlation, and automated incident support
  • Create self-service tooling that enables engineering teams to instrument and monitor their own services
  • Define and enforce observability standards, SLIs/SLOs, and best practices across microservices architectures
  • Collaborate with SRE, platform, security, and product engineering teams to ensure full system visibility
  • Automate operational tasks and reduce engineering toil through platform improvements and tooling

Requirements

This role requires strong experience in building and operating large-scale production systems with a focus on observability, reliability, and cloud-native infrastructure. You should be comfortable working across distributed systems and driving technical improvements in complex environments.

  • 6+ years of experience in software or platform engineering roles focused on scalable production systems
  • Strong hands-on expertise with observability tools such as Datadog, Prometheus, Grafana, or similar
  • Experience working with Kubernetes and containerized production workloads
  • Proficiency in Go or Python for infrastructure, tooling, or backend development
  • Solid experience with cloud platforms such as AWS and/or GCP, and with Infrastructure as Code (Terraform)
  • Strong understanding of metrics, logs, traces, and distributed system instrumentation
  • Experience designing alerting systems with a focus on reducing noise and improving signal quality
  • Good grasp of SLIs, SLOs, error budgets, and reliability engineering principles
  • Ability to build developer-facing tools that improve productivity and self-service capabilities
  • Strong communication skills with the ability to write technical documentation and drive engineering discussions
  • Experience with AI-driven observability or large-scale microservices environments is a strong plus

Benefits

  • Hybrid work model with flexibility between remote work and office collaboration in Bengaluru
  • Full-time employment with flexible working hours (no fixed core hours)
  • Opportunity to work on large-scale distributed systems and cutting-edge observability platforms
  • Exposure to AI-driven engineering initiatives and next-generation platform tooling
  • Collaborative, high-performance engineering culture with strong focus on learning and growth
  • Strong emphasis on developer experience, automation, and reducing operational toil
  • Work within a global engineering organization building mission-critical systems

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

India, India
