Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Site Reliability Engineer Lead (Observabilidade) at Jobgether?

The Site Reliability Engineer Lead (Observabilidade) position at Jobgether is a full-time or part-time opportunity in the Engineering field.

What type of employment is offered for this Site Reliability Engineer Lead (Observabilidade) role?

Full-time or part-time position

What is the expected salary for this Site Reliability Engineer Lead (Observabilidade) job?

Compensation will be discussed during the hiring process.

Site Reliability Engineer Lead (Observabilidade) job in Brazil at Jobgether

Site Reliability Engineer Lead (Observabilidade)

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer Lead (Observabilidade) in Brazil.

This role is focused on building and leading a high-impact SRE function responsible for platform reliability, observability, and incident management at scale. You will combine technical depth with strong people leadership, guiding a team responsible for defining and evolving observability standards, SLOs, and reliability practices. The environment is fast-paced, cloud-native, and highly data-driven, requiring close collaboration with engineering, platform, and security teams. You will play a key role in ensuring full visibility across systems, reducing operational toil, and improving system resilience. This position involves shaping both strategy and execution, establishing best practices that enable proactive monitoring and faster incident resolution. It is a leadership role for someone passionate about reliability engineering, automation, and building high-performing technical teams.

Accountabilities:

Lead, mentor, and develop a high-performing SRE team, fostering collaboration, technical excellence, and continuous learning.
Define the SRE strategy, roadmap, and priorities aligned with cloud and business objectives.
Establish and evolve observability standards, including metrics, logs, and traces across systems and applications.
Drive adoption and governance of SLIs, SLOs, and error budgets for critical services.
Oversee the evolution of observability platforms using tools such as Prometheus, Grafana, OpenTelemetry, Loki, and Tempo.
Design and implement actionable alerting strategies to reduce noise and improve incident response efficiency.
Lead incident management processes, including escalation, war rooms, communication, and post-mortem reviews.
Ensure blameless post-incident analysis and drive systemic improvements based on recurring issues and data insights.
Promote automation initiatives to reduce operational toil and improve engineering efficiency.
Collaborate with Cloud Engineering, Platform Engineering, and Security teams to align reliability initiatives.
Manage team capacity, priorities, and trade-offs while ensuring high-quality delivery.
Report reliability metrics, risks, and team progress to senior leadership.

Requirements:

Proven experience leading technical teams such as SRE, DevOps, or Cloud Engineering.
Strong hands-on experience with SRE principles including SLIs, SLOs, error budgets, and toil reduction.
Experience with observability and APM tools such as Datadog, New Relic, or Dynatrace.
Solid knowledge of telemetry systems (metrics, logs, traces) using Prometheus, OpenTelemetry, and the Grafana ecosystem.
Experience with Infrastructure as Code tools such as Terraform or AWS CDK.
Strong scripting and programming skills in Python and Bash, plus at least one additional language such as Go or Java.
Experience with logging and tracing solutions at scale such as Loki, Tempo, Jaeger, or ELK Stack.
Strong cloud experience, preferably in AWS environments.
Experience with containers and orchestration technologies such as Docker, Kubernetes, or ECS.
Solid understanding of incident management and post-mortem processes.
Strong Linux systems knowledge and troubleshooting skills.
English proficiency for technical reading and writing.
Nice to have: experience with FinOps, chaos engineering, AIOps, or large-scale distributed systems.

Benefits:

Competitive full-time employment under the CLT model (Brazil's formal employment regime).
Comprehensive health and dental insurance plans.
Life insurance coverage and wellness support programs.
Flexible working hours (8h/day, Monday to Friday).
Home office support, equipment provision, and mobility assistance for remote setup.
Meal and food allowance with flexible usage.
Childcare assistance and extended parental leave policies.
Learning and development support, including courses, books, and education subsidies.
Access to mental, physical, and financial well-being platforms and benefits.
Stock options and performance-based bonuses.
Birthday day off and additional lifestyle perks.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.


Similar Jobs in Brazil

Engineering Manager, Core APIs

Manager of Cloud Platform Engineering

Technical Program Manager (TPM)

Desenvolvedor(a) Backend - Sênior

Engineering Manager, Smart Signals