SRE Partner [Affirmative-Action Position for People with Disabilities] in Brazil at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for an SRE Partner [Affirmative-Action Position for People with Disabilities] in Brazil.
This role sits at the intersection of Site Reliability Engineering and product teams, acting as a strategic partner embedded within high-impact areas of the organization. You will be responsible for bringing reliability practices directly into product engineering workflows, ensuring that systems are scalable, observable, and resilient at massive data scale. Working closely with product, platform, and infrastructure teams, you will help translate reliability goals into actionable engineering outcomes. The environment is highly technical, data-driven, and centered on AI-powered engineering platforms that reduce complexity and accelerate delivery. This is a unique opportunity to influence architecture decisions, operational maturity, and engineering culture at scale. You will act not only as a reliability expert but also as a driver of the adoption of modern platform capabilities across teams.
In this role, you will be responsible for embedding SRE practices within product areas, driving reliability maturity, and ensuring operational excellence across distributed systems. You will work closely with engineering and product teams to align reliability goals with delivery priorities.
- Act as an embedded SRE Partner within prioritized product areas, understanding architecture, risks, and operational needs
- Define and implement SRE practices including SLOs, SLIs, error budgets, on-call structures, incident management, and post-mortems
- Develop and evolve SRE maturity models across teams, identifying gaps and driving continuous improvement
- Strengthen system reliability by proactively identifying risks such as single points of failure, noisy alerts, and ownership gaps
- Promote adoption of platform capabilities such as observability tools, golden paths, canary deployments, and feature flags
- Collaborate closely with product engineering teams to integrate reliability into the development lifecycle
- Provide data-driven insights on system performance, latency, availability, and operational toil to guide decisions
- Support migrations and the decommissioning of legacy systems, and improve system scalability and resilience
- Contribute to the evolution of internal engineering platforms through direct feedback and technical input
The ideal candidate has strong experience in cloud-native environments and a deep understanding of reliability engineering practices in distributed systems. You combine technical expertise with strong communication and influence skills.
- Proven experience in Site Reliability Engineering or similar infrastructure-focused roles
- Strong knowledge of cloud environments, preferably Google Cloud Platform (GCP)
- Hands-on experience with Kubernetes and distributed system architecture
- Solid expertise in Infrastructure as Code, especially Terraform
- Strong experience with observability tools such as Prometheus, Grafana, Loki, Thanos, Elasticsearch, and Alertmanager
- Experience with incident management, on-call operations, and post-mortem processes
- Proven ability to define and manage SLOs, SLIs, and error budgets
- Strong analytical skills for system performance, logs, and reliability metrics
- Excellent communication and stakeholder management skills, with the ability to influence technical and non-technical audiences
- Data-driven mindset with a focus on measurable impact and continuous improvement
Benefits
- Competitive compensation package aligned with market standards
- Remote-first and flexible work environment across Brazil
- Opportunity to work on large-scale, high-traffic systems with advanced engineering challenges
- Strong culture of autonomy, ownership, and technical excellence
- Exposure to cutting-edge AI-driven engineering platforms and internal developer tools
- Career growth opportunities in platform engineering and reliability leadership tracks
- Collaborative environment with highly skilled engineering teams
- Participation in initiatives that shape engineering culture and reliability practices