Staff Site Reliability Engineer at Jobgether – Brazil
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Site Reliability Engineer in Brazil.
This role sits at the core of a cloud-native platform powering mission-critical financial infrastructure used at global scale. You will be responsible for designing, operating, and continuously improving a highly reliable containerized platform that supports banking, payments, and fintech services. Acting as a senior technical authority within the SRE and Platform Engineering organization, you will drive the evolution of Kubernetes-based infrastructure, automation, and GitOps practices. You will play a key role in ensuring system reliability, scalability, and operational excellence across distributed cloud environments. The position requires strong hands-on expertise in cloud infrastructure, deep SRE knowledge, and the ability to influence architectural decisions across teams. You will collaborate closely with engineering teams to reduce operational complexity, improve resilience, and enable safe, fast software delivery at scale.
Responsibilities:
- You own the end-to-end lifecycle of core platform components, including cloud infrastructure, Kubernetes clusters, networking layers, service mesh, and supporting data-plane systems.
- You design, build, and evolve highly reliable and scalable containerized platforms using SRE and cloud-native engineering best practices.
- You lead infrastructure bootstrap and orchestration initiatives to ensure repeatable, automated, and deterministic platform provisioning across environments.
- You drive Infrastructure-as-Code and GitOps adoption, ensuring all platform changes are automated, versioned, auditable, and reversible.
- You identify automation gaps and lead initiatives to reduce manual intervention, operational risk, and onboarding time at scale.
- You apply and promote SRE principles such as fault isolation, capacity planning, resilience engineering, and graceful degradation across systems.
- You participate in incident response, on-call rotations, and postmortems, acting as a key escalation point for platform reliability.
- You improve operability by reducing MTTD and MTTR through better observability, standardized processes, and platform simplification.
- You collaborate with cross-functional engineering teams to influence architecture decisions and promote reliability best practices.
- You ensure platform operations meet security, compliance, and internal governance requirements.
Requirements:
- You are based in Brazil and bring strong experience in Site Reliability Engineering or Platform Engineering roles at scale.
- You have deep hands-on experience with public cloud platforms such as AWS, with Azure experience considered a plus.
- You have strong expertise operating Kubernetes in production (e.g., EKS or equivalent), including cluster lifecycle management.
- You are highly experienced with service mesh technologies such as Istio and familiar with similar tools like Linkerd or App Mesh.
- You have advanced knowledge of Infrastructure as Code practices, especially using Terraform or similar tools.
- You understand cloud-native microservices architectures and distributed systems design principles.
- You have strong experience with observability tooling, including logs, metrics, traces, and alerting systems.
- You are skilled in incident management, on-call operations, and troubleshooting complex production environments.
- You demonstrate strong communication skills and the ability to act as a technical reference across engineering teams.
- You have experience in automation-first infrastructure design and large-scale system reliability improvements.
- English proficiency at B1 level or higher is required.
Benefits:
- Competitive compensation package aligned with senior-level expertise.
- Fully remote working model with flexibility.
- Opportunity to work on mission-critical global financial infrastructure at massive scale.
- Strong focus on automation, innovation, and modern cloud-native engineering practices.
- Exposure to complex distributed systems and high-impact platform engineering challenges.
- Collaborative, highly skilled international engineering environment.
- Career development opportunities within a fast-growing, technology-driven organization.