Senior Site Reliability Engineer in Canada Creek, Nova Scotia at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Site Reliability Engineer based in Canada.
You will join a global, fully distributed engineering organization building critical infrastructure that powers complex, high-scale platforms used across multiple countries. In this role, you will be responsible for ensuring the reliability, scalability, and security of production systems that support modern cloud-native services and AI-driven workflows. You will design and operate infrastructure that must remain resilient across high-demand, multi-tenant environments. The work is deeply technical and highly impactful, requiring strong ownership of system reliability, observability, and automation. You will collaborate closely with engineering, product, and security teams to build systems that are performant, cost-efficient, and safe by design. This is a role for someone who thrives in asynchronous, remote-first environments and enjoys solving large-scale infrastructure challenges with autonomy and precision.
Responsibilities
- Design, implement, and maintain Infrastructure-as-Code using Terraform and Kubernetes to support scalable, production-grade environments.
- Build and improve observability systems, including monitoring, logging, alerting, and dashboards to ensure system visibility and reliability.
- Lead incident response processes, perform root cause analysis, and drive post-incident improvements to reduce system downtime.
- Collaborate with security teams to embed security and compliance requirements into infrastructure design across global jurisdictions.
- Optimize system performance, cloud resource utilization, and infrastructure costs while maintaining reliability standards.
- Identify and eliminate operational toil through automation, improving engineering efficiency and platform scalability.
- Support and enhance CI/CD pipelines and deployment strategies to ensure safe, reliable, and repeatable releases.
- Work closely with platform and product engineering teams to improve API reliability, developer experience, and system observability.
- Produce clear documentation, runbooks, and operational guidelines to support engineering excellence and knowledge sharing.
Requirements
- Senior-level experience in Site Reliability Engineering, DevOps, or Systems Engineering roles operating production systems at scale.
- Strong hands-on experience with Kubernetes in production environments.
- Solid expertise with AWS cloud services, including compute, networking, storage, and managed services.
- Proficiency with Infrastructure-as-Code tools such as Terraform.
- Experience building and managing CI/CD pipelines using tools like GitHub Actions, GitLab CI, or Jenkins.
- Strong Linux systems knowledge, including debugging, scripting (Bash), and log analysis.
- Experience designing and operating observability stacks (e.g., Prometheus, Grafana, Datadog, ELK).
- Strong communication skills, with the ability to explain complex technical topics to both technical and non-technical audiences.
- Nice to have: experience with backend programming languages (Go, Python, Java, Node.js, etc.).
- Nice to have: experience in multi-tenant platforms, consultancy environments, or large-scale distributed systems.
Benefits
- Annual salary range: USD $54,000 – $150,000 (based on experience and location).
- Fully remote work from anywhere in the world.
- Flexible working hours with an async-first culture.
- Flexible paid time off policy.
- 16 weeks of paid parental leave.
- Stock options as part of the compensation package.
- Home office and IT equipment budget.
- Learning and professional development budget.
- Mental health and wellness support services.
- Budget for local coworking spaces or in-person team gatherings.
- Inclusive, global work environment with strong emphasis on autonomy and work-life balance.