Senior Site Reliability Engineer in Canada Creek, Nova Scotia at Jobgether

Jobgether
Canada Creek, Nova Scotia, B0P 1V0, Canada

Job Description

Senior Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in Canada.

This role sits at the core of a fast-scaling, AI-driven intelligence platform, where reliability is not just operational support but a strategic enabler of product innovation. You will design and own the foundations that ensure large-scale, mission-critical systems remain observable, resilient, and performant under demanding AI and data workloads. Acting as a senior individual contributor, you will shape reliability standards, SLO frameworks, and multi-region architecture while directly influencing engineering decisions across the organization. The environment is highly technical, collaborative, and innovation-focused, with a strong emphasis on AI-native systems and automation-first thinking. You will work across software, AI engineering, and platform teams to ensure seamless delivery of complex services. This is a hands-on leadership role for someone who wants to define how modern AI infrastructure operates at scale.

Accountabilities
  • You will define and own service reliability standards, including SLOs, SLIs, and error budgets, ensuring consistent performance across all production systems.
  • You will design and implement reliability patterns for AI agent pipelines, including observability, failure detection, and safe degradation mechanisms.
  • You will architect and improve multi-region infrastructure strategies, driving high availability, disaster recovery readiness, and blast radius control.
  • You will lead incident response and postmortem processes, ensuring durable fixes and continuous improvement of system resilience.
  • You will serve as the primary reliability partner for engineering and AI teams, influencing architecture, deployment strategies, and system design decisions.
  • You will own observability and platform tooling, including service catalog management, Datadog configuration, and AI workload monitoring.
  • You will develop CI/CD standards and enable self-service developer platforms to improve deployment velocity and system reliability.
  • You will contribute to FinOps initiatives by improving cost visibility and optimizing infrastructure efficiency across cloud environments.
Requirements
  • You bring 6–8+ years of experience in Site Reliability Engineering, DevOps, or platform engineering, with senior-level technical ownership responsibilities.
  • You have deep expertise in AWS and distributed systems architecture, including multi-region, high-availability environments.
  • You are highly skilled in Kubernetes, Docker, Terraform, and GitOps practices, with strong infrastructure-as-code experience.
  • You have hands-on experience with observability platforms such as Datadog, including SLO monitoring, alerting, tracing, and log analytics.
  • You are proficient in scripting and development (Python and/or Bash), with a solid understanding of microservices architectures.
  • You have strong experience designing and optimizing CI/CD pipelines (e.g., GitHub Actions, Bitbucket Pipelines).
  • You understand reliability challenges in large-scale systems and can translate complex technical risks into actionable engineering solutions.
  • You have strong communication and collaboration skills, with the ability to influence cross-functional teams and mentor engineers.
  • Experience with AI/ML infrastructure, LLM systems, or agent-based architectures is a strong advantage.
Benefits
  • Competitive compensation in the range of $125,200 – $132,500 CAD.
  • Comprehensive benefits package including health, dental, vision, and wellness coverage.
  • RRSP matching and annual fitness reimbursement.
  • Flexible vacation policy and remote-first work arrangement within Canada.
  • Access to professional training, development programs, and high-growth career opportunities.
  • Wellness resources and employee support programs.
  • Inclusive, diverse, and accessibility-focused work environment.
  • Opportunities to work on cutting-edge AI and large-scale data infrastructure systems.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
