Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of a Senior Site Reliability Engineer at Jobgether?

The Senior Site Reliability Engineer position at Jobgether is a full-time or part-time opportunity.

Where is this Senior Site Reliability Engineer job located?

Canada Creek, Nova Scotia, B0P 1V0, Canada

What type of employment is offered for this Senior Site Reliability Engineer role?

Full-time or part-time position

What industry does this Senior Site Reliability Engineer position belong to?

This role spans multiple industries.

What is the expected salary for this Senior Site Reliability Engineer job?

The posting lists a compensation range of $125,200 to $132,500 CAD; further details will be discussed during the hiring process.

Senior Site Reliability Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in Canada.

This role sits at the core of a fast-scaling, AI-driven intelligence platform, where reliability is not just operational support but a strategic enabler of product innovation. You will design and own the foundations that ensure large-scale, mission-critical systems remain observable, resilient, and performant under demanding AI and data workloads. Acting as a senior individual contributor, you will shape reliability standards, SLO frameworks, and multi-region architecture while directly influencing engineering decisions across the organization. The environment is highly technical, collaborative, and innovation-focused, with a strong emphasis on AI-native systems and automation-first thinking. You will work across software, AI engineering, and platform teams to ensure seamless delivery of complex services. This is a hands-on leadership role for someone who wants to define how modern AI infrastructure operates at scale.

Accountabilities

You will define and own service reliability standards, including SLOs, SLIs, and error budgets, ensuring consistent performance across all production systems.
You will design and implement reliability patterns for AI agent pipelines, including observability, failure detection, and safe degradation mechanisms.
You will architect and improve multi-region infrastructure strategies, driving high availability, disaster recovery readiness, and blast radius control.
You will lead incident response and postmortem processes, ensuring durable fixes and continuous improvement of system resilience.
You will serve as the primary reliability partner for engineering and AI teams, influencing architecture, deployment strategies, and system design decisions.
You will own observability and platform tooling, including service catalog management, Datadog configuration, and AI workload monitoring.
You will develop CI/CD standards and enable self-service developer platforms to improve deployment velocity and system reliability.
You will contribute to FinOps initiatives by improving cost visibility and optimizing infrastructure efficiency across cloud environments.

Requirements

You bring 6–8+ years of experience in Site Reliability Engineering, DevOps, or platform engineering, with senior-level technical ownership responsibilities.
You have deep expertise in AWS and distributed systems architecture, including multi-region, high-availability environments.
You are highly skilled in Kubernetes, Docker, Terraform, and GitOps practices, with strong infrastructure-as-code experience.
You have hands-on experience with observability platforms such as Datadog, including SLO monitoring, alerting, tracing, and log analytics.
You are proficient in scripting and development (Python and/or Bash), with solid understanding of microservices architectures.
You have strong experience designing and optimizing CI/CD pipelines (e.g., GitHub Actions, Bitbucket Pipelines).
You understand reliability challenges in large-scale systems and can translate complex technical risks into actionable engineering solutions.
You have strong communication and collaboration skills, with the ability to influence cross-functional teams and mentor engineers.
Experience with AI/ML infrastructure, LLM systems, or agent-based architectures is a strong advantage.

Benefits

Competitive compensation in the range of $125,200 – $132,500 CAD.
Comprehensive benefits package including health, dental, vision, and wellness coverage.
RRSP matching and annual fitness reimbursement.
Flexible vacation policy and remote-first work arrangement within Canada.
Access to professional training, development programs, and high-growth career opportunities.
Wellness resources and employee support programs.
Inclusive, diverse, and accessibility-focused work environment.
Opportunities to work on cutting-edge AI and large-scale data infrastructure systems.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
