MLOps Tech Lead at Healwell AI Inc – Toronto, Ontario
Healwell AI Inc
Toronto, Ontario, M6K 1X9, Canada
Salary: $130,000 - $140,000
About This Position
Job Description

Over 7,000 rare diseases have been identified, affecting over 300 million patients worldwide and 1 in 12 patients in Canada. Many of these patients remain undiagnosed and unaware, resulting in a poor quality of life and potentially serious consequences.

Healwell AI (HWAI) (TSX:AIDX) is a leader in AI-enabled clinical intelligence for rare diseases and specialty conditions. Through our proprietary clinical intelligence platform and deep analytical tools, HWAI allows physicians to quickly understand complex, high-risk patients and place them on the right care pathways, leading to better outcomes for patients, their families, and the healthcare system.

We are seeking an experienced MLOps Tech Lead to architect our next-generation AI infrastructure and lead a talented team of engineers. In this pivotal role, you will bridge the gap between Data Science, Cloud Engineering, and DevOps. You will be hands-on with our Azure/Databricks stack, and you will also set the technical vision, establish engineering standards, and ensure our AI platforms are secure, scalable, and cost-efficient. You will own the roadmap for our MLOps maturity, moving us from manual execution to fully automated, observable, and resilient AI systems. You will have the opportunity to enhance your technical leadership skills while contributing to impactful projects in the healthcare space.
Responsibilities

The successful candidate will work in a multifaceted role encompassing Cloud Architect, Cloud Security, and DevOps/MLOps responsibilities:

Lead, mentor, and grow a team of MLOps and Cloud Engineers; conduct code reviews, facilitate technical design sessions, and foster a culture of engineering excellence.
Define the high-level architecture for our end-to-end ML platform on Azure, making critical decisions on "build vs. buy" for tooling and infrastructure.
Oversee the Terraform codebase; implement modular, reusable infrastructure patterns and enforce state management policies to prevent drift.
Own the reliability (SRE) of machine learning systems. Define SLAs/SLOs for model inference and data pipelines, and lead root cause analysis (RCA) for critical incidents.
Manage cloud budgets (FinOps) for compute/Databricks usage and enforce rigorous security postures (IAM, network isolation, private endpoints), ensuring compliance with industry standards.
Evolve our CI/CD pipelines from simple automation to advanced deployment strategies (Blue/Green, Canary releases, Shadow deployment) for ML models.
Deploy and maintain cloud-based ML models in production, ensuring performance and scalability.
Design, deploy, and manage scalable, secure, and highly available cloud infrastructure on Azure, utilizing infrastructure as code (IaC) principles.
Build monitoring systems for data quality, model performance, and pipeline health.
Collaborate with cross-functional teams to define problems and develop solutions.
Develop and maintain documentation for cloud architecture, processes, and systems.
Diagnose and resolve issues related to application and model performance, pipeline failures, and infrastructure problems.
Required Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field.
7+ years of total experience in DevOps, Cloud Engineering, or Software Engineering.
3+ years specifically focused on MLOps or Data Engineering at production scale.
2+ years in a technical leadership or mentoring role (Team Lead, Principal Engineer, etc.).
Deep proficiency with Azure cloud and cloud-native services.
Proficiency in Python and shell scripting.
Hands-on experience with containerization technologies (Docker) and orchestration platforms (Kubernetes).
Advanced mastery of Terraform.
Deep hands-on experience with Databricks (MLflow, Spark, Unity Catalog).
Proven experience with orchestration tools (Dagster preferred).
Knowledge of Postgres or equivalent database management.
Experience with containerization, infrastructure as code, and DevOps/MLOps practices.
Strong problem-solving skills and the ability to work independently and collaboratively.

Preferred Qualifications

Certifications such as Azure Solutions Architect Expert or DevOps Engineer Expert are desirable.
Relevant certifications in security domains.

What You'll Work With

Data Platform: Databricks (Spark, Delta Lake) + Weaviate vector store
Orchestration: Dagster for pipeline management and scheduling
Cloud: Azure services for compute, storage, and ML services
Languages: Python, shell
Tools: Docker, Kubernetes, Terraform, Git, CI/CD pipelines
Monitoring: Custom dashboards, alerting systems, and model performance tracking

Culture & Work Environment

Communication: We value open and honest communication. Regular check-ins and team meetings ensure everyone is aligned and informed.
Transparency: Our decision-making processes are transparent, encouraging input from all team members. Your ideas and feedback will be valued.
Promptness: We maintain a fast-paced work environment and expect team members to be prompt in delivering work and meeting deadlines.
Guidance: You will be supported and guided by our VP of Technology, who will provide mentorship and direction throughout your co-op experience.

What We Offer

Hands-on experience with real-world data challenges in the medical field.
Opportunities to expand your technical skill set and work with advanced AI tools.
A collaborative team environment that fosters learning and innovation.

We look forward to receiving your application and hope to welcome you to the HWAI team!

HWAI is an equal opportunity employer that welcomes all applicants including persons with disabilities, visible minorities, women, and aboriginals. HWAI will provide reasonable accommodation to qualified job applicants with a disability, on request, and will notify successful applicants of policies relating to the accommodation of employees with disabilities.

We would like to thank all applicants for their interest in HWAI, but please note that only successful candidates will be contacted. You can learn more about HWAI at https://healwell.ai