Principal MLOps Platform Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal MLOps Platform Engineer in the United States.
This role sits at the center of building and operating a next-generation AI and MLOps platform designed to support production-grade machine learning and agentic systems at scale. You will design the infrastructure backbone that enables model deployment, observability, orchestration, and cost-efficient runtime operations across cloud environments. The position combines deep cloud engineering, platform architecture, and MLOps expertise, with an emphasis on reliability and automation.

You will define how models and LLM-powered services are deployed, monitored, and governed in production. Working across engineering, data, and AI teams, you will ensure seamless integration of ML workflows into scalable, secure, and observable systems. This is a high-impact role in which your work directly shapes platform performance, developer experience, and operational efficiency. You will also help establish best practices for cost control, environment management, and production readiness of AI systems.
In this role, you will be responsible for designing, building, and operating the core MLOps platform infrastructure that supports deployment, observability, and lifecycle management of AI and ML systems.
- Build and maintain infrastructure as code using Terraform or AWS CDK to support scalable ML platform environments
- Design and implement CI/CD pipelines using tools such as GitHub Actions, GitLab CI, or AWS CodePipeline
- Establish observability frameworks for ML and LLM systems using CloudWatch, OpenTelemetry, and related tools
- Manage containerized workloads using Docker and orchestration platforms such as ECS Fargate or EKS
- Define and enforce environment isolation strategies, model versioning, and prompt lifecycle management
- Implement monitoring and cost governance mechanisms, including budgets and usage tracking via CloudWatch
- Ensure reliability, scalability, and performance of ML runtime infrastructure across production environments
- Collaborate with AI, data, and engineering teams to integrate ML workflows into platform architecture
- Continuously improve automation, deployment efficiency, and platform developer experience
- Support best practices for secure, compliant, and cost-effective ML operations
The ideal candidate is a highly skilled cloud and platform engineer with substantial MLOps experience, deep AWS expertise, and a strong focus on reliability, observability, and scalable infrastructure design.
- 7+ years of experience in platform engineering, DevOps, MLOps, or cloud infrastructure roles
- Deep expertise in AWS, including production-grade architecture and operational management
- Strong experience building infrastructure as code using Terraform or AWS CDK
- Hands-on experience with CI/CD pipelines and modern deployment workflows
- Proven experience with containerization and orchestration (Docker, ECS, EKS, or Kubernetes)
- Strong understanding of observability practices using tools such as CloudWatch and OpenTelemetry
- Experience managing ML or LLM workloads in production environments is highly desirable
- Strong focus on reliability, scalability, security, and cost optimization
- Experience with environment isolation, versioning, and model lifecycle management
- Strong analytical and problem-solving skills in complex distributed systems
- AWS certifications (Solutions Architect Associate or Professional) are preferred
- Kubernetes or CNCF certifications are a plus
- Bachelor’s degree in Computer Science, Information Systems, or related field preferred
This position offers a competitive compensation package, a comprehensive benefits program, and the opportunity to work on cutting-edge AI infrastructure.
- Salary range: $170,000 – $190,000 annually (OTE, including base and bonus where applicable)
- Comprehensive medical, dental, and vision insurance
- 401(k) retirement savings plan
- Paid time off and company holidays
- Paid parental and caregiver leave
- Remote-friendly work environment (where applicable)
- Access to advanced technology environments and internal engineering labs
- Continuous learning support, including certifications and training opportunities
- Strong culture of inclusion, collaboration, and innovation
- Opportunity to build and scale production AI/ML platforms at enterprise level