Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What type of employment is offered for this Principal ML Engineer, Machine Learning Platform and Systems Architecture role?

Full-time or part-time position

What is the expected salary for this Principal ML Engineer, Machine Learning Platform and Systems Architecture job?

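The listing states a base salary range of $152,000 to $272,250, depending on experience and location. Further compensation details will be discussed during the hiring process.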


Principal ML Engineer, Machine Learning Platform and Systems Architecture

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal ML Engineer, Machine Learning Platform and Systems Architecture in the United States.

This role is a senior technical leadership position focused on designing and scaling the foundational machine learning systems that power large-scale, production-grade AI applications. You will define and evolve the architecture of ML platforms spanning training, deployment, observability, and data infrastructure, ensuring they are robust, scalable, and efficient. The position sits at the intersection of distributed systems engineering, machine learning infrastructure, and platform strategy, with direct influence on how AI capabilities are delivered into production. You will collaborate closely with researchers, engineers, and product leaders to translate advanced ML concepts into reliable system-level solutions. This is a highly impactful role where you will shape technical direction, solve ambiguous cross-functional challenges, and drive platform excellence across the organization. The environment is remote-friendly, highly collaborative, and focused on building systems that enable cutting-edge innovation at scale.

Accountabilities

In this role, you will be responsible for leading the design, development, and evolution of large-scale ML platform and systems architecture supporting end-to-end machine learning workflows.

Lead architecture and delivery of core ML platform capabilities including training, deployment, evaluation, and observability systems
Design scalable distributed systems for data processing, feature engineering, model lifecycle management, and production inference
Own end-to-end technical outcomes for platform initiatives, from architecture design through deployment and operational support
Develop and scale large data pipelines for structured and semi-structured datasets across distributed environments
Define and implement frameworks for model deployment, monitoring, observability, and system reliability
Establish data governance, lineage, and responsible data usage practices across ML infrastructure
Drive architecture for distributed processing systems using tools such as Ray, Spark, Airflow, or equivalent technologies
Lead incident response for critical platform issues and implement long-term system improvements
Mentor engineers, provide technical leadership, and establish best practices for ML system design and operations
Communicate technical strategies, tradeoffs, and architecture decisions to both technical and non-technical stakeholders

Requirements

The ideal candidate brings deep expertise in distributed systems, ML infrastructure, and large-scale platform engineering, along with strong technical leadership skills.

6–8+ years of experience in software engineering, ML infrastructure, platform engineering, or distributed systems
Bachelor’s or Master’s degree in Computer Science or Engineering, or equivalent practical experience
Strong expertise in designing and operating large-scale distributed systems and data platforms
Advanced proficiency in Python and strong production software engineering practices
Experience leading complex, cross-functional technical initiatives across multiple engineering teams
Strong background in ML infrastructure including model deployment, inference systems, and observability frameworks
Experience with large-scale data pipelines, cloud-native architectures, and distributed processing frameworks
Ability to make architectural decisions balancing scalability, performance, reliability, and cost
Strong communication and stakeholder management skills across technical and leadership audiences
Preferred: experience with Kubernetes, ML orchestration tools, data lineage systems, and ML-ready data representations (graph, geometry, multimodal)

Benefits

Competitive base salary ranging from $152,000 to $272,250 depending on experience and location
Annual cash bonus eligibility, plus stock grants and additional incentive compensation (role dependent)
Comprehensive health, dental, and vision insurance coverage
Retirement and financial wellness programs
Flexible remote work options across the United States and Canada
Paid time off and wellness-focused benefits supporting work-life balance
Strong learning and development support for continuous technical growth
Inclusive, innovation-driven culture focused on collaboration and belonging
Opportunity to build foundational ML systems powering advanced real-world applications

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

