Principal ML Engineer, Machine Learning Platform and Systems Architecture in United States at Jobgether

Jobgether
United States, United States

Job Description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal ML Engineer, Machine Learning Platform and Systems Architecture in the United States.

This role is a senior technical leadership position focused on designing and scaling the foundational machine learning systems that power large-scale, production-grade AI applications. You will define and evolve the architecture of ML platforms spanning training, deployment, observability, and data infrastructure, ensuring they are robust, scalable, and efficient. The position sits at the intersection of distributed systems engineering, machine learning infrastructure, and platform strategy, with direct influence on how AI capabilities are delivered into production. You will collaborate closely with researchers, engineers, and product leaders to translate advanced ML concepts into reliable system-level solutions. This is a highly impactful role where you will shape technical direction, solve ambiguous cross-functional challenges, and drive platform excellence across the organization. The environment is remote-friendly, highly collaborative, and focused on building systems that enable cutting-edge innovation at scale.

Accountabilities

In this role, you will be responsible for leading the design, development, and evolution of large-scale ML platform and systems architecture supporting end-to-end machine learning workflows.

  • Lead architecture and delivery of core ML platform capabilities including training, deployment, evaluation, and observability systems
  • Design scalable distributed systems for data processing, feature engineering, model lifecycle management, and production inference
  • Own end-to-end technical outcomes for platform initiatives, from architecture design through deployment and operational support
  • Develop and scale large data pipelines for structured and semi-structured datasets across distributed environments
  • Define and implement frameworks for model deployment, monitoring, observability, and system reliability
  • Establish data governance, lineage, and responsible data usage practices across ML infrastructure
  • Drive architecture for distributed processing systems using tools such as Ray, Spark, Airflow, or equivalent technologies
  • Lead incident response for critical platform issues and implement long-term system improvements
  • Mentor engineers, provide technical leadership, and establish best practices for ML system design and operations
  • Communicate technical strategies, tradeoffs, and architecture decisions to both technical and non-technical stakeholders

Requirements

The ideal candidate brings deep expertise in distributed systems, ML infrastructure, and large-scale platform engineering, along with strong technical leadership skills.

  • 6–8+ years of experience in software engineering, ML infrastructure, platform engineering, or distributed systems
  • Bachelor’s or Master’s degree in Computer Science or Engineering, or equivalent practical experience
  • Strong expertise in designing and operating large-scale distributed systems and data platforms
  • Advanced proficiency in Python and strong production software engineering practices
  • Experience leading complex, cross-functional technical initiatives across multiple engineering teams
  • Strong background in ML infrastructure including model deployment, inference systems, and observability frameworks
  • Experience with large-scale data pipelines, cloud-native architectures, and distributed processing frameworks
  • Ability to make architectural decisions balancing scalability, performance, reliability, and cost
  • Strong communication and stakeholder management skills across technical and leadership audiences
  • Preferred: experience with Kubernetes, ML orchestration tools, data lineage systems, and ML-ready data representations (graph, geometry, multimodal)

Benefits

  • Competitive base salary ranging from $152,000 to $272,250 depending on experience and location
  • Annual cash bonus eligibility, plus stock grants and additional incentive compensation (role dependent)
  • Comprehensive health, dental, and vision insurance coverage
  • Retirement and financial wellness programs
  • Flexible remote work options across the United States and Canada
  • Paid time off and wellness-focused benefits supporting work-life balance
  • Strong learning and development support for continuous technical growth
  • Inclusive, innovation-driven culture focused on collaboration and belonging
  • Opportunity to build foundational ML systems powering advanced real-world applications

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
