Senior ML Ops Engineer at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior ML Ops Engineer in the United States.
This role offers a unique opportunity to own the full lifecycle of machine learning operations in a high-performing, real-time AI environment. You will be responsible for building, scaling, and optimizing production-grade ML pipelines, ensuring models run reliably and efficiently while meeting stringent performance and latency requirements. Collaborating closely with ML engineers, data teams, and DevOps, you will influence both architecture and operational practices. The position combines hands-on technical work with strategic decision-making, including model deployment, monitoring, inference optimization, and workflow automation. The role is ideal for engineers who thrive in fast-paced environments, enjoy solving complex systems challenges, and want to directly impact AI-driven products at scale.
Responsibilities:
- Build, maintain, and optimize ML pipelines for production systems, enabling seamless transitions from experimentation to deployment.
- Manage model versioning, rollout, rollback, and lifecycle strategies to ensure reproducibility and robustness.
- Define and enforce service-level agreements (SLAs) for latency, availability, GPU utilization, and other operational metrics.
- Implement observability, monitoring, and alerting frameworks for ML systems to proactively identify and resolve issues.
- Collaborate with ML, Data, Product, and DevOps teams to translate requirements into production-ready systems and influence roadmap decisions.
- Apply software engineering best practices, including testing, CI/CD integration, and reproducible workflows, so that changes ship without compromising reliability.
- Ensure ML systems are secure, scalable, and cost-efficient, optimizing inference at both hardware and software levels.
Requirements:
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field, or equivalent experience.
- 5–8+ years of experience in ML Engineering, Software Engineering, Platform, or Infrastructure Engineering with direct ownership of production ML serving systems.
- Hands-on experience deploying and maintaining LLMs or deep learning models in production.
- Strong Python skills and software engineering fundamentals; experience with ML frameworks (PyTorch, TensorFlow, or similar) preferred.
- Experience with cloud platforms (AWS, GCP, Azure) and ML lifecycle tools, including model registries and experimentation platforms.
- Knowledge of inference optimization, including batching, memory management, quantization tradeoffs, and CPU/GPU interaction.
- Proven ability to reason about tradeoffs between latency, cost, throughput, and reliability at system and operational levels.
- Experience thriving in high-growth startup environments and collaborating cross-functionally in fast-paced settings.
Benefits:
- Competitive base salary range: $200,000 – $250,000 USD, dependent on experience and location.
- Equity opportunities through stock options.
- Comprehensive medical, dental, and vision coverage.
- 401(k) retirement plan.
- Flexible PTO and company holidays.
- Fully remote work within the United States.
- Periodic company offsites and team gatherings.
- Opportunities to influence AI infrastructure at scale and grow with a high-growth technology team.