Staff Machine Learning Engineer, Multimodal Modeling at Jobgether – United States
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Machine Learning Engineer, Multimodal Modeling in the United States.
This role is ideal for a senior machine learning engineer passionate about advancing multimodal AI systems. You will lead the development and fine-tuning of embedding-based retrieval models, unifying text and image representations to improve performance, generalization, and cross-modal alignment. The position requires a strong foundation in representation learning and experience applying state-of-the-art methods to real-world problems. You’ll collaborate closely with engineering and product teams to design scalable, extensible systems, while tackling complex research challenges independently. The environment is fast-paced, high-impact, and innovative, offering the opportunity to shape the future of AI-driven search and recommendation systems. The role offers remote work flexibility, along with opportunities to lead and mentor others on the team.
Responsibilities:
- Lead the design, development, and fine-tuning of multimodal models (e.g., CLIP, SigLIP) for embedding-based retrieval systems.
- Unify and improve cross-modal representations, ensuring high performance and extensibility for evolving product use cases.
- Implement and optimize model architectures, training loops, loss functions, and data pipelines for real-world applications.
- Evaluate and improve vector similarity search, contrastive learning methods, and embedding quality metrics.
- Collaborate with engineering teams to deploy scalable, reliable, and maintainable AI systems.
- Contribute to team growth by mentoring colleagues and providing technical leadership.
- Conduct research and development to explore new modeling approaches and emerging AI techniques.
Requirements:
- 7+ years of industry experience in machine learning, specializing in representation learning, multimodal modeling, or embedding-based retrieval.
- Deep expertise in at least one domain: computer vision, natural language processing, or recommendation systems.
- Proficiency in PyTorch and experience fine-tuning foundation models for real-world tasks.
- Demonstrated ability to customize model architectures, training procedures, and evaluation methods.
- Strong engineering skills in Python and familiarity with Git, SQL, and Bash.
- Experience with multi-GPU and distributed training workflows is a plus.
- Knowledge of model compression techniques, such as distillation, quantization, or pruning, is desirable.
- Ability to work independently, navigate ambiguity, and solve open-ended modeling challenges.
Benefits:
- Competitive salary range of $200,000–$240,000, with equity options.
- Flexible PTO and 11 company holidays.
- Fully paid health, dental, and vision benefits with HSA match.
- 12 weeks of fully paid parental leave, with additional physical recovery time for birthing parents.
- Fertility, adoption, and surrogacy support, up to a $50,000 lifetime maximum.
- Caregiver support programs and 1:1 equity tax advisor sessions.
- Work-from-home and productivity stipends, plus home office setup support.
- Access to employee resource groups and professional development opportunities.