Founding ML Engineer in the Flower Frontier Model Team in the UK at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Founding ML Engineer in the Flower Frontier Model Team in the United Kingdom.
This is a rare opportunity to join a founding engineering group building next-generation frontier AI models that go beyond conventional centralized training approaches. The team focuses on combining state-of-the-art machine learning techniques with decentralized and federated learning paradigms to unlock new sources of data and scale model training in fundamentally different ways. You will work in a highly technical, research-driven yet engineering-focused environment where ideas move quickly into production. The role spans the full ML lifecycle, from data and training pipelines to evaluation and optimization of large-scale models. You will help build and scale systems capable of training foundation models with advanced capabilities across science, healthcare, finance, and other high-impact domains. This is a hands-on position where deep technical execution, curiosity, and collaboration directly shape the direction of frontier AI development.
In this role, you will contribute to the design, development, and scaling of frontier AI models and the systems required to train them at scale. You will work across research and engineering boundaries to build reliable infrastructure and high-performance training pipelines.
- Design, build, and optimize core components across data pipelines, training systems, evaluation frameworks, and post-training workflows.
- Develop and scale distributed training systems for large foundation models across multi-node GPU clusters.
- Debug and resolve complex infrastructure and performance issues, including GPU, memory, and networking bottlenecks.
- Implement and refine model architectures using modern deep learning frameworks and large-scale training libraries.
- Build robust tooling, monitoring, and observability systems for large-scale ML training environments.
- Collaborate with research and engineering teams to translate ideas into production-ready, scalable implementations.
- Contribute to system reliability, reproducibility, and performance optimization across the ML stack.
This role requires strong software engineering fundamentals combined with hands-on experience in machine learning systems and large-scale distributed training environments. You should be comfortable working at the intersection of research and production engineering.
- Strong programming skills in Python and deep learning frameworks such as PyTorch or JAX.
- Experience building, debugging, and optimizing large-scale training systems.
- Hands-on experience with distributed training, multi-node GPU clusters, and performance tuning.
- Familiarity with frameworks such as DeepSpeed, Megatron, or equivalent large-scale ML tooling.
- Solid understanding of distributed systems, networking, and systems-level performance optimization.
- Ability to implement and extend ML models and research ideas into production-grade code.
- Experience with Linux, Git, Docker, and modern ML development workflows.
- Strong debugging skills for GPU, memory, and training instability issues.
- Excellent collaboration and communication skills in cross-functional technical teams.
- Strong engineering discipline, including testing, modular design, and reproducibility practices.
- Exposure to LLM training stages (pre-training, post-training, evaluation) is highly valued.
Benefits
- Fully remote work within the United Kingdom
- Opportunity to join a founding ML engineering team shaping frontier AI systems
- High-impact role working on next-generation distributed and decentralized AI training
- Exposure to cutting-edge research and large-scale model development
- Collaborative, research-driven, and fast-paced engineering environment
- Opportunity to influence technical direction and system architecture from the ground up
- Work on open-source and globally impactful AI systems
- Competitive compensation aligned with experience
- Strong culture of learning, experimentation, and technical ownership