Senior ML Engineer (Token Factory) in Romania at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and subsequent steps. Our partner is looking for a Senior Machine Learning Engineer (Token Factory) based in Romania.
This role sits at the intersection of large-scale AI systems and high-performance infrastructure, focusing on optimizing how foundation models are trained and served at scale.
You will contribute to a cutting-edge inference and fine-tuning platform designed to push modern LLMs to their performance limits across massive GPU fleets.
The work directly impacts throughput, latency, and cost efficiency for next-generation AI workloads used in production environments.
You will collaborate with highly specialized engineers across ML, systems, and infrastructure domains in a fast-moving, research-driven environment.
The role combines deep ML expertise with systems-level engineering, requiring strong understanding of both model architecture and hardware behavior.
You will help design and improve critical components such as inference engines, training pipelines, and GPU optimization strategies.
Responsibilities
- Drive inference optimization efforts by identifying bottlenecks and implementing performance improvements across diverse LLM architectures to increase throughput and reduce latency and cost per token.
- Contribute to the design and evolution of inference engines, including techniques such as speculative decoding, KV-cache optimization, and support for dense and MoE models.
- Develop and productionize low-precision training and inference pipelines (e.g., FP8, MXFP4) to maximize efficiency on large GPU clusters.
- Profile and analyze GPU workloads using modern tooling to identify performance constraints and guide architectural improvements.
- Collaborate on scalable distributed training and inference systems, including sharding strategies, custom kernels, and hardware-aware optimizations.
- Contribute to engineering best practices including testing, CI/CD, and maintainable production-grade ML systems.
Requirements
- Strong understanding of machine learning fundamentals, particularly transformer architectures and large language models.
- Hands-on experience profiling and optimizing GPU workloads using tools such as Nsight or PyTorch Profiler.
- Deep knowledge of GPU architecture, including memory hierarchy and compute vs. memory trade-offs.
- Familiarity with key LLM concepts such as attention mechanisms, rotary position embeddings (RoPE), KV caching, FlashAttention, and quantization techniques.
- Experience with large-scale deep learning training, including distributed systems, sharding strategies, and custom kernel development.
- Strong software engineering skills, with advanced proficiency in Python and modern ML frameworks.
- Solid understanding of software engineering practices such as version control, CI/CD pipelines, and unit testing.
- Strong communication skills with the ability to collaborate effectively in highly technical, cross-functional teams.
Benefits
- Competitive compensation package
- Strong career development and continuous learning opportunities
- Flexible work environment with high autonomy and ownership
- Collaborative, innovation-driven engineering culture
- Opportunity to work on frontier AI systems at massive scale
- International, highly skilled, and diverse team environment