Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What type of employment is offered for this AI Research Engineer (Model Compression & Quantization) role?

This role is offered as a full-time or part-time position.

What is the expected salary for this AI Research Engineer (Model Compression & Quantization) job?

Compensation will be discussed during the hiring process.

AI Research Engineer (Model Compression & Quantization)

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Research Engineer (Model Compression & Quantization) in India.

This role sits at the forefront of efficient AI systems research, focusing on making large-scale multimodal models practical for real-world deployment. You will work on advancing state-of-the-art techniques in model compression, enabling LLMs and vision-language models to run efficiently on resource-constrained devices such as mobile and edge hardware. The position combines deep research with hands-on engineering, requiring you to design and optimize pipelines that reduce memory usage, latency, and compute cost without sacrificing model performance. You will explore and implement techniques such as quantization, pruning, and knowledge distillation, contributing directly to scalable AI infrastructure. Operating in a highly research-driven and experimental environment, you will collaborate with AI engineers and researchers to push the boundaries of efficient multimodal intelligence. This is a high-impact role for someone passionate about both cutting-edge AI research and real-world deployment constraints.

Accountabilities:

Design and implement model compression techniques such as quantization, pruning, and knowledge distillation to optimize large multimodal AI models (LLMs and VLMs) for efficiency and scalability.
Develop low-bit and mixed-precision quantization strategies to reduce model size and inference latency while preserving accuracy and output quality.
Build and refine knowledge distillation pipelines to transfer capabilities from large teacher models to compact student models for efficient inference.
Analyze performance trade-offs between accuracy, latency, memory usage, and throughput across different compression techniques and propose empirical improvements.
Conduct research on emerging model compression methods and contribute to experimental validation of novel approaches for multimodal architectures.
Document experiments, methodologies, and findings to ensure reproducibility and effective collaboration across research and engineering teams.
Contribute to scientific publications and technical papers for leading AI conferences, advancing the field of efficient model deployment.

Requirements:

PhD or equivalent experience in Computer Science, Machine Learning, NLP, or a related field, with a strong research track record in AI or deep learning.
Strong hands-on experience with PyTorch or equivalent deep learning frameworks for training and optimizing large-scale models.
Proven expertise in model quantization, including both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
Practical experience with knowledge distillation techniques for compressing large neural networks into smaller, efficient models.
Solid understanding of model pruning methods and neural network optimization strategies for efficiency improvement.
Deep knowledge of transformer-based architectures (LLMs, VLMs), including training dynamics, backpropagation, fine-tuning, and optimization techniques.
Strong research mindset with the ability to evaluate trade-offs and design experiments in multimodal AI systems.
Familiarity with C++ for low-level optimization and inference acceleration is a plus.

Benefits:

Opportunity to work on cutting-edge AI research focused on efficient multimodal and generative model deployment.
High-impact role contributing directly to scalable AI systems for real-world edge and mobile applications.
Fully remote, global-first working environment with international collaboration.
Strong focus on research freedom, experimentation, and publication in top-tier AI conferences.
Exposure to advanced AI systems including LLMs, VLMs, and multimodal architectures at scale.
Competitive compensation aligned with experience and technical expertise.
Opportunity to shape next-generation AI efficiency standards and deployment techniques.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

