EDB-IPP Project: LLM Model Compression and Acceleration

Company: Rakuten
Job Type: Internship

Job Description:

Rakuten Asia, in partnership with the Economic Development Board (EDB) through the Industrial Postgraduate Programme (IPP), is seeking new PhD students. We are looking for individuals with a robust understanding of deep learning, machine learning, and natural language processing to contribute to our innovative research projects.

Essential requirements include proven hands-on expertise and strong engineering skillsets, specifically in the development and training of PyTorch models.

IPP Programme Benefits
Candidates successfully selected for this programme will receive full sponsorship for their postgraduate studies and will be hired by Rakuten Asia upon successful completion.

Collaboration Model

The collaboration will include joint PhD student supervision, shared access to computational resources for large-scale model compression experiments, and regular research exchanges. Output will include high-impact publications, open-source tools, and demonstrable prototypes of efficient AI.

Project Outline

Introduction

The rapid advancements in large-scale AI models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Diffusion Models, have unleashed unprecedented capabilities across diverse domains. However, the immense computational and memory demands of these “big models” pose significant challenges for their widespread deployment, real-time inference, and sustainable operation. To truly democratize and scale the power of modern AI, Big Model Compression and Acceleration is not just an optimization; it is a fundamental requirement.

Objectives

The collaboration aims to:

  • Develop foundational techniques for compressing large AI models, specifically targeting LLMs, MLLMs, and Diffusion Models, to significantly reduce their parameter count and memory footprint without compromising performance.
  • Advance methods for accelerating the inference of these big models, enabling real-time responsiveness and high-throughput processing across various applications, from natural language understanding to high-fidelity image generation.
  • Prototype and validate efficient AI systems for real-world applications, demonstrating significant gains in speed, energy efficiency, and deployability for LLMs, MLLMs, and Diffusion Models.
  • Nurture PhD-level talent through joint supervision and research internships, fostering expertise in the deployment and scaling of efficient AI.

Proposed Research Areas

We propose collaboration across the following topics, with openness to refining based on shared interests:

  • Advanced Quantization Techniques for LLMs, MLLMs, and Diffusion Models:

Explore novel quantization methods (e.g., beyond 8-bit, mixed-precision, adaptive) to drastically reduce model size and accelerate computation while maintaining high accuracy. This includes investigating learned quantization schemes and robust post-training quantization, specifically tailored to the unique architectures and data distributions of LLMs, MLLMs (e.g., multimodal embeddings), and Diffusion Models (e.g., generative quality).
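
For candidates unfamiliar with the area, the simplest form of post-training quantization can be sketched as below. This is only an illustration under our own assumptions (symmetric, per-tensor, round-to-nearest, 4-bit); the function names are hypothetical and it is not a description of the project's actual methods.

```python
def quantize_symmetric(weights, num_bits=4):
    """Symmetric per-tensor quantization with round-to-nearest (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1            # largest signed code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to the nearest integer code and clip to the signed range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floating-point weights from integer codes."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.12, 0.0, -0.7]
codes, scale = quantize_symmetric(weights, num_bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With this scheme the rounding error of each weight is bounded by half the scale, which is why outlier weights (which inflate the scale) are a central difficulty at low bit-widths.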

  • Structured and Unstructured Pruning for Large Generative Models:

Develop sophisticated pruning algorithms to remove redundant parameters from LLMs, MLLMs, and Diffusion Models. Focus will be on achieving high sparsity without significant accuracy or quality loss, through techniques like dynamic, magnitude-based, and Hessian-aware pruning. We will specifically consider their impact on text coherence, image fidelity, cross-modal alignment, and the preservation of emergent capabilities, ensuring high-quality output and avoiding issues like “hallucinations” or mode collapse.
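
As a point of reference, unstructured magnitude-based pruning, the simplest of the techniques named above, can be sketched as follows. The function name and the flat weight list are our own simplifications for illustration; dynamic and Hessian-aware criteria replace the plain magnitude score with more informative ones.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)
```

Structured pruning would instead remove whole rows, heads, or channels, which is easier to accelerate on hardware but usually costs more accuracy at the same sparsity.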

  • Efficient Knowledge Distillation for Diverse Model Modalities:

Investigate novel knowledge distillation approaches to transfer knowledge from large “teacher” models (LLMs, MLLMs, Diffusion Models) to smaller, more efficient “student” models. This includes exploring various distillation objectives, multi-teacher distillation, and progressive distillation, accounting for the nuances of language, visual, and multimodal data. Research will cover distilling reasoning from LLMs, multimodal knowledge transfer from MLLMs, and accelerating Diffusion Model sampling without quality degradation.
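
One common distillation objective is the temperature-scaled KL divergence between teacher and student output distributions. The sketch below illustrates that objective for a single prediction; the function names and the default temperature are assumptions made for this example, and the project may well use different objectives.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


teacher_logits = [2.0, 0.0, -1.0]
student_logits = [1.0, 0.5, -0.5]
loss = distillation_loss(student_logits, teacher_logits)
```

In practice this term is usually combined with the ordinary task loss on ground-truth labels.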

  • Dynamic Token Pruning and Efficient Sequence Processing:

Explore novel methods for reducing computational costs for long text sequences (tokens) and high-resolution visual data (visual tokens/patches). This involves developing strategies for adaptive text token dropping in LLMs/MLLMs and vision token/patch pruning in MLLMs/Diffusion Models, selectively discarding less informative data. Research also includes advanced techniques like sparse attention mechanisms to reduce quadratic complexity, and token merging/condensation for compact representations. The aim is to significantly reduce FLOPs and memory footprint during inference while maintaining performance, quantifying efficiency gains by reducing effective sequence length.
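
To make the link between sequence length and cost concrete, here is a minimal sketch of score-based token pruning. The importance scores are assumed to be given (for example, derived from attention weights); the function name and the toy scores are hypothetical.

```python
import math


def prune_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    num_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Select the indices of the top-scoring tokens, then restore sequence order.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:num_keep]
    return [tokens[i] for i in sorted(top)]


tokens = ["the", "model", "is", "very", "large", "indeed"]
scores = [0.05, 0.9, 0.1, 0.2, 0.8, 0.15]    # assumed importance scores
kept = prune_tokens(tokens, scores, keep_ratio=0.5)

# Dense self-attention cost grows quadratically with sequence length,
# so halving the sequence cuts that term to roughly a quarter.
relative_attention_cost = (len(kept) / len(tokens)) ** 2
```

Token merging follows the same idea but combines similar tokens instead of discarding them, which tends to preserve more information at the same sequence length.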

  • Efficient Generative Sampling and Inference Optimization:

Focus on accelerating the sampling process for generative models (LLMs, MLLMs, Diffusion Models) without compromising output quality. This includes research into faster text decoding strategies (e.g., speculative, tree-based, and parallel decoding) for LLM/MLLM inference. For Diffusion Models, this involves developing advanced sampling techniques (e.g., novel schedules, consistency models, score distillation) to significantly reduce generation steps. We will also optimize inference pipelines for conditional generation tasks, alongside theoretical analysis of generation speed versus quality trade-offs.
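
The idea behind speculative decoding can be sketched with toy models, as below. This is a simplified greedy-verification variant written for illustration: the two "models" are hypothetical stand-in functions, verification is shown as a loop although a real system would score all drafted positions in one batched forward pass, and sampling-based variants use a probabilistic accept/reject rule instead of exact matching.

```python
def greedy_decode(next_token, prompt, num_new):
    """Baseline: call the model once per generated token."""
    seq = list(prompt)
    for _ in range(num_new):
        seq.append(next_token(seq))
    return seq


def speculative_decode(target_next, draft_next, prompt, num_new, lookahead=4):
    """Draft several tokens cheaply, then keep the prefix the target model agrees with."""
    seq = list(prompt)
    goal = len(prompt) + num_new
    while len(seq) < goal:
        # 1) The cheap draft model proposes a short continuation.
        draft = []
        for _ in range(min(lookahead, goal - len(seq))):
            draft.append(draft_next(seq + draft))
        # 2) The target model checks each drafted position in turn.
        for i, proposed in enumerate(draft):
            expected = target_next(seq + draft[:i])
            if proposed != expected:
                # Accept the matching prefix, then take the target's own token.
                seq.extend(draft[:i])
                seq.append(expected)
                break
        else:
            seq.extend(draft)                 # every drafted token was accepted
    return seq


# Toy stand-ins for a large target model and a small draft model (hypothetical).
def target_next(seq):
    return (seq[-1] + len(seq)) % 5


def draft_next(seq):
    return (seq[-1] + len(seq)) % 5 if len(seq) % 3 else 0


prompt = [1, 2]
fast = speculative_decode(target_next, draft_next, prompt, num_new=10)
slow = greedy_decode(target_next, prompt, num_new=10)
```

Because every accepted token is checked against the target model, the output matches plain greedy decoding with the target; the speed-up comes from verifying several drafted tokens per target-model pass.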

Original job posting: EDB-IPP Project: LLM Model Compression and Acceleration, posted on GrabJobs.
About the Company

Rakuten

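Rakuten Ichiba is a comprehensive online shopping mall. Earn and spend Rakuten Points as you shop, with money-saving coupons available every day. It offers a wide selection, from food and home appliances to fashion, baby products, and cosmetics.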

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.