EDB-IPP Project: LLM Model Compression and Acceleration

Company : Rakuten

Job Type : Internship

Singapore

Job Description:

Rakuten Asia, in partnership with the Economic Development Board (EDB) through the Industrial Postgraduate Programme (IPP), is seeking new PhD students. We are looking for individuals with a robust understanding of deep learning, machine learning, and natural language processing to contribute to our innovative research projects.

Essential requirements include proven hands-on expertise and strong engineering skill sets, specifically in the development and training of PyTorch models.

IPP Programme Benefits
Candidates successfully selected for this programme will receive full sponsorship for their postgraduate studies and will be hired by Rakuten Asia upon successful completion.

Collaboration Model

The collaboration will include joint PhD student supervision, shared access to computational resources for large-scale model compression experiments, and regular research exchanges. Output will include high-impact publications, open-source tools, and demonstrable prototypes of efficient AI.

Project Outline

Introduction

The rapid advancements in large-scale AI models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Diffusion Models, have unleashed unprecedented capabilities across diverse domains. However, the immense computational and memory demands of these “big models” pose significant challenges for their widespread deployment, real-time inference, and sustainable operation. To truly democratize and scale the power of modern AI, Big Model Compression and Acceleration is not just an optimization; it is a fundamental requirement.

Objectives

The collaboration aims to:

Develop foundational techniques for compressing large AI models, specifically targeting LLMs, MLLMs, and Diffusion Models, to significantly reduce their parameter count and memory footprint without compromising performance.
Advance methods for accelerating the inference of these big models, enabling real-time responsiveness and high-throughput processing across various applications, from natural language understanding to high-fidelity image generation.
Prototype and validate efficient AI systems for real-world applications, demonstrating significant gains in speed, energy efficiency, and deployability for LLMs, MLLMs, and Diffusion Models.
Nurture PhD-level talent through joint supervision and research internships, fostering expertise in the deployment and scaling of efficient AI.

Proposed Research Areas

We propose collaboration across the following topics, with openness to refining based on shared interests:

Advanced Quantization Techniques for LLMs, MLLMs, and Diffusion Models:

Explore novel quantization methods (e.g., sub-8-bit, mixed-precision, and adaptive schemes) to drastically reduce model size and accelerate computation while maintaining high accuracy. This includes investigating learned quantization schemes and robust post-training quantization, specifically tailored to the unique architectures and data distributions of LLMs, MLLMs (e.g., multimodal embeddings), and Diffusion Models (e.g., preserving generative quality).
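To make the quantization direction concrete, here is a minimal sketch of round-to-nearest symmetric post-training weight quantization, a common baseline rather than a method proposed by this project. It assumes plain Python lists stand in for weight tensors; `quantize_symmetric`, `quantize_groupwise`, and the toy group size of 4 are illustrative choices (practical schemes often use groups of 64 or 128). It shows why a single outlier hurts per-tensor 4-bit quantization and how per-group scales help.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization with one scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit integers
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp guards against floating-point error pushing a value just past qmax.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [v * scale for v in q]


def quantize_groupwise(weights: List[float], bits: int = 4,
                       group_size: int = 4) -> List[Tuple[List[int], float]]:
    """Quantize each contiguous group with its own scale, so an outlier
    only coarsens the step size of the group it belongs to."""
    return [
        quantize_symmetric(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Hypothetical weights: small values plus one large outlier (2.5).
w = [0.02, -0.01, 0.03, 0.015, 2.5, -0.04, 0.01, 0.02]
q_tensor, s_tensor = quantize_symmetric(w, bits=4)
groups = quantize_groupwise(w, bits=4, group_size=4)
```

With one scale for the whole list, the outlier 2.5 sets the step size to about 0.36, so the four small weights in the first group all round to zero; giving each group of four its own scale keeps them distinguishable at the cost of storing one extra scale per group.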

Structured and Unstructured Pruning for Large Generative Models:

Develop sophisticated pruning algorithms to remove redundant parameters from LLMs, MLLMs, and Diffusion Models. Focus will be on achieving high sparsity without significant accuracy or quality loss, through techniques like dynamic, magnitude-based, and Hessian-aware pruning. We will specifically consider their impact on text coherence, image fidelity, cross-modal alignment, and the preservation of emergent capabilities, ensuring high-quality output and avoiding issues like “hallucinations” or mode collapse.
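The magnitude-based pruning mentioned above can be sketched as follows, assuming a flat list of weights. `magnitude_prune` is unstructured (it zeroes the smallest-magnitude weights anywhere in the list), while `prune_n_of_m` enforces a semi-structured N:M pattern such as 2:4, which some GPUs can accelerate directly. Both function names are hypothetical, and neither implements the dynamic or Hessian-aware criteria the project lists; those would replace the plain `abs(w)` score with something more informed.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Unstructured pruning: zero the `sparsity` fraction of smallest-|w| weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_n_of_m(weights: List[float], n: int = 2, m: int = 4) -> List[float]:
    """Semi-structured N:M pruning: in every block of m consecutive weights,
    keep only the n with the largest magnitude."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    out: List[float] = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(block[i]), reverse=True)[:n])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return out


w = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]
unstructured = magnitude_prune(w, sparsity=0.5)
two_four = prune_n_of_m(w, n=2, m=4)
```

Both calls reach 50% sparsity on this example, but only the 2:4 version guarantees that every block of four has exactly two nonzeros, which is what makes the pattern hardware-friendly.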


Efficient Knowledge Distillation for Diverse Model Modalities:

Investigate novel knowledge distillation approaches to transfer knowledge from large “teacher” models (LLMs, MLLMs, Diffusion Models) to smaller, more efficient “student” models. This includes exploring various distillation objectives as well as multi-teacher and progressive distillation, accounting for the nuances of language, visual, and multimodal data. Research will cover distilling reasoning from LLMs, multimodal knowledge transfer from MLLMs, and accelerating Diffusion Model sampling without quality degradation.
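A common starting point for the distillation objectives above is the temperature-scaled KL-divergence loss of classic logit distillation, sketched here for a single example in plain Python. The function names, the temperature of 2.0, and the mixing weight `alpha` are illustrative assumptions; multi-teacher or progressive schemes would build on top of a loss like this rather than replace it.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so the
    gradient magnitude stays comparable across temperatures."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def total_loss(student_logits: List[float], teacher_logits: List[float], label: int,
               alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Mix ordinary cross-entropy on the true label with the distillation term."""
    hard = -math.log(softmax(student_logits)[label])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft


teacher = [3.0, 1.0, 0.0]
student = [0.0, 0.0, 0.0]
loss = total_loss(student, teacher, label=0)
```

The soft term is zero only when the student's softened distribution matches the teacher's, so it pushes the student toward the teacher's relative preferences among wrong classes, information that a hard label alone does not carry.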

Dynamic Token Pruning and Efficient Sequence Processing:

Explore novel methods for reducing computational costs for long text sequences (tokens) and high-resolution visual data (visual tokens/patches). This involves developing strategies for adaptive text token dropping in LLMs/MLLMs and vision token/patch pruning in MLLMs/Diffusion Models, selectively discarding less informative data. Research also includes advanced techniques such as sparse attention mechanisms to reduce quadratic complexity, and token merging/condensation for compact representations. The aim is to significantly reduce FLOPs and memory footprint during inference while maintaining performance, with efficiency gains quantified by the reduction in effective sequence length.
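A minimal sketch of score-based token pruning: keep the highest-scoring fraction of tokens and preserve their original order. The importance scores are assumed to come from elsewhere (for example, the attention each token receives from a [CLS] token); the values below are made up, and `prune_tokens` and `attention_cost_ratio` are hypothetical helper names. Because the attention score matrix is n by n, halving the sequence cuts that term to a quarter, while linear layers shrink only in proportion to n.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prune_tokens(tokens: Sequence[T], scores: Sequence[float],
                 keep_ratio: float = 0.5) -> List[T]:
    """Keep the top `keep_ratio` fraction of tokens by score, in original order."""
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]  # re-sort indices to keep positional order


def attention_cost_ratio(n_before: int, n_after: int) -> float:
    """Relative cost of the n x n attention-score computation after pruning;
    the hidden size d cancels out of the ratio."""
    return (n_after / n_before) ** 2


tokens = ["the", "cat", "sat", "on", "a", "mat"]
scores = [0.05, 0.9, 0.6, 0.1, 0.02, 0.8]  # hypothetical importance scores
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
ratio = attention_cost_ratio(len(tokens), len(kept))
```

Here the kept sequence is `["cat", "sat", "mat"]` and the attention-score cost drops to 0.25 of the original. Adaptive schemes would choose `keep_ratio` per input or per layer instead of fixing it.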

Efficient Generative Sampling and Inference Optimization:

Focus on accelerating the sampling process for generative models (LLMs, MLLMs, Diffusion Models) without compromising output quality. This includes research into faster text decoding strategies (e.g., speculative, tree-based, parallel-decoding) for LLM/MLLM inference. For Diffusion Models, this involves developing advanced sampling techniques (e.g., novel schedules, consistency models, score distillation) to significantly reduce generation steps. We will also optimize inference pipelines for conditional generation tasks, alongside theoretical analysis of generation speed versus quality trade-offs.
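The speculative decoding idea mentioned above can be illustrated with its greedy variant: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with plus one token of its own, so the output is identical to greedy decoding with the target alone. The sketch assumes models are plain functions returning their greedy next token; `toy_target` and `toy_draft` are hypothetical stand-ins, and sampling-based variants use a probabilistic accept/reject rule instead of exact matching.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (sequence, verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        rounds += 1
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed position. A real implementation
        #    scores all k positions in one batched forward pass; we loop for clarity.
        ctx = list(seq)
        accepted = []
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                accepted.append(expected)  # replace first mismatch with target's token
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all draft tokens matched: one bonus token
        seq.extend(accepted)
    return seq[:len(prompt) + max_new], rounds


# Toy, hypothetical models: the draft disagrees only when len(seq) is a multiple of 5.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + len(seq)) % 11


def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    return guess if len(seq) % 5 else (guess + 1) % 11


out, rounds = speculative_decode(toy_target, toy_draft, [1, 2], max_new=20, k=4)
```

With these toy models, 20 new tokens are produced in 5 verification rounds instead of 20 sequential target steps; the real speed-up depends on how often the draft agrees with the target and on how cheaply the target can score k positions at once.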

About the Company

Rakuten

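Rakuten Ichiba is a comprehensive online shopping mall for enjoyable internet shopping. Earn and spend Rakuten Points with ease, plus great-value coupons every day. It offers a full selection, from food and home appliances to fashion, baby products, and cosmetics.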