Posts by Collection

column

paper_review

FLEXTRON: Many-in-One Flexible Large Language Model

Published in 2024

Today, I will summarize the paper titled “FLEXTRON: Many-in-One Flexible Large Language Model.” The primary focus of this paper is to propose a novel framework with an elastic structure that can quickly adapt to diverse user environments. To achieve this, the paper proposes a design similar to Mixture-of-Experts.

Matryoshka Quantization

Published in 2025


In the era of massive language models and vision transformers, model efficiency has become just as important as accuracy. Whether you’re deploying to mobile or edge devices, or scaling inference infrastructure, quantization is a crucial technique for compressing models while maintaining performance.

project

research