Posts by Collection

column

paper_review

FLEXTRON: Many-in-One Flexible Large Language Model

Published in 2024

Today, I will summarize the paper titled “FLEXTRON: Many-in-One Flexible Large Language Model.” The primary focus of this paper is to propose a novel framework with an elastic structure that can quickly adapt to diverse user environments. To achieve this, the paper proposes a design similar to Mixture-of-Experts.

Matryoshka Quantization

Published in 2025


In the era of massive language models and vision transformers, model efficiency has become just as important as accuracy. Whether you’re deploying to mobile or edge devices, or scaling inference infrastructure, quantization is a crucial technique for compressing models while maintaining performance.

project

research