Knowledge distillation (KD) has been a game changer for me and opens up many research directions. While many existing studies leverage KD for model compression, my interest in KD goes beyond model compression to supervised compression, human-annotation-free model training, and more.
Knowledge distillation approaches have been getting more complex, e.g., using intermediate feature representations and auxiliary modules (trainable modules used only during training), which makes their implementations more complex as well. To lower the barrier to research on KD, I developed an ML OSS, torchdistill, a modular, configuration-driven framework for knowledge distillation. torchdistill is an installable Python package; you can install it with "pip3 install torchdistill".
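To make the idea concrete, here is a minimal sketch of the classic soft-target distillation loss (Hinton et al.'s formulation) in plain PyTorch. This is a generic illustration, not torchdistill's API; the function name and default hyperparameters are my own choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target knowledge distillation loss (illustrative sketch).

    Combines a KL-divergence term between temperature-softened teacher and
    student distributions with the standard cross-entropy on hard labels.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 to keep its gradient magnitude
    # comparable to the cross-entropy term, as in the original paper.
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

More advanced KD methods replace or augment this logit-matching term with losses on intermediate features, which is where auxiliary training-only modules and framework support like torchdistill's become valuable.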
We empirically found that leveraging teacher models is key to further improving the tradeoff between model accuracy and data size when learning compressed representations for supervised tasks. More details are available at the project page of Supervised Compression for Split Computing.
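The core idea can be sketched as follows. This is an illustrative toy example, not the project's actual code: a lightweight encoder compresses the input into a small bottleneck representation (the payload that would be transmitted in split computing), and a decoder is trained to match a teacher model's intermediate features so the compressed representation remains useful for the supervised task. All module and function names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckEncoder(nn.Module):
    """On-device encoder: maps the input to a small bottleneck tensor."""
    def __init__(self, in_ch=3, bottleneck_ch=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            # Few channels + reduced resolution = small transmitted payload.
            nn.Conv2d(16, bottleneck_ch, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Server-side decoder: expands the bottleneck back to feature space."""
    def __init__(self, bottleneck_ch=2, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bottleneck_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    def forward(self, z):
        return self.net(z)

def feature_distillation_loss(decoded, teacher_feat):
    # Train the encoder/decoder so the decoded features match the
    # teacher's intermediate features (teacher_feat is hypothetical here).
    return F.mse_loss(decoded, teacher_feat)
```

The design choice is that the supervision signal comes from the teacher's features rather than from reconstructing pixels, so the bottleneck learns to keep only task-relevant information.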
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
ACL 2023 (Findings)
This work was done while Shivanshu Gupta was an applied science intern at Amazon Alexa AI.
Paper · Amazon Science · Preprint · Xtr-WikiQA · TyDi-AS2
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
EMNLP 2022 (Findings)
This work was done while I was an applied science intern at Amazon Alexa AI.
Paper · Amazon Science · Preprint · Code