Knowledge Distillation
Summary
Knowledge distillation (KD) has been a game changer for me and has opened up many research directions. While many existing studies leverage KD for model compression, my interest in KD goes beyond model compression to supervised compression, human-annotation-free model training, and more.
ML OSS
Knowledge distillation approaches have been getting increasingly complex, e.g., using intermediate feature representations and auxiliary modules (trainable modules used only during training), which makes their implementations more involved. To lower the barrier to research on KD, I developed an ML OSS, torchdistill, a modular, configuration-driven framework for knowledge distillation. torchdistill is an installable Python package, and you can install it with "pip3 install torchdistill".
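For context, below is a minimal PyTorch sketch of the classic soft-target distillation loss (Hinton et al., 2015). The function name and the temperature/alpha hyperparameters are illustrative, and this is not torchdistill's API; it is simply the kind of training logic that torchdistill lets you declare in a configuration file instead of hand-writing for every experiment.

import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, labels,
                        temperature=4.0, alpha=0.5):
    """Classic soft-target KD loss: KL divergence between
    temperature-softened teacher/student distributions, combined
    with standard cross-entropy on the ground-truth labels."""
    # Softened distributions; teacher logits are treated as constants.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=1)
    # The KL term is scaled by T^2 to keep gradient magnitudes comparable.
    kd_term = F.kl_div(log_p_student, p_teacher,
                       reduction='batchmean') * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term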
Supervised Compression
We empirically found that leveraging teacher models is key to further improving the tradeoff between model accuracy and transferred data size when learning compressed representations for supervised tasks. More details are available on the project page of Supervised Compression for Split Computing.
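Conceptually, the training objective can be viewed as a rate-distortion style trade-off guided by a teacher. The sketch below is only a schematic illustration under assumed names (student_features, teacher_features, and rate_bits are hypothetical inputs), not the exact loss used in the papers.

import torch
import torch.nn.functional as F

def supervised_compression_loss(student_features, teacher_features,
                                rate_bits, beta=0.1):
    """Schematic supervised-compression objective: match the student's
    bottleneck output to intermediate teacher features (a distillation-style
    distortion term) while penalizing the estimated bit cost of the
    compressed representation."""
    distortion = F.mse_loss(student_features, teacher_features.detach())
    # beta trades task-relevant fidelity against transferred data size.
    return distortion + beta * rate_bits.mean()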
Related Publications
torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP
EMNLP 2023 Workshop for Natural Language Processing Open Source Software (NLP-OSS)
This work was done prior to joining Amazon.
Paper | OpenReview | Preprint | Code | PyPI
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
ACL 2023 (Findings)
This work was done while Shivanshu Gupta was an applied science intern at Amazon Alexa AI.
Paper | Amazon Science | Preprint | Xtr-WikiQA | TyDi-AS2
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
EMNLP 2022 (Findings)
This work was done while I was an applied science intern at Amazon Alexa AI.
Paper | Amazon Science | Preprint | Code
Supervised Compression for Resource-Constrained Edge Computing Systems
WACV 2022
Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks
ICPR 2020
Split Computing for Complex Object Detectors: Challenges and Preliminary Results
MobiCom 2020 Workshop on Embedded and Mobile Deep Learning (EMDL)