The recent success of deep Convolutional Neural Networks (CNNs) is the jewel in the crown of the current wave of AI. However, today's CNN models rely heavily on high-performance computing hardware, such as GPUs and TPUs, which is typically deployed in cloud environments. Client applications must therefore transmit user data to the cloud to benefit from deep CNN models. This constraint severely limits such models' applicability on resource-constrained devices, e.g., mobile phones, IoT devices, and embedded devices. Moreover, sending user data to a remote server increases the risk of privacy leakage. In recent years, various works have therefore aimed to solve this problem by reducing memory footprints and accelerating inference. We roughly categorize these works into network pruning, knowledge distillation, low-bit quantization, and compact network design. The latter has become the most popular approach and has had a massive impact on industrial applications, as compact networks achieve promising accuracy with generally fewer parameters and less computation.
Dr. Haojin Yang studied media technology at the University of Technology Ilmenau and received the Diplom-Engineer (Dipl.-Ing.) degree in 2008. In 2010 he started his Ph.D. studies in computer science at the Hasso-Plattner-Institut and the University of Potsdam, and in 2013 he received his doctoral degree with the final grade "summa cum laude". Until October 2019, Dr. Yang was Senior Researcher and group leader for multimedia and machine learning at HPI. In July 2019, he received his habilitation. From November 2019 to October 2020, he worked as head of the Edge Computing Lab branch at AI Labs and Ali-Cloud, Alibaba. After that, he returned to HPI and currently leads the research group for multimedia and machine learning at the Internet Technologies and Systems chair.