This video belongs to the openHPI course clean-IT: Towards Sustainable Digital Technologies. Do you want to see more?
- 00:00 Here I want to introduce our research work on binary neural networks,
- 00:05 and the goal is to make AI training much more energy efficient.
- 00:12 I start with this slide from MIT Technology Review, which compares
- 00:19 the carbon footprint of a round-trip flight
- 00:23 from New York to San Francisco, a human life over one year,
- 00:29 an American life over one year, a US car including fuel
- 00:35 over its average lifetime, and
- 00:38 the energy that is needed to train a large transformer model
- 00:42 with neural architecture search.
- 00:47 Such an AI system needs a huge amount of energy and
- 00:54 produces a very large carbon footprint, even compared to a flight, which also
- 01:02 has a considerable carbon footprint.
- 01:06 So when we imagine that such AI systems
- 01:11 will be applied and used around the world,
- 01:15 then it will not work for our climate. We definitely need
- 01:21 much more energy-efficient AI systems.
- 01:25 Here is an idea we have followed over the last years:
- 01:30 we look at deep learning architectures,
- 01:35 and instead of working with 32-bit architectures
- 01:40 we try to use binary neural networks, that is, networks at the
- 01:48 1-bit level where the computations are done on the bit level.
- 01:53 The state of the art of such AI networks is 32-bit
- 01:59 models where, in the convolutions,
- 02:04 32-bit numbers are operated with each other. We
- 02:09 try to design and train deep neural networks on a binary level
- 02:16 and show that this is possible. And of course it produces large energy savings.
- 02:22 So what is the idea behind such low-bit neural networks?
- 02:27 Here I show it at the 1-bit level, the extreme case: binary
- 02:33 neural networks use only two values, +1 and -1,
- 02:41 for the weights as well as for the inputs,
- 02:45 instead of the 32-bit floating-point numbers that are
- 02:51 used in the state-of-the-art models.
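The binarization just described can be sketched in a few lines of Python (NumPy; the weight values below are made-up illustrations):

```python
import numpy as np

# A hypothetical 32-bit float weight tensor, as produced by normal training.
weights_fp32 = np.array([0.73, -0.12, 0.05, -1.40, 0.00], dtype=np.float32)

def binarize(w):
    """Map each 32-bit weight to one of only two values, +1 or -1.
    By convention, zero is mapped to +1."""
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)

weights_1bit = binarize(weights_fp32)
print(weights_1bit)  # [ 1. -1.  1. -1.  1.]
```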
- 02:55 Up to 32x model compression and a 58x speed-up
- 03:01 are possible during inference, and more than 1000x energy savings
- 03:08 on dedicated hardware. This has been shown by
- 03:13 the work of colleagues
- 03:16 around the world. The challenge of these low-bit networks: they are wonderful
- 03:23 in using only a small amount of energy compared to the
- 03:27 state-of-the-art networks, but they lose accuracy compared to the 32-bit networks.
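The bit-level computation mentioned above works because, with all values restricted to +1 and -1, multiplying two values reduces to an XNOR of their sign bits, and summing the products reduces to a population count. A minimal sketch (NumPy; the vectors are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# Binary weights and inputs: every value is +1 or -1.
w = rng.choice([-1, 1], size=n)
x = rng.choice([-1, 1], size=n)

# Encode +1 as bit 1 and -1 as bit 0.
w_bits = (w > 0).astype(np.uint8)
x_bits = (x > 0).astype(np.uint8)

# XNOR of the bit encodings: 1 wherever the signs agree,
# i.e. wherever the product w_i * x_i would be +1.
agree = 1 - (w_bits ^ x_bits)
popcount = int(agree.sum())

# The dot product is recovered from the popcount alone:
# (#agreements) - (#disagreements) = 2 * popcount - n.
dot_from_bits = 2 * popcount - n
assert dot_from_bits == int(np.dot(w, x))
```

On dedicated hardware, XNOR and popcount are far cheaper than 32-bit multiply-accumulate, which is where the reported speed-up and energy savings come from.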
- 03:34 For example, direct binarization of a
- 03:39 network trained on ImageNet (a huge
- 03:45 database of image data) leads to a loss of accuracy of about 10 percent.
- 03:54 With our research work we try to contribute to
- 03:59 improving this accuracy and to
- 04:04 making the loss in accuracy smaller, at best reaching the same accuracy level
- 04:10 as with the 32-bit networks.
- 04:12 So the goal of our ongoing research is to achieve the same accuracy
- 04:17 with binary networks as with traditional convolutional networks.
- 04:25 What would be the result if this research
- 04:29 succeeded? If we were able
- 04:33 to close the gap between the 32-bit convolutional networks and the binary
- 04:40 neural networks, then we could deploy the dedicated hardware
- 04:46 on servers and achieve huge energy savings. But even more,
- 04:53 the networks could run on mobile and embedded devices without
- 04:58 a loss of accuracy, and such mobile and embedded devices need a lot less
- 05:04 energy compared with what is needed by
- 05:09 server infrastructures.
- 05:12 Here are some insights from our research.
- 05:17 What we found in our experiments is that the clipping threshold should be
- 05:23 considered a hyperparameter, and
- 05:26 values between 1.2 and 1.3 lead to better results
- 05:31 than the value 1 that was used in most previous work.
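The talk does not spell out where the clipping threshold enters. A common formulation in binary-network training is the straight-through estimator: the sign function is used in the forward pass, and in the backward pass the gradient is let through only where the input magnitude stays below the threshold. A sketch under that assumption (the function names are ours):

```python
import numpy as np

def binarize_forward(x):
    """Forward pass: the non-differentiable sign function."""
    return np.where(x >= 0, 1.0, -1.0)

def ste_backward(x, grad_out, threshold=1.3):
    """Straight-through estimator backward pass: gradients pass
    only where |x| <= threshold; elsewhere they are clipped to 0.
    The threshold is the hyperparameter discussed above."""
    return grad_out * (np.abs(x) <= threshold)

x = np.array([-2.0, -1.25, -0.5, 0.5, 1.25, 2.0])
g = np.ones_like(x)
print(ste_backward(x, g, threshold=1.0))  # [0. 0. 1. 1. 0. 0.]
print(ste_backward(x, g, threshold=1.3))  # [0. 1. 1. 1. 1. 0.]
```

With the wider threshold of 1.3, inputs such as 1.25 still receive a gradient, which is one plausible reading of why values between 1.2 and 1.3 helped.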
- 05:36 Here are the sources where you can follow and see the
- 05:41 details of this research work.
- 05:44 The scaling of channels after a binary convolution according to
- 05:49 Rastegari et al. can be handled by BatchNorm layers, and we could show
- 05:55 that a tighter approximation of the sign function does not necessarily achieve better results.
- 06:02 And these are the models we are using and experimenting with.
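The claim above about the scaling factors of Rastegari et al. can be checked algebraically: a per-channel scaling applied before an inference-mode BatchNorm can be folded into BatchNorm's own parameters, so a separate scaling step is unnecessary. A small numerical check with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1000)   # hypothetical binary-convolution output (one channel)
alpha = 0.37                # per-channel scaling factor (XNOR-Net style)
gamma, beta = 1.5, -0.2     # learned BatchNorm affine parameters
mu, sigma = 0.1, 2.0        # fixed (inference-mode) BatchNorm statistics

def bn(x, g, b, m, s):
    """Inference-mode BatchNorm with fixed statistics."""
    return g * (x - m) / s + b

scaled_then_bn = bn(alpha * y, gamma, beta, mu, sigma)
# The same result with the scaling absorbed into BatchNorm's parameters:
folded = bn(y, gamma * alpha, beta, mu / alpha, sigma)
assert np.allclose(scaled_then_bn, folded)
```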
- 06:09 There is BinaryDenseNet, a DenseNet adapted for binary networks
- 06:14 that removes the bottlenecks. Here is the DenseNet with
- 06:19 a bottleneck, here without the bottleneck, and here is the BinaryDenseNet,
- 06:26 our suggestion, which we have been working on and investigating.
- 06:34 If we look at and compare the
- 06:38 accuracy of the different models and consider our BinaryDenseNet
- 06:45 model, then you can see here that we
- 06:51 achieve the best results at the different model-size levels
- 06:59 with the BinaryDenseNet
- 07:02 with different parameters. So it makes sense to work on and to modify
- 07:09 the networks that are used. Another binary
- 07:13 neural network model we are experimenting with is
- 07:18 our MeliusNet.
- 07:21 It uses 1 bit for the weights and inputs, which leads to lower feature quality and capacity,
- 07:29 but the number of bit operations is reduced drastically.
- 07:35 The number of possible values for the weights is reduced from
- 07:41 2^32 to 2.
- 07:44 With 32-bit representations, the values of the weights
- 07:49 can vary over this huge range of different values. When you
- 07:56 work with binary weights, as in MeliusNet, then we have only 2
- 08:02 possible weights. Of course this leads to a quantization error
- 08:08 as well as a lower feature quality.
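The reduction from 2^32 representable values to just 2, and the quantization error it causes, can be illustrated directly (random weights for illustration only):

```python
import numpy as np

# A 32-bit weight can take 2**32 distinct bit patterns; a binary weight only 2.
print(2**32)  # 4294967296 representable values per 32-bit weight
print(2)      # representable values per binary weight (+1 and -1)

# Collapsing a full-precision weight to the nearest of the two values
# introduces a quantization error:
rng = np.random.default_rng(2)
w = rng.normal(scale=0.5, size=10000)           # hypothetical trained weights
w_bin = np.where(w >= 0, 1.0, -1.0)             # binarized weights
print(np.mean((w - w_bin) ** 2))                # mean squared quantization error
```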
- 08:12 The question is, what can we do? How can we deal with this to improve
- 08:18 the quality and to
- 08:22 lower the quantization error?
- 08:25 The value range of the inputs is
- 08:29 similarly reduced, so fine-granular differences can no longer
- 08:33 be expressed in only 1 bit, for example -1 and +1.
- 08:37 And here as well, we have a lower feature capacity.
- 08:41 The idea we are experimenting with is to solve both challenges
- 08:48 through a specific architecture design.
- 08:53 This is exactly the proposal of our MeliusNet,
- 08:58 which is shown here in an overview sketch. We have the dense block
- 09:06 and we have the improvement block, and they are connected and
- 09:12 designed in this way. In the dense block the feature capacity is increased,
- 09:19 and in the improvement block the feature quality is increased.
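The division of labor between the two blocks can be sketched at the level of tensor shapes (NumPy; the channel counts are hypothetical, and the real blocks use binary convolutions rather than the placeholder values here):

```python
import numpy as np

def dense_block(features, new_channels=64):
    """Concatenate freshly computed channels -> feature CAPACITY grows.
    (A real dense block computes them with a binary convolution;
    here we use placeholder values just to track the shapes.)"""
    n, c, h, w = features.shape
    new = np.ones((n, new_channels, h, w))
    return np.concatenate([features, new], axis=1)

def improvement_block(features, last_channels=64):
    """Add a residual onto the newest channels -> feature QUALITY
    improves while the channel count stays the same."""
    out = features.copy()
    out[:, -last_channels:] += np.ones_like(out[:, -last_channels:])
    return out

x = np.zeros((1, 128, 14, 14))  # batch of 1, 128 channels, 14x14 feature map
x = dense_block(x)              # 128 -> 192 channels (capacity up)
x = improvement_block(x)        # still 192 channels (quality up)
print(x.shape)                  # (1, 192, 14, 14)
```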
- 09:26 If we make these models more fine-granular, you can see here
- 09:32 how they are designed and how we work with
- 09:37 different blocks and different transition fields
- 09:41 to make such networks work with
- 09:47 only the two values, and to make them more
- 09:52 accurate and provide more quality.
- 09:57 If we compare our network, our architecture, with others,
- 10:04 then we see that MeliusNet performs
- 10:11 very well with respect to the number of operations and the
- 10:15 model size, and it shows
- 10:19 good results. So it makes sense to vary, to modify,
- 10:26 to think about and to design new network models,
- 10:31 new binary network models, for these AI applications.
- 10:36 It is not the case that the
- 10:40 32-bit networks are so much better in quality than such kinds
- 10:47 of networks. If you are interested in the details, here I could
- 10:50 only give an overview. Finally, there is
- 10:55 BMXNet 2, an open-source framework we designed
- 11:03 for binary neural networks. BMXNet 2 is based on MXNet,
- 11:10 which was introduced before. It contains reusable models and demos
- 11:17 to provide a strong basis for research and industry, and it
- 11:20 can be used to find new network architectures and test new ideas.
- 11:26 Here is a link where you can also find
- 11:32 showcases and demo applications for this BMXNet tool.
- 11:37 There is a demo app
- 11:40 for image classification, working on the ImageNet database,
- 11:48 which is based on ResNet, and a human pose detection demo
- 11:54 for a Raspberry Pi; this is a small device which only
- 12:01 needs a small amount of energy, and also here there
- 12:07 is a link to find this. We are interested in interacting with you
- 12:13 about such ideas, and if you have further improvements,
- 12:19 you are invited to bring them into our
- 12:24 clean-IT forum to share this knowledge with other research groups, with the goal
- 12:31 of decreasing the carbon footprint of such AI models
- 12:38 and applications.
About this video
While the best AI systems train neural networks based on 32-bit algorithms, the procedure can also be carried out with “binary neural networks” (1-bit algorithms). This drastically reduces the effort in the individual calculation steps and immediately leads to energy savings by a factor of 20. Although binary neural networks are currently about 5 % less accurate than those of the AI systems of global players, the reduction can cut electricity usage by 95 %. With AI applications being used millions of times a day, total emissions can be decreased significantly. More information...
Prof. Dr. Christoph Meinel is CEO and Scientific Director of the Hasso Plattner Institute for Digital Engineering (HPI) as well as Dean of the Digital Engineering Faculty at the University of Potsdam. He holds the chair of Internet Technologies and Systems and teaches courses on IT Systems Engineering on the MOOC platform openHPI and at the HPI School of Design Thinking, where he is also scientifically active in research. He is engaged in the fields of cybersecurity and digital education, has developed the MOOC platform openHPI.de, and supervises numerous Ph.D. students.