This video belongs to the openHPI course Applied Edge AI: Deep Learning Outside of the Cloud. Do you want to see more?
- 00:00Hello and welcome.
- 00:02This video will briefly present the basic working ideas of convolutional neural networks.
- 00:10Convolutional neural networks are one of the amazing technologies that brought us many breakthroughs in the current AI revolution.
- 00:19For example, beating human champions in strategy games and surpassing human performance in various computer vision tasks. ConvNets
- 00:29are widely used in various applications such as autonomous driving, medical imaging, materials science, and so on.
- 00:39As further proof of the performance of ConvNets,
- 00:43let's briefly look at the previous champion algorithms of the ImageNet challenge.
- 00:51We can see that, starting from AlexNet in 2012, the champions are all ConvNets with different architectures. Compared with
- 01:02the traditional methods,
- 01:03they achieved a great improvement in accuracy, more than 20% over the previous approaches.
- 01:12ConvNets have achieved many seemingly amazing results. However, their principle is in fact very simple.
- 01:21So let's briefly recap it.
- 01:26For convolution computation, there is an input image and a weight filter.
- 01:31Normally they have three dimensions, namely width, height, and depth.
- 01:37The depth here means the number of channels.
- 01:41Convolution computation actually uses
- 01:44the weight filter to slide over the input image spatially and compute dot products. A dot product means the pixel-wise
- 01:55multiplications summed up.
- 01:58In this example, the input image has the dimension of 32 x 32 x 3.
- 02:06And the filter size is 5 x 5 x 3.
- 02:11We can calculate the output dimension according to the formula (W - F) / S + 1, where W is the input size, F the filter size, and S the stride. In this case the output size is 28 x 28, using a
- 02:22stride equal to one.
- 02:26This generates just one output feature map.
- 02:29If there are 4 weight filters, then the number of output feature map channels increases to 4.
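The output-size rule described above can be checked with a small helper. This is a sketch; the function name and defaults are my own, not part of the course material, but the formula (W - F + 2P) / S + 1 is the standard one referenced in the video.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 32 x 32 x 3 input, 5 x 5 x 3 filter, stride 1, no padding -> 28 x 28 output.
print(conv_output_size(32, 5, stride=1))  # 28

# With 4 weight filters, the output feature map would be 28 x 28 x 4:
# the channel count of the output equals the number of filters.
```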
- 02:39In convolutional neural networks,
- 02:41we often talk about the concept of receptive field.
- 02:46What is this receptive field? Actually, it is very simple.
- 02:51That is the number of related pixels in the input tensor corresponding to each pixel in the output feature map.
- 02:59Obviously, this actually equals the total number of pixels of the weight filters.
- 03:07In this example, the size of the receptive field is 5 x 5 x 3 equals 75.
- 03:18This animation intuitively shows the process of convolution.
- 03:22We can see that the three input channels correspond to the 3 filter channels.
- 03:28At each step, the convolution kernel calculates the pixel-wise products, then sums the results, and finally adds a bias parameter
- 03:40to generate a pixel value of the output map. The filter scans the entire spatial extent of the input image.
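The sliding-window process shown in the animation can be sketched as a naive NumPy loop. The function and variable names here are my own for illustration; real frameworks use far more optimized implementations.

```python
import numpy as np

def conv2d_single(image, weight, bias=0.0, stride=1):
    """Naive convolution of one H x W x C input with one FH x FW x C filter:
    at each spatial position, multiply the patch pixel-wise with the filter,
    sum everything up, and add the bias."""
    h, w, c = image.shape
    fh, fw, fc = weight.shape
    assert c == fc, "input channels must match filter channels"
    oh = (h - fh) // stride + 1
    ow = (w - fw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+fh, j*stride:j*stride+fw, :]
            out[i, j] = np.sum(patch * weight) + bias
    return out

# The 32 x 32 x 3 example with a 5 x 5 x 3 filter yields a 28 x 28 map.
img = np.random.rand(32, 32, 3)
filt = np.random.rand(5, 5, 3)
print(conv2d_single(img, filt).shape)  # (28, 28)
```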
- 03:51Next, let's take a look at how to perform padding and stride operations in the convolutional layer. Given a 7 x 7 input,
- 04:02if we use a 3 x 3 filter with stride equal to one, then we will get a 5 x 5 output map.
- 04:13If the stride equals 2, then we will get a 3 x 3 output feature map.
- 04:22Now,
- 04:23the problem comes: if the stride equals three, then the input map is too small to get a valid result.
- 04:32What should we do here?
- 04:34In this case we just need to introduce padding.
- 04:38That is, a few
- 04:39pixels whose value is zero are added on the boundary of the input map
- 04:44to expand the range of the input feature map.
- 04:48In this example, when the stride equals three, we only need to add a padding of one to successfully complete
- 04:58the convolution operation, and we will get a 3 x 3 output feature map.
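The stride-3 case can be verified numerically with zero padding, for instance via NumPy's `np.pad`. The numbers follow the 7 x 7 example from the video; the variable names are illustrative.

```python
import numpy as np

# 7 x 7 input, 3 x 3 filter, stride 3: (7 - 3) / 3 + 1 is not an integer,
# so the filter cannot cover the input cleanly without padding.
# With one pixel of zero padding: (7 + 2*1 - 3) / 3 + 1 = 3.
x = np.arange(49, dtype=float).reshape(7, 7)
x_padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(x_padded.shape)  # (9, 9)

out_size = (7 + 2 * 1 - 3) // 3 + 1
print(out_size)  # 3, so the output feature map is 3 x 3
```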
- 05:05In addition to zero padding, commonly used padding methods also include replication padding (repeating the boundary pixels),
- 05:13reflection padding, and constant-value padding.
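These padding variants map directly onto modes of NumPy's `np.pad`; deep learning frameworks expose similar options. A small sketch on a single row:

```python
import numpy as np

row = np.array([[1.0, 2.0, 3.0]])

# Pad one pixel on the left and right of the second axis.
print(np.pad(row, ((0, 0), (1, 1)), mode="constant"))   # zero padding: 0 1 2 3 0
print(np.pad(row, ((0, 0), (1, 1)), mode="edge"))       # repeat boundary: 1 1 2 3 3
print(np.pad(row, ((0, 0), (1, 1)), mode="reflect"))    # reflection: 2 1 2 3 2
print(np.pad(row, ((0, 0), (1, 1)), mode="constant",
             constant_values=7.0))                      # constant value: 7 1 2 3 7
```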
- 05:20Reasonable ideas
- 05:22pointing in a reasonable direction can help us learn knowledge more effectively, and such reasonable ideas and directions
- 05:33are the so-called prior
- 05:34in the context of machine learning. Prior knowledge is generally important for machine learning models.
- 05:42One may want to ask the question: why do ConvNets work much better than other deep neural networks or other machine learning
- 05:51methods on computer vision problems?
- 05:53An intuitive explanation is that ConvNets have a strong prior, namely locality.
- 06:00They can learn local context very well and then converge toward the global context.
- 06:07Why are gradient boosting or random forest methods better than ConvNets in Kaggle challenges on tabular data or other structured
- 06:16data?
- 06:17A possible reason might be that tabular data lacks local correlations.
- 06:24In this video
- 06:25we have talked about how convolutional neural networks work.
- 06:30We introduced the basic operations of the convolutional neural network.
- 06:34We learned how to compute its primitives and how to utilize stride and padding.
- 06:40We offered a brief explanation of characteristics like weight sharing and the important prior of ConvNets.
- 06:51In the practical session of this week,
- 06:54as I already introduced in the first video, we will have a practical task for each week. In the first week, we will learn how to implement
- 07:03stochastic gradient descent and complete the training loop of a neural network from scratch.
- 07:12Joseph will work with you to complete this task in the next video, and the time required to complete the practical task is
- 07:22about 2-3 hours.
- 07:24So I wish you all have fun and great success.
- 07:31Thank you for watching.