This video belongs to the openHPI course Applied Edge AI: Deep Learning Outside of the Cloud. Do you want to see more?
- 00:00Hello and welcome.
- 00:02This video will briefly present the basic working ideas of convolutional neural networks.
- 00:10Convolutional neural networks are one of the amazing technologies that brought us many breakthroughs in the current AI revolution.
- 00:19For example, beating human champions in strategy games and surpassing human performance in various computer vision tasks. ConvNets
- 00:29are widely used in various applications such as autonomous driving, medical imaging, materials science, and so on.
- 00:39As further proof of the performance of ConvNets,
- 00:43let's briefly look at the previous champion algorithms of the ImageNet challenge.
- 00:51We can see that, starting from AlexNet in 2012, the champions are all ConvNets with different architectures. Compared with
- 01:02the traditional methods,
- 01:03they achieved a great improvement in accuracy, more than 20% over the previous approaches.
- 01:12ConvNets have achieved many seemingly amazing results. However, their principle is in fact very simple.
- 01:21So let's briefly recap it.
- 01:26For convolution computation, there is an input image and a weight filter.
- 01:31Normally they have three dimensions, namely width, height, and depth.
- 01:37The depth here means the number of channels.
- 01:41Convolution computation actually uses
- 01:44the weight filter to slide over the input image spatially and compute dot products. A dot product means the pixel-wise
- 01:55multiplications summed up.
- 01:58In this example, the input image has the dimension of 32 x 32 x 3.
- 02:06And the filter size is 5 x 5 x 3.
- 02:11We can calculate the output dimension according to the formula (W - F) / S + 1, where W is the input size, F the filter size, and S the stride. In this case the output size is 28 x 28, using a
- 02:22stride equal to one.
- 02:26This generates just one output feature map.
- 02:29If there are 4 weight filters, then the number of output feature map channels increases to 4.
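The output-size rule described above can be checked with a small helper. This is a sketch; the function name and defaults are my own, not part of the course material, but the formula (W - F + 2P) / S + 1 is the standard one referenced in the video.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 32 x 32 x 3 input, 5 x 5 x 3 filter, stride 1, no padding -> 28 x 28 output.
print(conv_output_size(32, 5, stride=1))  # 28

# With 4 weight filters, the output feature map would be 28 x 28 x 4:
# the channel count of the output equals the number of filters.
```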
- 02:39In convolutional neural networks,
- 02:41we often talk about the concept of receptive field.
- 02:46What is this receptive field? Actually, it is very simple.
- 02:51That is the number of related pixels in the input tensor corresponding to each pixel in the output feature map.
- 02:59Obviously, this actually equals the total number of pixels of the weight filters.
- 03:07In this example, the size of the receptive field is 5 x 5 x 3 equals 75.
- 03:18This animation intuitively shows the process of convolution.
- 03:22We can see that the three input channels correspond to the 3 filter channels.
- 03:28At each step, the convolution kernel calculates the pixel-wise products, then sums the results, and finally adds a bias parameter
- 03:40to generate a pixel value of the output map. The filter scans the entire spatial extent of the input image.
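The sliding-window process shown in the animation can be sketched as a naive NumPy loop. The function and variable names here are my own for illustration; real frameworks use far more optimized implementations.

```python
import numpy as np

def conv2d_single(image, weight, bias=0.0, stride=1):
    """Naive convolution of one H x W x C input with one FH x FW x C filter:
    at each spatial position, multiply the patch pixel-wise with the filter,
    sum everything up, and add the bias."""
    h, w, c = image.shape
    fh, fw, fc = weight.shape
    assert c == fc, "input channels must match filter channels"
    oh = (h - fh) // stride + 1
    ow = (w - fw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+fh, j*stride:j*stride+fw, :]
            out[i, j] = np.sum(patch * weight) + bias
    return out

# The 32 x 32 x 3 example with a 5 x 5 x 3 filter yields a 28 x 28 map.
img = np.random.rand(32, 32, 3)
filt = np.random.rand(5, 5, 3)
print(conv2d_single(img, filt).shape)  # (28, 28)
```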
- 03:51Next, let's take a look at how to perform padding and stride operations in the convolutional layer. Given a 7 x 7 input,
- 04:02if we use a 3 x 3 filter with stride equal to one, then we will get a 5 x 5 output map.
- 04:13If the stride equals 2, then we will get a 3 x 3 output feature map.
- 04:22Now,
- 04:23the problem comes: if the stride equals three, then the input map is too small to get a valid result.
- 04:32What should we do here?
- 04:34In this case we just need to introduce padding.
- 04:38That is, a few
- 04:39pixels whose value is zero are added on the boundary of the input map
- 04:44to expand the range of the input feature map.
- 04:48In this example, when the stride equals three, we only need to add a padding of one to successfully complete
- 04:58the convolution operation, and we will get a 3 x 3 output feature map.
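The stride-3 case can be verified numerically with zero padding, for instance via NumPy's `np.pad`. The numbers follow the 7 x 7 example from the video; the variable names are illustrative.

```python
import numpy as np

# 7 x 7 input, 3 x 3 filter, stride 3: (7 - 3) / 3 + 1 is not an integer,
# so the filter cannot cover the input cleanly without padding.
# With one pixel of zero padding: (7 + 2*1 - 3) / 3 + 1 = 3.
x = np.arange(49, dtype=float).reshape(7, 7)
x_padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(x_padded.shape)  # (9, 9)

out_size = (7 + 2 * 1 - 3) // 3 + 1
print(out_size)  # 3, so the output feature map is 3 x 3
```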
- 05:05In addition to zero padding, commonly used padding methods also include replication padding (repeating the boundary pixels),
- 05:13reflection padding, and constant-value padding.
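These padding variants map directly onto modes of NumPy's `np.pad`; deep learning frameworks expose similar options. A small sketch on a single row:

```python
import numpy as np

row = np.array([[1.0, 2.0, 3.0]])

# Pad one pixel on the left and right of the second axis.
print(np.pad(row, ((0, 0), (1, 1)), mode="constant"))   # zero padding: 0 1 2 3 0
print(np.pad(row, ((0, 0), (1, 1)), mode="edge"))       # repeat boundary: 1 1 2 3 3
print(np.pad(row, ((0, 0), (1, 1)), mode="reflect"))    # reflection: 2 1 2 3 2
print(np.pad(row, ((0, 0), (1, 1)), mode="constant",
             constant_values=7.0))                      # constant value: 7 1 2 3 7
```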
- 05:20Reasonable ideas
- 05:22pointing in a reasonable direction can help us learn knowledge more effectively, and such reasonable ideas and directions
- 05:33are the so-called prior
- 05:34in the context of machine learning. Prior knowledge is generally important for machine learning models.
- 05:42One may want to ask the question: why do ConvNets work much better than other deep neural networks or other machine learning
- 05:51methods on computer vision problems?
- 05:53An intuitive explanation is that ConvNets have a strong prior, namely locality.
- 06:00They can learn local context very well and then converge toward the global context.
- 06:07Why are gradient boosting or random forest methods better than ConvNets in Kaggle challenges on tabular data or other structured
- 06:16data?
- 06:17A possible reason might be that tabular data lacks local correlations.
- 06:24In this video
- 06:25we have talked about how convolutional neural networks work.
- 06:30We introduced the basic operations of the convolutional neural network.
- 06:34We learned how to compute its primitives and how to utilize stride and padding.
- 06:40We offered a brief explanation of characteristics like weight sharing and the important prior of ConvNets.
- 06:51In the practical session of this week,
- 06:54as I already introduced in the first video, we will have a practical task for each week. In the first week, we will learn how to implement
- 07:03stochastic gradient descent and complete the training loop of a neural network from scratch.
- 07:12Joseph will work with you to complete this task in the next video, and the time required to complete the practical task is
- 07:22about 2-3 hours.
- 07:24So I wish you all have fun and great success.
- 07:31Thank you for watching.