- 00:00Hello and welcome. In the last video we talked about AlexNet.
- 00:06Today we will continue to review other ConvNet architectures.
- 00:13After AlexNet won the ImageNet challenge in 2012,
- 00:17many later works built on and improved it.
- 00:21For example, ZFNet changed the kernel size of the first convolutional layer to 7 x 7 and increased the number of
- 00:31channels in the 3rd, 4th, and 5th convolutional layers.
- 00:35This simple change resulted in an accuracy increase of nearly 5%, but at the same time it also increased the computational
- 00:46overhead of the network.
- 00:48Nevertheless, it indicated that AlexNet was far from reaching its upper limit of accuracy.
- 00:58Now we come to 2014.
- 01:02That year, the ImageNet challenge ushered in two winners.
- 01:07I want to first introduce VGG-Net, proposed by the Visual Geometry Group at Oxford University; this is a network with 19
- 01:16layers.
- 01:20VGG-Net was the runner-up in the image classification task and the winner of the localization task in the ImageNet challenge
- 01:29that year.
- 01:30Similar to AlexNet,
- 01:31VGG-Net also gradually shrinks the resolution,
- 01:36that is, the spatial dimensions of the intermediate feature maps, to achieve a larger receptive field, while increasing the
- 01:47number of channels.
- 01:56VGG-Net strictly used 3 x 3 filters with stride = 1 and pad = 1.
- 01:57A more fine-grained feature learning process gradually expands from a small receptive field toward the larger, global
- 02:06context, and increasing the number of channels keeps the information capacity of the network similar at each stage.
- 02:17A notable feature is that VGG-Net uses three consecutive 3 x 3 convolutions instead of a single 7 x 7 convolutional
- 02:27kernel; the size of the receptive field achieved by these two approaches is the same.
- 02:34However, because more convolutional layers are used, the non-linearity increases.
- 02:42In addition, the small convolutional kernels effectively reduce the number of parameters.
- 02:48This design reduces the parameters by about 45%.
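To make the parameter saving concrete, here is a minimal sketch (assuming the same number of input and output channels C in every layer and ignoring biases) comparing one 7 x 7 convolution with three stacked 3 x 3 convolutions; the channel count is chosen only for illustration.

```python
# Parameter count of a single 7x7 convolution vs. three stacked 3x3 convolutions,
# assuming C input channels and C output channels (biases ignored).
C = 256  # illustrative channel count

params_7x7 = 7 * 7 * C * C         # one 7x7 layer
params_3x3 = 3 * (3 * 3 * C * C)   # three 3x3 layers, same 7x7 receptive field

print(params_7x7, params_3x3)
print(f"reduction: {1 - params_3x3 / params_7x7:.1%}")  # ~44.9%, i.e. about 45%
```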
- 02:58VGG-Net became the most accurate open-source model and swept academia. It was the most popular backbone model
- 03:06before ResNet appeared and has gathered more than 60,000 citations.
- 03:12It has many more parameters than AlexNet but half its error rate on the ImageNet challenge.
- 03:19It once again confirmed an intuition about deep neural networks: ConvNets need to be deeper to learn hierarchical
- 03:29representations of visual data.
- 03:33Furthermore, VGG-Net was the first to introduce the modular design concept of stages and blocks in a deep network.
- 03:43This design principle has become a common standard in the community.
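As a rough illustration of this stage-and-block idea, here is a minimal PyTorch-style sketch of a VGG-like stage; the helper name vgg_stage and the channel numbers are illustrative assumptions, not the exact VGG-19 configuration.

```python
import torch.nn as nn

def vgg_stage(in_channels, out_channels, num_convs):
    """One VGG-style stage: num_convs 3x3 convolutions (stride 1, pad 1)
    followed by a 2x2 max pooling that halves the spatial resolution."""
    layers = []
    for i in range(num_convs):
        layers.append(nn.Conv2d(in_channels if i == 0 else out_channels,
                                out_channels, kernel_size=3, stride=1, padding=1))
        layers.append(nn.ReLU(inplace=True))
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Stacking stages shrinks the resolution while the channel count grows.
stages = nn.Sequential(
    vgg_stage(3, 64, 2),     # 224x224 -> 112x112
    vgg_stage(64, 128, 2),   # 112x112 -> 56x56
    vgg_stage(128, 256, 3),  # 56x56  -> 28x28
)
```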
- 03:52GoogLeNet is a deep network structure proposed by Google researchers.
- 03:59The name pays tribute to LeNet.
- 04:03The GoogLeNet team proposed the Inception structure as a basic building module to construct a sparse yet
- 04:12high-performance model.
- 04:15GoogLeNet was the winner of the ImageNet classification challenge
- 04:19in 2014 and made a bolder attempt at the network structure. Although it has 22 layers, GoogLeNet is much smaller
- 04:30than AlexNet and VGG-Net.
- 04:33GoogLeNet has about five million parameters, roughly 12 times fewer than AlexNet and 27 times fewer than the VGG-19 network.
- 04:43Therefore, when memory or computing resources are limited, GoogLeNet is clearly a better choice, and its accuracy
- 04:52is even better.
- 04:56Generally speaking, the most direct way to improve network performance is to increase network depth and width. Depth refers
- 05:06to the number of network layers, and width refers to the number of neurons per layer.
- 05:12However, this approach suffers from easy overfitting, increased computational overhead, and the vanishing gradient problem.
- 05:22The way to solve this is to reduce the number of parameters while increasing the depth and width. To reduce parameters, it is
- 05:31natural to think of turning full connections into sparse connections.
- 05:37However, in terms of implementation, the actual computational efficiency does not improve much after the full connections
- 05:48become sparse, because most modern hardware is optimized for dense matrix computation.
- 05:57Although a sparse matrix holds less data, current hardware doesn't
- 06:06really fully support this kind of data structure.
- 06:12Therefore, the Google team proposed the Inception module, a sparse network structure that produces dense computations, which increases
- 06:21the performance of the neural network and ensures efficient use of computing resources.
- 06:29So let's take a closer look.
- 06:32GoogLeNet is composed of three basic components.
- 06:36The stem does preliminary processing of the image data, which then enters three stacked Inception groups.
- 06:45Each group consists of three Inception modules, and each group corresponds to a classification head.
- 06:53So GoogLeNet has three classification heads in total, and the joint optimization of these three heads can be considered
- 07:02multi-task learning, which helps to improve the generalization ability of the whole network.
- 07:10The design of the classification heads and the stem is more conventional.
- 07:15So let's take a closer look at the inception module.
- 07:22The Inception module applies parallel filter operations on the input from the previous layer.
- 07:30Multiple receptive field sizes are used for convolution. Here,
- 07:34in this case, 1x1, 3x3 and 5x5 convolutional kernels are used, and 3x3 pooling is also
- 07:44applied, which preserves the depth of the input.
- 07:49On the one hand this increases the width of the network, and on the other it also increases the adaptability of the network
- 07:59to different scales.
- 08:01The problem here is that the pooling layer preserves the depth of the input.
- 08:07This means that the total depth at the output of such a module always grows,
- 08:14which results in a huge increase in computational overhead.
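To see why the depth keeps growing, here is a minimal PyTorch-style sketch of the naive Inception module with parallel 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling; the class name and channel numbers are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception module: parallel 1x1, 3x3, 5x5 convolutions and 3x3 pooling,
    concatenated along the channel dimension."""
    def __init__(self, in_channels, c1, c3, c5):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_channels, c3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_channels, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # The pooling branch keeps all in_channels, so the concatenated output
        # always has more channels than the input: c1 + c3 + c5 + in_channels.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.pool(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # torch.Size([1, 416, 28, 28]) -- depth grew from 192 to 416
```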
- 08:21To address this problem, the bottleneck design was proposed.
- 08:26Generally speaking, using the bottleneck design, we can keep the width and height unchanged and only reduce the depth.
- 08:35This can be easily achieved by using a 1x1 convolution with stride=1, where we simply reduce the number of filters,
- 08:44as shown in this example.
- 08:51As we can see in the figure, a 1x1 convolution kernel is added before each 3x3 and 5x5 convolution
- 09:00and also after the max pooling.
- 09:04This effectively reduces the number of channels in the concatenation, and this design does not decrease accuracy.
- 09:13You can see that the computational complexity of this module has been reduced by about 58%.
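Here is a matching sketch of the module with 1x1 bottlenecks added before the 3x3 and 5x5 convolutions and after the pooling; again the class name and channel numbers are illustrative assumptions, not the official GoogLeNet configuration.

```python
import torch
import torch.nn as nn

class BottleneckInception(nn.Module):
    """Inception module with 1x1 bottlenecks: a 1x1 convolution reduces the channel
    depth before the 3x3 and 5x5 convolutions and after the max pooling."""
    def __init__(self, in_channels, c1, r3, c3, r5, c5, cp):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, r3, kernel_size=1),   # bottleneck
            nn.Conv2d(r3, c3, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_channels, r5, kernel_size=1),   # bottleneck
            nn.Conv2d(r5, c5, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, cp, kernel_size=1))   # reduce pooled depth

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
y = BottleneckInception(192, 64, 96, 128, 16, 32, 32)(x)
print(y.shape)  # torch.Size([1, 256, 28, 28]) -- output depth is now controlled
```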
- 09:22The 1x1 convolution kernel works exactly like a normal convolution kernel, except that it no longer learns from
- 09:31a local spatial area and does not consider the correlation between neighboring pixels. The 1x1 convolution
- 09:38thus integrates the information of different channels.
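A small sketch of this view of the 1x1 convolution: it is equivalent to applying the same linear layer independently at every spatial position, so it mixes channels without looking at neighboring pixels. The sizes below are arbitrary.

```python
import torch
import torch.nn as nn

# A 1x1 convolution mixes information across channels at each spatial position.
conv1x1 = nn.Conv2d(in_channels=64, out_channels=16, kernel_size=1)

# Build a linear layer with the same weights (conv weight shape is 16x64x1x1).
linear = nn.Linear(64, 16)
linear.weight.data = conv1x1.weight.data.view(16, 64)
linear.bias.data = conv1x1.bias.data

x = torch.randn(1, 64, 8, 8)
out_conv = conv1x1(x)
# Move channels last, apply the linear layer per pixel, move channels back.
out_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(out_conv, out_lin, atol=1e-6))  # True
```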
- 09:46Okay, thank you for watching the video and we will continue our discussion in the next video.