- 00:00In this excursion we want to take a look at how mainly images, but also texts, can be processed.
- 00:07Up to now we have concentrated on tabular data in this course, but machine learning and artificial intelligence are of course also used in other domains.
- 00:15An interesting use case, for example, is the one we considered for artificial neural networks, where we said we want to distinguish dogs and cats from each other.
- 00:26For this we used a table in which we listed various attributes.
- 00:30Of course, we could instead use the image of the cat directly,
- 00:35meaning that we have the image in its raw format together with the annotation that it is a cat.
- 00:40We would of course have to think again about how to encode everything; for example, "cat" could again be the value 0.
- 00:46For this application in the area of images, we will look at the well-known MNIST image dataset.
- 00:54This is more or less the standard example with which programmers in the field of machine learning learn
- 00:59how images can be analyzed, or how models can be trained to perform classification on image data.
- 01:08This MNIST dataset consists of a large number of handwritten digits between 0 and 9.
- 01:15And for each of these images it is known which digit it shows, so for this 4, for example, we would know not just that it is a picture, but that this picture is meant to represent a 4.
- 01:27These images are available in a 28 by 28 pixel format.
- 01:32This means you have 28 times 28 values, where each individual value represents one pixel.
- 01:40Each pixel is stored, for example, as a value between 0 and 255, where these values represent different shades of grey.
- 01:47A value of 0 would be a completely white tone, a value of 255 almost exactly black, and in between you have different shades of grey.
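A minimal sketch (not shown in the video) of how this dataset and its pixel format can be inspected, here with the Keras API:

```python
# Sketch: load MNIST via Keras and inspect the format described above.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)                 # (60000, 28, 28): 60,000 images of 28x28 pixels
print(y_train[0])                    # the annotated digit for the first image
print(x_train.min(), x_train.max())  # 0 and 255: the range of grey values
```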
- 01:57Let's reduce the example from 28 by 28 to 6 by 6 pixels, so that we can present it well here.
- 02:04As you can see, this would be a 4.
- 02:06We can now transfer this 4 from its pictorial format into its pixel values.
- 02:13The human eye may no longer recognize the digit so well in this form, but the computer can process it, because there are concrete numerical values that it can interpret.
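As a small illustration (the concrete pixel values here are invented, purely for demonstration), such a 6 by 6 image is nothing more than a matrix of numbers:

```python
import numpy as np

# Hypothetical 6x6 pixel matrix of a handwritten 4
# (0 = white, 255 = black, as described in the video).
image = np.array([
    [  0,  80,   0,   0,  90,   0],
    [  0, 120,   0,   0, 140,   0],
    [  0, 160, 200, 220, 180,   0],
    [  0,   0,   0,   0, 170,   0],
    [  0,   0,   0,   0, 160,   0],
    [  0,   0,   0,   0, 150,   0],
])
print(image.shape)  # (6, 6) -> 36 pixel values in total
```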
- 02:25How would you do such an image analysis now?
- 02:29After all, we do not have a table here, but really a matrix of different values.
- 02:34A very simple approach, if we want to use the neural network from the last unit, would be to take this image and cut it up column by column.
- 02:46That means we take the first column, then the second and so on, and in this way finally turn the image, a matrix, into a single row.
- 02:55In other words, we would more or less stick the individual columns together.
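In NumPy, this column-by-column flattening could be sketched like this (reusing the hypothetical image matrix from above):

```python
# Column-wise flattening: order="F" concatenates the columns one after another.
row = image.flatten(order="F")
print(row.shape)  # (36,) -> one long row of 36 pixel values
```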
- 02:59Once we have this long row of individual pixel values, we can feed it into the neural network.
- 03:07We would then need as many input neurons as we have pixel values, as is indicated here.
- 03:15The first pixel would go into the first input neuron, the second into the second input neuron, and so on.
- 03:22So if we have 36 pixels, our neural network needs 36 neurons in the input layer.
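A minimal sketch of such a network in Keras might look like this; the hidden-layer size and training settings are assumptions for illustration, not taken from the video:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, activation="relu", input_shape=(36,)),  # expects 36 input values, one per pixel
    Dense(10, activation="softmax"),                  # one output per digit 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```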
- 03:30In practice, however, this is often done a little differently, because artificial neural networks do not only exist in the architecture we have looked at so far.
- 03:40In fact, there are many more architectures; very well known are the so-called Convolutional Neural Networks.
- 03:47These are artificial neural networks that leave the images in their original format, that is, as a pixel matrix, and then run a filter over the image.
- 03:57In this way, information is extracted that is in turn in pictorial format, i.e. in the form of a matrix.
- 04:04The whole thing is done several times, so that in the first layer relatively trivial shapes are discovered, for example lines or strokes,
- 04:15while more complex patterns can be recognized in further layers, in the second layer already something like circles,
- 04:20so that in artificial neural networks with many layers even very complex shapes can be recognized.
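A sketch of such a convolutional network in Keras could look like this; the number and size of the filters are assumptions for illustration:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # First layer: filters run over the pixel matrix and pick up simple shapes.
    Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    # Deeper layer: combines the simple shapes into more complex patterns.
    Conv2D(16, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),  # one output per digit 0-9
])
```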
- 04:27More about this can be found in the openHPI course "Praktische Einführung in Deep Learning für Computer Vision", where the topic is looked at more from a programming point of view.
- 04:37Equally interesting is the topic of text processing: with texts we have dynamic lengths.
- 04:45This means that our input can be ten characters long in one example but 20 characters in another, and we want a model that is able to handle this.
- 04:58For this purpose, so-called recurrent artificial neural networks are used; these are networks whose architecture is ultimately very similar to the one we have already looked at, with only one special feature.
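In Keras, for example, such a recurrent network can be set up so that the sequence length is left open (a sketch; layer sizes are assumptions):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(8, input_shape=(None, 1)),  # None: sequences of any length are accepted
    Dense(1, activation="sigmoid"),       # e.g. positive vs. negative statement
])
```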
- 05:10Suppose we now have the sentence "The film was insanely good", and we have labeled it as positive.
- 05:17We want to predict whether a statement about a film is positive or negative, i.e. whether the person liked the film or not.
- 05:26We can then analyze the sentence word by word and say, for example, that certain words have certain effects on our classification.
- 05:38"Good", for example, would be a word to which we would assign a strong positive weight.
- 05:43"Insanely", looked at on its own, would rather be a negative factor, but in the context of the phrase "insanely good" we actually want it to reinforce the "good".
- 05:53And this is exactly what artificial neural networks can exploit if they are built recurrently.
- 05:57Let us now take a closer look, starting with the first word.
- 06:01We would transform it in some way and then feed it into the artificial neural network.
- 06:06We would then move on to the next word, that is, to "film".
- 06:11And this is where the interesting part happens: the neurons in the intermediate layer no longer just pass information on to the next layer,
- 06:19but additionally pass it, with the next input, into the processing of that input.
- 06:26This means, for example, that when we are at "film", the information that was computed in the intermediate layers during the last input, i.e. at "The", flows again into the prediction for "film",
- 06:37so that when we arrive at "insanely", the network can interpret "insanely" in some way and then process "good" with the next input.
- 06:47The artificial neural network can now understand that "insanely" in the context of "good" has a positive meaning.
- 06:55This allows contexts within a sentence to be interpreted, which makes a prediction much easier.
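Conceptually, this recurrent mechanism can be sketched as follows; the word vectors and weight matrices are random placeholders, purely to show how the state from the previous word flows into the processing of the next one:

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 4))      # weights applied to the current word
W_hidden = rng.normal(size=(4, 4))  # weights applied to the carried-over state

hidden = np.zeros(4)  # no previous information before the first word
for word in "The film was insanely good".split():
    word_vector = rng.normal(size=4)  # placeholder for a real word encoding
    # The new state depends on the current word AND the previous state.
    hidden = np.tanh(W_in @ word_vector + W_hidden @ hidden)
print(hidden)  # final state summarizing the whole sentence
```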
- 07:00That was it for the excursion on images and texts.
- 07:05We wanted to show you briefly that artificial neural networks and other machine learning models can process not only tables,
- 07:11but that there are also many other exciting application domains, for example images and texts.
- 07:15There are of course others as well, but they would go beyond the scope of this course.
About this video
Here you can find the openHPI course Praktische Einführung in Deep Learning für Computer Vision