- 00:00After presenting our project for this week, we would now like to
- 00:05focus once again on what you need to do for data labeling,
- 00:08because it simply plays such an important role.
- 00:11When we are in the supervised learning context, we need training data,
- 00:16that is, labeled data.
- 00:17And we want to look at how you can obtain it. In principle you start
- 00:23with the idea that you can, for example, collect our texts
- 00:27from review sites.
- 00:31Maybe we have collected them over the years.
- 00:34These are our raw data, our plain texts.
- 00:39If we are now in the context of supervised or semi-supervised learning
- 00:43and want to do a classification or a regression, then in the end we need
- 00:49our labels so that we can
- 00:53represent this learning process correctly, because the models all rely on
- 00:58these labels, these annotations, being there, so that they can learn correctly.
- 01:03Quite concretely, this means, for example, that we put
- 01:06a stamp on our raw data that says this record
- 01:10or this example is negative, and this example is positive,
- 01:15so that for our data we have exactly
- 01:17this pair of input and desired output.
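As a minimal sketch, such a labeled data set is nothing more than a collection of input/output pairs; the texts and labels below are made up purely for illustration:

```python
# Hypothetical labeled data: each pair is (raw input, desired output / "stamp").
labeled_data = [
    ("Delivery was fast and the product works great.", "positive"),
    ("The order arrived broken and support never replied.", "negative"),
]

for text, label in labeled_data:
    print(f"{label:>8}: {text}")
```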
- 01:22This can be in the sense that we simply collected this value historically.
- 01:28This is often the case with structured process data, for example when we want
- 01:34to determine whether our customers have, say, a high risk rating,
- 01:39or whether orders have been canceled or not.
- 01:41There it is often the case that we get our labels more or less as a gift
- 01:48over time, and then we don't have to think much about how to actually collect them.
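A small pandas sketch of this "labels for free" situation, with hypothetical column names and order data; the historically collected status field simply becomes the label:

```python
import pandas as pd

# Hypothetical structured process data: the outcome was recorded anyway,
# so the label comes "as a gift" and needs no manual annotation.
orders = pd.DataFrame({
    "order_id":          [101, 102, 103, 104],
    "order_value":       [250.0, 40.0, 980.0, 120.0],
    "customer_age_days": [30, 1200, 5, 400],
    "status":            ["cancelled", "delivered", "cancelled", "delivered"],
})

orders["label"] = (orders["status"] == "cancelled").astype(int)  # 1 = cancelled

X = orders[["order_value", "customer_age_days"]]  # features for a later model
y = orders["label"]                                # labels collected historically
print(y.value_counts())
```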
- 01:53This is actually the ideal case. In practice, however, it is quite often the case
- 01:59that you have to think about which labels you want to have and then
- 02:05label the data with a tool.
- 02:07So, for example, you say: I have my data, my various records, and I click through
- 02:11now and, metaphorically speaking, put on every record I have the stamp
- 02:17which I would like to have, and I go through that for all of my data.
- 02:22That means, for example, if we have 100,000 raw data records that should all be used for training,
- 02:28then we have to label all 100,000 of them.
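Reduced to its core, such a labeling tool is just a loop that shows each raw record and stores the stamp a human assigns; this is only a sketch with made-up texts, real tools add a proper interface around it:

```python
# A very reduced sketch of "clicking through" raw data with a labeling tool.
raw_texts = [
    "Great product, would buy again.",
    "Terrible support, I cancelled my order.",
    # ... in practice this could be 100,000 records
]

labels = {}
for i, text in enumerate(raw_texts):
    print(f"\n[{i + 1}/{len(raw_texts)}] {text}")
    labels[i] = input("Label (pos/neg): ").strip()  # the manual "stamp"

print(labels)
```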
- 02:33And now, of course, we can ask whether there are techniques that make this
- 02:38less laborious, or that maybe even go completely unnoticed.
- 02:40And I'm pretty sure that most of you have already had to process
- 02:44a captcha image like this in order to get onto some website.
- 02:49And what you're doing there, on the one hand, is showing that you are human.
- 02:54On the other hand, you may also be labeling data without noticing it, because the
- 03:00task you are given, for example detecting traffic lights,
- 03:06is very exciting in the machine learning context, because the data
- 03:09that you are labeling is usually labeled correctly.
- 03:11And then the captcha providers can, for example,
- 03:15use this as training data for AI models.
- 03:20So it's actually not that rare that these captcha images are previously unlabeled data,
- 03:26which is then labeled by you.
- 03:29Or, what is of course also very common, is that for example in a post in a
- 03:36social network, that is, on a social media platform,
- 03:39I have uploaded a picture and tagged Christian in it,
- 03:45and in a certain way have thereby also set a label for this image, namely that Christian is in it.
- 03:52Or, for example, I put three hashtags at the very end of the text, say
- 04:00AI, artificial intelligence or machine learning.
- 04:05And now, if you consider that these are raw texts whose hashtags usually come at the end
- 04:12and not right at the beginning, so you can estimate how this comes about,
- 04:16then you can say that I have, in a sense, labeled this data set, that is, the text,
- 04:21by stating that this text is about artificial intelligence.
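A small sketch of how such hashtags could be turned into labels programmatically; the posts and the regular expression are illustrative assumptions:

```python
import re

# Hypothetical social media posts: hashtags at the end act as labels
# that the author attached without thinking of them as labels.
posts = [
    "Just finished training my first neural network! #AI #MachineLearning",
    "Long walk on the beach today, no screens at all.",
]

def hashtag_labels(post):
    """Return the hashtags of a post as lowercase labels."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", post)]

for post in posts:
    text = re.sub(r"#\w+", "", post).strip()   # raw text without the hashtags
    print(text, "->", hashtag_labels(post) or ["(unlabeled)"])
```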
- 04:27So the data has been labeled without anyone noticing. If that is not possible and you have to label it yourself, then there are still
- 04:35various other techniques, such as cluster labeling, where you don't set a
- 04:39label per record but a label per cluster. That means, for example,
- 04:45that you create different clusters with an unsupervised learning approach
- 04:50and then go through them.
- 04:51Of course, it can happen that you also mislabel a data set this way.
- 04:57Depending on the case, that is not so dramatic.
- 04:59So it may also be that this is acceptable.
- 05:02That's actually what happens in hand labeling almost always, that some data gets mislabeled.
- 05:07That's why cluster labeling is an option, assuming you select enough clusters
- 05:14and don't have just two clusters.
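A minimal cluster labeling sketch with scikit-learn, using made-up review snippets; one label is assigned per cluster instead of per record:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical unlabeled raw texts.
texts = [
    "great product, works perfectly",
    "awesome quality, very happy",
    "broken on arrival, money wasted",
    "terrible, I want a refund",
]

# Unsupervised step: group the texts into clusters.
X = TfidfVectorizer().fit_transform(texts)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Human step: look at a few members per cluster and set ONE label per cluster.
cluster_to_label = {}
for c in sorted(set(clusters)):
    members = [t for t, ci in zip(texts, clusters) if ci == c]
    print(f"cluster {c}: {members}")
    cluster_to_label[c] = input(f"Label for cluster {c}: ").strip()

labels = [cluster_to_label[c] for c in clusters]
print(list(zip(texts, labels)))
```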
- 05:19A different but very popular approach is transfer learning. It is best
- 05:25illustrated by imagining a pupil taking a test
- 05:31in physics. Now we may have two pupils, and one pupil has very good
- 05:38prior knowledge of math while the other has no math skills, and both now only learn
- 05:45the absolutely basic physics material.
- 05:47Which of the two pupils will then probably write the better exam in physics?
- 05:51Probably the pupil who had the math background.
- 05:55And it is the same in transfer learning. So, for example, if we
- 05:59have a database of medical images,
- 06:02and this data set is relatively small, say 2,000 images,
- 06:07then you usually get an enormous performance boost when you take a model
- 06:11that was previously trained on a really large data set,
- 06:15which may come from another application area, that is, perhaps quite general
- 06:20classification, but which has learned the basic shapes, so for example circles,
- 06:26triangles, corners, lines, everything there is.
- 06:30Then on this small data set a model can often be found, i.e. specialized,
- 06:36that still achieves relatively good results, even though the data basis is small.
- 06:41So this transfer learning has been used a lot by data scientists, especially in the last few years.
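A transfer learning sketch in PyTorch/torchvision, under the assumption of a small binary image task (for example roughly 2,000 medical images); the pretrained backbone provides the "prior knowledge" and only the new head is trained:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pretrained on a large, general image data set.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor (the general "prior knowledge") ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace only the classification head for the new, small task,
# assumed here to be binary (e.g. healthy vs. abnormal).
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# A training loop over the small labeled data set would follow here, e.g.:
# for images, labels in train_loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```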
- 06:48What else there is, and what is very exciting, is active learning, which ultimately more
- 06:54or less means that during manual labeling one or more models are constantly
- 07:00trained in the labeling process, i.e. as if the labeling were connected directly to a model.
- 07:08And while you are still labeling manually, a model learns from this data
- 07:14and then helps you in the labeling process, for example by providing predictions
- 07:19which you then either use directly as a proposal,
- 07:24so if the model, for example, is already quite good,
- 07:27you just press enter if the prediction is correct.
- 07:31Or else, to know for which data the model is still very unsure,
- 07:37in order to label exactly those. So with that you can use labeling strategies
- 07:43to make data labeling targeted. Then you may not need
- 07:47100,000 of these raw data records, but a much smaller amount.
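A small active learning sketch (uncertainty sampling) with scikit-learn and synthetic data; the "true_labels" array stands in for the human labeler and all names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 5))            # unlabeled raw data (synthetic)
true_labels = (X_pool[:, 0] > 0).astype(int)   # stands in for the human labeler

# Start with a handful of labeled examples from both classes.
labeled_idx = list(np.where(true_labels == 0)[0][:5]) + \
              list(np.where(true_labels == 1)[0][:5])

for round_ in range(5):
    # Retrain the model on everything labeled so far.
    model = LogisticRegression().fit(X_pool[labeled_idx], true_labels[labeled_idx])

    # Ask where the model is most unsure (probability close to 0.5) ...
    proba = model.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)
    candidates = [i for i in np.argsort(uncertainty) if i not in labeled_idx]

    # ... and have the human label exactly those records next.
    labeled_idx.extend(candidates[:10])
    print(f"round {round_}: {len(labeled_idx)} labeled records")
```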
- 07:53Just as exciting is the area of weak supervision, which is a bit like the
- 07:58interface between conventional programming and AI programming,
- 08:03because what you want to do there, or what you do in weak supervision, is build heuristics
- 08:09which, in a way, capture domain knowledge and thus label data,
- 08:15and do so programmatically.
- 08:16That is, if I now have my data sources, I try to describe them.
- 08:22These heuristics, precisely because they are heuristics, do not have to be perfect at all,
- 08:26only better than guessing. And in this weak supervision approach
- 08:30the different heuristics are combined.
- 08:33This can be quite simple, using a simple count, or with
- 08:38much smarter approaches, in order to basically merge these different heuristics
- 08:43into one probabilistic label.
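A weak supervision sketch: a few imperfect labeling functions encode domain knowledge programmatically, and a simple count (majority vote) merges their votes; the keywords and texts are assumptions, and there are frameworks that combine the heuristics in smarter, probabilistic ways:

```python
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

# Heuristics ("labeling functions"): not perfect, just better than guessing.
def lf_contains_great(text):
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_mentions_refund(text):
    return NEGATIVE if "refund" in text.lower() else ABSTAIN

def lf_exclamation(text):
    return POSITIVE if text.endswith("!") else ABSTAIN

labeling_functions = [lf_contains_great, lf_mentions_refund, lf_exclamation]

def weak_label(text):
    """Merge the heuristics' votes by a simple count (majority vote)."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

texts = ["Great product, works!", "I want a refund.", "Arrived on time."]
print([(t, weak_label(t)) for t in texts])
```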
- 08:48And especially on the last two techniques we actually do very intensive research
- 08:52at the HPI as well. That is why we offer a little excursion this week,
- 08:57which you are welcome to watch if you want to.
- 09:02Exactly, so much for data labeling.
- 09:05A very important topic in the field of artificial intelligence,
- 09:08because we do supervised learning so often, and labeled data is the basis for it.
- 09:14And in the end there are more possibilities than just labeling data from front to back,
- 09:20namely also other approaches, which we wanted to show you.