2.6 Reinforcement Learning

This video belongs to the openHPI course Künstliche Intelligenz und maschinelles Lernen für Einsteiger.

Time effort: approx. 6 minutes

00:00In this unit we look at the last paradigm, reinforcement learning.
00:05The concept of reinforcement learning can be compared with teaching commands to a puppy.
00:11A dog can easily perform a limited number of activities.
00:17When the master says "Sit", the dog either reacts with an action or does not.
00:23Since the puppy does not know the meaning of the words, it has to learn the behavior.
00:29If the puppy barks, for example, at the command "Mach Sitz" (German for "Sit"), the master is visibly angry.
00:37At a young age the puppy at first reacts arbitrarily to a command from the master and receives a punishment from the environment, that is, the master, for example by not being given treats.
00:50If the puppy reacts correctly, it gets a treat or a pat.
00:56If the master repeats this attempt several times, at some point the dog will actually sit down.
01:02Because of the treats it received, the puppy remembers that this was the right behavior.
01:08Next time the puppy will probably repeat this reaction, without knowing the exact meaning of the words "Mach" and "Sitz".
01:16You can formalize this as follows: the dog, in technical terms the agent, performs an action. Sitting, for example, would be an action of the dog.
01:27The environment, in this case the master, reacts with a reward or a punishment, in the form of treats or caresses, or simply by withholding them.
01:39The observation of the environment is, in turn, listening to the master's commands.
01:45The agent, i.e. the dog, tries to adapt its own behavior so that it reacts to the environment with appropriate actions and achieves the maximum reward. This is how it learns.
01:58This example illustrates well the general concept of reinforcement learning.
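The loop of observation, action and reward described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the video: the action set `ACTIONS`, the reward values (1 for a treat, 0 for no treat), and the simple running-average update with a small exploration rate are all chosen here for the sake of the example.

```python
import random

ACTIONS = ["sit", "bark", "run"]  # hypothetical action set for the puppy


def master_reaction(action):
    """Environment: a treat (reward 1) for sitting, no treat (reward 0) otherwise."""
    return 1.0 if action == "sit" else 0.0


def train_puppy(episodes=200, epsilon=0.1, step_size=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated reward for each action
    for _ in range(episodes):
        # Observation: the command "Mach Sitz" (the only command in this sketch).
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # react arbitrarily
        else:
            best = max(value.values())
            action = rng.choice([a for a in ACTIONS if value[a] == best])
        reward = master_reaction(action)
        # Move the estimate for the chosen action a little toward the reward.
        value[action] += step_size * (reward - value[action])
    return value


values = train_puppy()
print(max(values, key=values.get))
```

The puppy never learns what the words mean. It only keeps an estimate of how much reward each reaction has brought so far and increasingly repeats the reaction with the highest estimate.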
02:02Now imagine another example: this agent, or game character, has to find its way through the following labyrinth.
02:13A red field means that it cannot be entered.
02:19A field with an arrow automatically moves you one field to the left in the next step.
02:23The goal of the game is to get the character to the blue field, that is, the target field, in as few steps as possible.
02:31The agent only knows the fields around itself. The bird's-eye view shown here is visible only to us, not to the game character.
02:41The agent sees the blocks above, to the left, to the right and below it, and also remembers its last action.
02:49The reward, or rather punishment, is one point per step.
02:53The goal is to arrive at the blue field with the minimum number of points, that is, steps.
02:59As actions, the game character can move up, left, right or down.
03:05The blue crosses mark the fields that the agent has already visited at least once.
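The rules just listed can be written down as a small environment. The following is a minimal sketch, not the game from the video: the layout `GRID`, the start field, the class name `Maze` and the encoding of fields as characters are assumptions made for illustration. As a further simplification, stepping onto an arrow field pushes the agent one field to the left immediately rather than in the following step.

```python
from dataclasses import dataclass, field

# Hypothetical layout, not the maze from the video:
# '#' = red field (blocked), '<' = arrow field, 'G' = blue target, '.' = free.
GRID = [
    "....G",
    ".#.<.",
    ".....",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def cell(r, c):
    """Return the field type; everything outside the board counts as blocked."""
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]):
        return GRID[r][c]
    return "#"


@dataclass
class Maze:
    pos: tuple = (2, 0)                        # start field (assumed)
    last_action: str = "none"
    visited: set = field(default_factory=set)  # the "blue crosses"

    def __post_init__(self):
        self.visited.add(self.pos)

    def observe(self):
        """The agent sees only its four neighbours plus its last action."""
        r, c = self.pos
        around = tuple(cell(r + dr, c + dc) for dr, dc in MOVES.values())
        return around + (self.last_action,)

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if cell(r, c) != "#":  # a blocked move leaves the agent in place
            if cell(r, c) == "<" and cell(r, c - 1) != "#":
                c -= 1         # simplification: the arrow pushes left at once
            self.pos = (r, c)
        self.visited.add(self.pos)
        self.last_action = action
        done = cell(*self.pos) == "G"
        return self.observe(), -1, done  # every step costs one point


maze = Maze()
print(maze.observe())
print(maze.step("up"))
```

Note that `observe` never returns the agent's position or the whole board. The agent has to work with the four neighbouring fields and its last action, just as described above.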
03:12The agent, or game character, at first walks across the playing field quite haphazardly and gradually learns the peculiarities of the environment.
03:21For example, it learns what costs many penalty points and what pays off.
03:26Thus it learns a general behavior that is not limited to this one playing field.
03:33Let us now look at the following general examples:
03:38In the first example the agent can go neither up nor down. Going right would cost one point but would just bring the agent back to where it was.
03:48So after a few tries the agent learns to go left.
03:52In the second example the field above is already known, so the probability that the solution lies there is rather low.
04:03A reasonable behavior would therefore be to move into unknown territory.
04:07Thus the agent learns, step by step, to prefer exploring unknown terrain.
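The preference for unknown terrain can be made concrete with a hand-written rule. In the video the agent learns this behavior from rewards. The sketch below hard-codes it only to show what the learned behavior amounts to. The function name `pick_move`, the visit counter and the example layout are hypothetical.

```python
import random

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def pick_move(pos, visit_counts, blocked, rng=random):
    """Pick a move to the least-visited neighbouring field that is not blocked."""
    options = {}
    for name, (dr, dc) in MOVES.items():
        target = (pos[0] + dr, pos[1] + dc)
        if target not in blocked:
            options[name] = visit_counts.get(target, 0)
    if not options:
        return None  # no legal move at all
    fewest = min(options.values())
    return rng.choice([m for m, n in options.items() if n == fewest])


# Assumed situation: the field above has been visited once, the left one is red.
counts = {(0, 1): 1}
print(pick_move((1, 1), counts, blocked={(1, 0)}))
```

In this situation the rule chooses "down" or "right", the two unvisited fields, and goes "up" only if nothing else is available.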
04:12A possible path through the game board would be the following: the agent reaches the target but makes some mistakes on the way.
04:21It ends up with a penalty score of 27 points, partly because it enters some fields twice.
04:28The ideal path would be the following. However, it is not the exact path that matters here, but rather the behavior that the agent has derived through interaction with the environment.
04:42It has learned, for example, that it is not worth stepping forward onto an arrow that sends it back.
04:48To come back to the concept of reinforcement learning: the agent tries to adapt its actions to the reactions of the environment,
05:00with the goal of maximizing the reward or minimizing the punishment.
05:05What the agent learns or derives from its behavior is called the policy.
05:11In reinforcement learning, this policy is the machine learning model.
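To make the term policy more tangible, here is a minimal sketch using tabular Q-learning, one common reinforcement learning method. The video does not name a specific algorithm, so this choice is an assumption, as are the tiny corridor environment (start at field 0, target at field 4) and all parameter values. For simplicity the agent's state here is its position rather than the local view used in the labyrinth example.

```python
import random

# Hypothetical 1-D corridor: start at field 0, blue target at field 4.
N_FIELDS, GOAL = 5, 4
ACTIONS = ["left", "right"]


def step(state, action):
    """Move one field; walls keep the agent in place. Each step costs one point."""
    if action == "left":
        nxt = max(0, state - 1)
    else:
        nxt = min(N_FIELDS - 1, state + 1)
    return nxt, -1.0, nxt == GOAL


def q_learning(episodes=300, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_FIELDS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward reward plus best estimated future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q


def policy(q, state):
    """The learned policy: in each state, take the action with the best estimate."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = q_learning()
print([policy(q, s) for s in range(GOAL)])
```

The table `q` is what training produces, and `policy` is just a lookup into it. In that sense the policy plays the role that the trained model plays in the other learning paradigms.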
05:16Now that we have covered all categories of machine learning in theory, we will turn to data provision and to how the individual models can be evaluated.