- 00:00Okay, so in this lecture, I'm going to talk about this very important concept of maximum likelihood estimates.
- 00:07This is a concept that we will need when we are talking about actual Bayesian analysis in the coming lectures.
- 00:15So it's very important to understand. What we've seen so far are examples of discrete and continuous random variables,
- 00:21and we know what we can do with these distributions
- 00:25and the kinds of questions we can ask of them. Today, what I want to talk about is the expectation and variance
- 00:34of a random variable.
- 00:35So in the discrete case, the definition of the expectation of a particular random variable
- 00:40call it Y,
- 00:41you can call it anything, as I mentioned earlier.
- 00:45So you have some random variable Y with some probability mass function f(Y).
- 00:50And so you could compute the expectation of Y by using this formula, which is basically multiplying every possible outcome
- 01:00y with its probability and summing up those values.
- 01:06So for example, if you toss a fair coin once, that's your Bernoulli situation.
- 01:11So the possible outcomes are tails or heads.
- 01:14And let's say the probability of each outcome is 0.5.
- 01:18So in that case the expectation of that particular random variable is going to be this calculation here, which is zero, multiplied
- 01:27with this probability and one multiplied with this probability, which gives you .5.
- 01:31So that's the expectation here.
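To make that calculation explicit, here is the standard definition of the expectation of a discrete random variable (my reconstruction of the formula being referred to), applied to the fair-coin example:

```latex
E[Y] = \sum_{y} y \, f(y) = 0 \cdot 0.5 + 1 \cdot 0.5 = 0.5
```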
- 01:33And the variance is computed with this formula.
- 01:35I won't say much about this except that you're still computing an expectation,
- 01:40you know, of some function of this random variable.
- 01:43This discussion is not really relevant for us, but if you're interested, I'll point you to some textbooks that
- 01:49you can look at.
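For reference, the variance formula being referred to is presumably the standard definition, which is itself the expectation of a function of Y; for the fair coin it works out as follows:

```latex
Var(Y) = E\big[(Y - E[Y])^2\big] = \sum_{y} (y - E[Y])^2 f(y)
       = (0 - 0.5)^2 \cdot 0.5 + (1 - 0.5)^2 \cdot 0.5 = 0.25
```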
- 01:50Okay, so the expectation, what does it mean?
- 01:55So the expectation has this interpretation: if you were to repeatedly do the experiment with larger and larger and larger
- 02:01sample sizes, we would start getting the expected value of that random variable; in this case it's 0.5.
- 02:10In the case of the Bernoulli example I gave you, theta has a value of 0.5; as we increase the sample size and
- 02:19repeatedly run the experiment, we will get closer and closer to 0.5 in this limiting case.
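A minimal simulation sketch of this limiting behaviour (not from the lecture; it assumes NumPy is available, and the seed and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.5  # true probability of success (fair coin)

# Sample mean of repeated Bernoulli trials for increasingly large samples:
# it drifts towards the expectation 0.5 as the sample size grows.
for n in [10, 100, 1000, 10000, 100000]:
    y = rng.binomial(1, theta, size=n)  # n single coin tosses, coded 0/1
    print(n, y.mean())
```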
- 02:24Another way to think about the expectation is to think of it as follows, it is the weighted mean of the possible outcomes
- 02:33weighted by the probabilities.
- 02:35That's what I just did earlier.
- 02:37I'm literally taking the weighted mean, weighted by the probabilities of the particular outcomes.
- 02:41If theta had been 0.1, you know, if the probability of success had been 0.1, then this one would be multiplied with 0.1 and not
- 02:480.5, and this zero would be multiplied with 0.9.
- 02:51So it's a weighted sum in that sense.
- 02:53Okay, so that's the expectation and just as information, it's good to know this.
- 03:00Although we won't really need this information in this course, it's still good to know that you can compute the expectation
- 03:07of a particular random variable using this formula, n times theta, with n as the sample size in this case.
- 03:13And the variance is computed with this formula here.
- 03:16So if I have particular data, you know with k successes out of n trials, I could get an estimate of theta which I'm calling
- 03:23theta hat.
- 03:24So whenever I talk about the estimate of a parameter from real data, I'm going to put a hat on top of it.
- 03:30So I'm gonna call it theta hat.
- 03:32And so similarly, the variance of some particular vector of data that I have, some Y, I could compute by, you
- 03:39know, calculating this value once I've got an estimate of theta.
- 03:43I know what N is because I decide on that as an experimenter.
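The binomial formulas being referred to are, I take it, the standard ones; with k observed successes out of n trials:

```latex
E[Y] = n\theta, \qquad Var(Y) = n\theta(1-\theta), \qquad
\hat{\theta} = \frac{k}{n}, \qquad \widehat{Var}(Y) = n\hat{\theta}(1-\hat{\theta})
```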
- 03:47Similarly in the normal distribution, the expectation of Y has the same formula as in the discrete case.
- 03:54Except that this has a continuous, you know, expression in terms of an integral, and an integral is just a summation.
- 04:01So we're just summing up this weighted sum here, but we're multiplying each possible outcome with a probability density now
- 04:09not a probability, but probability density.
- 04:12So that's the only difference here.
- 04:14And because this is continuous, we have to do this integration because we have an infinity of values.
- 04:18That's the beauty of calculus.
- 04:20That's what gives us the ability to do this kind of summation in continuous space.
- 04:25And so this expectation in the normal distribution is the parameter mu,
- 04:30and the variance will be sigma squared.
- 04:33So we can calculate the variance by the usual formula, you know, that you have for sigma squared; you can use that and you
- 04:43get those estimates.
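Again as a reconstruction of the formulas on the slide, for a continuous random variable with density f(y), and in particular for the normal distribution:

```latex
E[Y] = \int_{-\infty}^{\infty} y \, f(y) \, dy = \mu, \qquad
Var(Y) = \int_{-\infty}^{\infty} (y - \mu)^2 \, f(y) \, dy = \sigma^2
```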
- 04:44So these you must have seen in standard introductory courses, you know, in statistics.
- 04:49So these are the important ideas that we're going to work with in future lectures:
- 04:56The expectation and the variance and so on.
- 04:58I should mention that all these, I just stated that the expectation is this and the variance is that for the
- 05:06normal and the binomial, but all these results can be easily derived analytically,
- 05:12just on paper; you can quickly derive these, and I've done that in my other lecture notes, which are online, if you're
- 05:18interested in the proofs. And they're really simple proofs; they just require a little bit of calculus in some cases, in the
- 05:25continuous case.
- 05:26It's not very complicated, and you can find the proofs here, and also you'll find them in every statistics textbook, you
- 05:33know, every mathematical statistics book.
- 05:36Okay, so now what I want to get at here is that if I have some observed data, I can compute the estimate of theta
- 05:47that is theta
- 05:47hat.
- 05:48I can work that out in the binomial case.
- 05:50It would be k, that is, the number of successes, divided by the total number of trials. Now, the quantity theta hat that I compute
- 06:00here is the observed proportion of successes,
- 06:03and it's called the maximum likelihood estimate of the true unknown parameter theta.
- 06:09We don't know what theta is.
- 06:10We will never know what theta is, but we can estimate it from the data.
- 06:14So once we have estimated theta in this way, we can of course calculate the variance as well using the formula I showed you
- 06:21because that involves this theta as well.
- 06:24And then these estimates, the expectation and the variance, are then used for statistical inference, hypothesis testing, all
- 06:31that good stuff that we've learned about in frequentist statistics.
- 06:36So the estimate is called the maximum likelihood estimate.
- 06:41But what does that actually mean?
- 06:42Okay, so I'm gonna explain that now.
- 06:45So we have to understand what a likelihood function is in order to understand maximum likelihood estimation.
- 06:53So, in the binomial example we've got some probability mass function, which I hope you remember.
- 07:00And that probability mass function contains three terms: the number of successes, which you can call k or x or whatever,
- 07:07the total number of trials n, and theta,
- 07:09the parameter theta, which determines the probability of success.
- 07:15So if you look at that probability mass function as a function of theta, fixing k and n:
- 07:22you've done the experiment, let's say you get 7 successes out of 10 trials; k and n are now fixed quantities.
- 07:29They're no longer random data.
- 07:31Theta, however, can be treated as a variable, and then the same probability mass function can now be seen as a function
- 07:40of theta.
- 07:41And we call that the likelihood function.
- 07:44And it's often written as this curly L theta or sometimes it's written like this.
- 07:50So there are different ways.
- 07:51But basically you can just think of the likelihood function as the probability mass function or the probability density function
- 07:58as a function of the parameters rather than a function of the data as we saw earlier.
- 08:03So that's the shift in thinking that leads to the likelihood function.
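In symbols (my transcription of the standard binomial case, which I take to be what is on the slide), the probability mass function and the corresponding likelihood function are:

```latex
f(k \mid n, \theta) = \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k}, \qquad
\mathcal{L}(\theta \mid k, n) = \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k},
\qquad 0 \le \theta \le 1
```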
- 08:07So suppose that we were to run 10 trials and we get seven successes.
- 08:13So in that case the likelihood function would look like this.
- 08:16N and K are now fixed.
- 08:18The only thing that's varying is theta.
- 08:19Now I can plot this function.
- 08:21Theta can only have values between zero and one.
- 08:23It's a probability.
- 08:25So the x-axis, the support,
- 08:27so to say, of this variable, will be between zero and one.
- 08:32So if I plot this function now as a function of theta, this is theta.
- 08:36Now, these are all the possible values of theta.
- 08:38What you will notice for this particular data that I have is that the maximum point of this likelihood function is at 0.7.
- 08:49What is 0.7? It was the estimate of theta we got from the data using the expectation formula:
- 08:567 out of 10. 0.7 is the maximum point.
- 08:59So that's why the K over 10, you know, the estimate of theta that we get from a particular data set is going to
- 09:08be the maximum likelihood estimate.
- 09:10And what that means is that this 0.7 marks the maximum point in this likelihood function, which is a
- 09:18function of theta.
- 09:20So that's the amazing thing: a single data set is going to give me an estimate, the most likely estimate of the
- 09:29parameter,
- 09:30given the data that I have.
- 09:31So that's what a maximum likelihood estimate is.
- 09:35So
- 09:37in the binomial
- 09:39it's K over N as I just showed you.
- 09:41And in the normal distribution, you can get the maximum likelihood estimates of mu and sigma, and they would have
- 09:47the same interpretation, except that we're talking about a different distribution here.
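A small sketch of the likelihood plot just described (not from the lecture; it assumes NumPy, SciPy, and matplotlib are available):

```python
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

n, k = 10, 7  # 10 trials, 7 successes
theta = np.linspace(0, 1, 501)       # all possible values of theta
likelihood = binom.pmf(k, n, theta)  # the binomial pmf viewed as a function of theta

plt.plot(theta, likelihood)
plt.axvline(k / n, linestyle="--")   # the maximum likelihood estimate, 0.7
plt.xlabel("theta")
plt.ylabel("likelihood")
plt.show()

print(theta[np.argmax(likelihood)])  # approximately 0.7
```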
- 09:52Okay, so I hope that this intuitive introduction to the idea of maximum likelihood estimates is good enough for our purposes for
- 10:00now.
- 10:01But if you want to read more about this, and it's not a lot, you know, this is just a short topic that you will
- 10:07find in most textbooks, you will see it in textbooks like Kerns's textbook, which is available for free online;
- 10:15you should read that; it's an interesting introduction to maximum likelihood estimation, and they give a more formal introduction
- 10:21there.
- 10:22Of course, I'm giving a very intuitive picture of MLEs, and of course there's a lot of detail; as always, you know, in every
- 10:28topic there's tons of detail that you can get into.
- 10:31But the important points, what we will need for this course,
- 10:35I have explained now.
- 10:37One important thing I want you to understand is that in a particular experiment, like you run trials with sample size
- 10:4510 and you get seven successes, you get seven out of 10 as your estimate of theta; it is a maximum likelihood estimate, but
- 10:52it's not necessarily the true value of that parameter.
- 10:56So if you have small sample sizes, what will happen is, here I'm running an experiment with increasing sample sizes; the
- 11:03true value of the parameter is 0.7.
- 11:06But for small sample sizes, you will notice that in a particular experiment,
- 11:10so each dot is an experiment with increasing sample size,
- 11:13what you'll notice is that with small sample sizes, the maximum likelihood estimate is going to fluctuate around the true
- 11:20value; it's going to bounce around.
- 11:23So statisticians call this the vibration effect, vibration of the parameter
- 11:29with small sample sizes. It's only when you get to larger sample sizes that you consistently start getting maximum likelihood
- 11:37estimates from the data that represent the true value.
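A minimal simulation sketch of this fluctuation (not from the lecture; it assumes NumPy, and the seed and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
theta_true = 0.7  # true probability of success

# One experiment per sample size: the MLE k/n bounces around 0.7 for
# small n and only settles near the true value as n grows.
for n in [10, 20, 50, 100, 1000, 10000]:
    k = rng.binomial(n, theta_true)  # number of successes in n trials
    print(n, k / n)                  # maximum likelihood estimate of theta
```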
- 11:40The practical implication of this figure is that if you have a small sample size and you get a sample mean, you know,
- 11:48like k out of n in the particular example I showed you with the binomial,
- 11:53there is no guarantee that this is reflecting the true value of the parameter.
- 11:58So to give you a really concrete example, I toss a coin 10 times.
- 12:02Normally, I would assume that this coin is a fair coin.
- 12:06I could easily get 10 tails one after another and the coin could still be fair.
- 12:13That means the true probability could still be 0.5. But you are in this space here of a small sample
- 12:19and you get this vibration effect.
- 12:22You can end up with a wild mean that completely does not represent the true value of the parameter.
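For concreteness (this calculation is not in the lecture): the probability of that extreme outcome under a fair coin is small but not zero,

```latex
P(\text{10 tails in a row} \mid \theta = 0.5) = 0.5^{10} \approx 0.001,
```

so a maximum likelihood estimate of 0 for theta is entirely possible even when the true value is 0.5.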
- 12:28So, just the fact that it's a maximum likelihood estimate does not entail that you're going to get the true value each time.
- 12:35It's a super important point to understand.
- 12:37So in summary,
- 12:44we can compute the expectation and the variance for a discrete or continuous random variable.
- 12:49I showed you some examples.
- 12:50And these estimates can be shown analytically to be maximum likelihood estimates in the sense that I showed you.
- 13:00And what we're going to do next, when we start doing Bayesian modeling, is use these maximum likelihood
- 13:07estimates to understand what the Bayesian analysis is going to give us.
- 13:14So these will play a very important role in the analytical examples that I will give you when we start doing Bayesian
- 13:21modeling.
- 13:22The next lecture is now going to talk about another example of a random variable: the bivariate case and, more generally, the
- 13:31multivariate case, where you don't have just one random variable, but you have multiple random variables, all working at
- 13:38the same time to create a bivariate or a multivariate distribution.
- 13:42So that's an example I will discuss in the next lecture.