- 00:01Okay, So we have looked at Bayes' rule in theory and now what we're going to do is we're going to apply this Bayes' rule
- 00:12using the PDF version that I showed you last time with probability density functions,
- 00:19in a practical setting involving the binomial distribution, which is the familiar discrete random variable case that we saw in
- 00:28the beginning of this course.
- 00:30So let's think about this.
- 00:32Okay, we're talking about a data generative process, with the data coming from a binomial distribution.
- 00:39And so the likelihood function in this particular case would look like the output shown here for some particular value
- 00:50of the probability of success θ.
- 00:53And given 46 successes in 100 trials, for example, I would get this particular probability of getting 46 successes out
- 01:04of 100 trials, assuming that θ is 0.5
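For anyone following along in R, here is a minimal sketch of that computation (the numbers, 46 successes out of 100 trials with θ = 0.5, are the ones from the lecture):

```r
# Probability of exactly 46 successes in 100 trials, assuming theta = 0.5
dbinom(x = 46, size = 100, prob = 0.5)   # roughly 0.058

# For the likelihood function, the data stay fixed and theta varies:
theta <- seq(0, 1, by = 0.01)
plot(theta, dbinom(46, size = 100, prob = theta), type = "l",
     xlab = "theta", ylab = "likelihood")
```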
- 01:09So in the likelihood function, remember that θ is always the variable.
- 01:12So we could write this likelihood function in the following way, we can drop the normalizing constant.
- 01:19As I have been discussing repeatedly, this normalizing constant is of secondary interest to us.
- 01:24What's interesting is the kernel of the distribution.
- 01:27So let me write out the kernel of the distribution in the binomial likelihood, which is θ to the power of 46 times,
- 01:35(1-θ) to the power of 54.
- 01:37So this term, this kernel is now proportional to the likelihood that we had here because I've dropped the normalizing
- 01:45constant.
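In case the slide is not visible, the likelihood and its kernel being discussed here are:

```latex
p(k = 46 \mid n = 100, \theta)
  = \binom{100}{46}\,\theta^{46}\,(1-\theta)^{54}
  \;\propto\; \theta^{46}\,(1-\theta)^{54}
```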
- 01:46Okay, so
- 01:49our goal is to get the posterior distribution of the parameter θ.
- 01:55Given the data that we have, given the 46 successes out of 100 that we have.
- 02:00So formally what we need is the posterior distribution of θ, which will be a continuous distribution.
- 02:06With support 0 to 1.
- 02:08Why 0 to 1?
- 02:09Because it's the probability we are talking about.
- 02:12So the support of this distribution will be 0 to 1.
- 02:15And this can be calculated, the posterior distribution can be calculated up to proportionality.
- 02:21So ignoring the normalizing constants by multiplying the likelihood which we've got here
- 02:26with the prior. Now I said the prior but I haven't actually defined the prior for θ.
- 02:32So that's what I'm going to do next.
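Written out, the relationship just described (presumably the "equation 2" referred to later in this lecture) is:

```latex
p(\theta \mid \text{data}) \;\propto\; p(\text{data} \mid \theta)\; p(\theta)
```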
- 02:33Okay, So let's think about what kind of priors we might actually use for θ.
- 02:41Well, what do we need?
- 02:42We need a probability density function.
- 02:45That has a support going from 0-1.
- 02:49It should range from 0 to 1 because we're modeling a probability here.
- 02:53And it should allow us to represent our prior uncertainty about this θ parameter.
- 03:00That's the significance of the prior distribution, which I will of course unpack further in the coming lectures.
- 03:06The prior distribution is going to represent what we believe are plausible values of the parameter θ.
- 03:14Before we have even seen any data.
- 03:18Before we've seen the current data that we're trying to model, which is 46 successes out of 100.
- 03:23So that's why it's called the prior on the parameter θ.
- 03:28It's specified before actually looking at the data.
- 03:32Okay.
- 03:33So it turns out that in probability theory, the beta distribution is a very good candidate to use
- 03:42as a probability density function for the θ parameter.
- 03:46It really works very well.
- 03:47Why?
- 03:47Because it has a support that goes from 0 to 1.
- 03:50So, what you're seeing here is the beta distribution specified.
- 03:54So for any value between 0 and 1, this is the term that the beta distribution will have for defining the probability density
- 04:02function and for all of the values outside this range, the value will be zero.
- 04:06Okay, So that's what the definition is of the beta probability density function.
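For reference, the beta probability density function being described is:

```latex
\mathrm{Beta}(\theta \mid a, b)
  = \frac{\theta^{\,a-1}\,(1-\theta)^{\,b-1}}{B(a, b)}
  \quad \text{for } 0 \le \theta \le 1,
  \qquad 0 \text{ otherwise}
```

where B(a, b) is the beta function acting as the normalizing constant; the kernel is just the θ^(a-1) (1-θ)^(b-1) part.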
- 04:11And we're gonna use this beta density function for modeling our prior beliefs about the parameter θ.
- 04:18So, one thing to notice about this probability density function is that it's defined in terms of two parameters, you know
- 04:24just like the normal distribution was defined in terms of μ and σ.
- 04:28The beta distribution is defined in terms of two parameters, which we'll call a and b in different books, you'll see different
- 04:34terms like α and β and so on.
- 04:36But they're the same thing.
- 04:38So, we will write the beta distribution in terms of B(a, b).
- 04:42So, whenever I write B(a, b), it means that I'm talking about a particular beta distribution with some parameters a and b.
- 04:48Okay.
- 04:50All right.
- 04:51So in R, you will often see the d-p-q-r family of functions for the beta distribution (dbeta, pbeta, qbeta, rbeta).
- 04:59So in those functions instead of a and b, R has the convention of writing shape1 and shape2. shape1 refers
- 05:08to a and shape2 refers to the parameter b.
- 05:12So don't be confused about that, but when you're writing, when you're computing things in R, you'll be using shape1,
- 05:18shape2, and not a and b.
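For example, evaluating a beta density with that naming convention (the particular values are arbitrary, just to show the argument names):

```r
# Density of a Beta(a = 6, b = 6) distribution evaluated at theta = 0.5
dbeta(x = 0.5, shape1 = 6, shape2 = 6)
```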
- 05:19And just for your information, you don't need to use this information at all in this course.
- 05:24But the expectation and variance of the beta distribution are given by these equations here.
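Those equations are the standard moments of a Beta(a, b) distribution:

```latex
\mathrm{E}[X] = \frac{a}{a+b},
\qquad
\mathrm{Var}(X) = \frac{ab}{(a+b)^2\,(a+b+1)}
```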
- 05:30Okay.
- 05:31It's just good to know.
- 05:32It can be useful in some situations.
- 05:35Okay.
- 05:35So how do I decide what prior distribution to use for θ?
- 05:42That's the key question now.
- 05:43And what that means is I have to decide what those parameters a and b are.
- 05:48Because those parameters will determine the shape of this beta distribution, which represents our beliefs about θ before
- 05:55we've seen any data.
- 05:57So, how do we do this?
- 05:59Well, let's look at the parameters here.
- 06:04We can plot some beta densities.
- 06:06So, I'm just using the "dbeta" function, you know, to plot these distributions.
- 06:13So you can see that the support ranges from 0 to 1.
- 06:15So this is a bounded distribution.
- 06:17There's nothing beyond 0 on the left side, nothing beyond 1 on the other side.
- 06:22And so what I'm doing is I'm varying the a and b parameters, and what this shows you is that when a and b are both
- 06:271, you get the uniform distribution between 0 and 1. When I increase a and b together, keeping them equal but using larger
- 06:35numbers,
- 06:35the distribution starts to get tighter and tighter and tighter.
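A minimal sketch of that kind of plot in R (the specific a and b values here are only examples of the pattern described):

```r
theta <- seq(0, 1, length.out = 200)
# a = b = 1 gives the uniform distribution on [0, 1]
plot(theta, dbeta(theta, shape1 = 1, shape2 = 1), type = "l",
     ylim = c(0, 7), xlab = "theta", ylab = "density")
# Larger (equal) a and b give tighter distributions centred on 0.5
lines(theta, dbeta(theta, shape1 = 6, shape2 = 6), lty = 2)
lines(theta, dbeta(theta, shape1 = 21, shape2 = 21), lty = 3)
legend("topright", legend = c("Beta(1,1)", "Beta(6,6)", "Beta(21,21)"),
       lty = 1:3)
```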
- 06:38So what this means is that I can use my a and b parameter specifications to decide on how unsure I am about the plausible values
- 06:47of this parameter before I've seen the data.
- 06:50So it's a way to represent my prior uncertainty about θ by specifying a and b.
- 06:55I can do that.
- 06:56So one way to interpret the a and b parameters is that you can think of a as the number of successes in some imaginary experiment
- 07:04and b as the number of failures in that same imaginary experiment.
- 07:07So in this particular case when a and b are both 6, what you're really assuming is that the prior
- 07:14mean for θ is 0.5 because there are six successes and six failures.
- 07:18So 6/12 would be 0.5
- 07:21So that's why this distribution is centered on 0.5.
- 07:24If I had a smaller number for a and a larger number for b,
- 07:29that means I would assume that the probability of success is much lower.
- 07:33So then the distribution would be shifted toward the left, toward lower values of θ.
- 07:35So you can define different prior distributions for θ, you know, depending on your prior beliefs, and later on I
- 07:42will talk a lot more about what it means to specify a prior.
- 07:46But right now we just want to understand the mechanics of this.
- 07:50Okay.
- 07:50So what I've just done is to define some prior on the θ parameter, and what
- 07:56that will do is that it will allow me, in equation 2, to plug in a prior
- 08:05distribution here.
- 08:06I already know the likelihood.
- 08:08That's the binomial up to proportionality.
- 08:11Okay, alright.
- 08:13So
- 08:18the obvious question to ask at this moment is like how should I decide what a and b are going to be?
- 08:25This is going to be a subjective move that I'm going to make.
- 08:28So what we're going to start with right now, just for starters, is we're going to assume that we don't have much prior information.
- 08:35So we could set a and b both to 1; in this case we would have something called an uninformative prior.
- 08:41That's this uniform distribution that I'm showing you here.
- 08:44If however you have more prior information for example from previous research, right on this particular problem then you
- 08:52could define a prior that is more tightly distributed around whatever you think the true value is.
- 08:57But we will look at more examples about that later.
- 08:59But right now, this is a reasonable starting point.
- 09:01We just choose some reasonable values for the parameters a and b
- 09:05of the prior, and then we will get an idea, you know, of what the procedure is for calculating the posterior given the
- 09:16data and given a prior.
- 09:17Okay, so what I'm gonna do now, just for fun, I'm going to calculate the posterior distribution of θ using that equation
- 09:252 that I showed you earlier; I'm going to compute the posterior distribution of θ given four different priors.
- 09:32So I'm taking increasingly tight priors; notice that as a and b increase, the distribution will get tighter and tighter.
- 09:38You should play with this and check that this is true.
- 09:40Okay, so what you will notice is that we are getting more and more informative priors as we go down this list.
- 09:47Okay, so what I'm gonna do is I'm going to multiply the binomial likelihood up to proportionality with the prior distribution
- 09:56for the θ parameter.
- 09:58And I'm gonna use four different priors and I'll show you the four different posteriors that I get.
- 10:03Okay, so let's just plug it in.
- 10:05It's actually literally a simple multiplication involving no complicated mathematics.
- 10:10So the first time I saw this, I was pretty confused because it's surprising that you can just multiply two distributions.
- 10:17But what you're actually doing is that you're multiplying the mathematical form of the kernels of these two distributions.
- 10:24And the reason that we're using the beta distribution here is that its kernel has the same form as the binomial likelihood.
- 10:31So if you look here in this first equation 5, what you notice is that here is the likelihood term.
- 10:38Okay, so where is the likelihood?
- 10:42Yes, this one here (θ)^46 times (1-θ)^54
- 10:47That was my likelihood term.
- 10:50The likelihood function.
- 10:52And what I've got here is my beta prior.
- 10:55Just the kernel of the beta prior.
- 10:57Okay.
- 10:58Which is a B(2, 2) prior.
- 11:00Just as an example.
- 11:02Okay, as I said, I'm trying out B(2, 2) here.
- 11:05And so how do I solve this equation?
- 11:08How do I multiply these two terms?
- 11:11It's just adding the exponents.
- 11:14Because I've got (θ)^46 here and I've got (θ)^(2-1) here.
- 11:20So what is going to be the result of this multiplication?
- 11:23It's going to be (θ)^(46 + (2-1))
- 11:28So that's how I end up with (48-1) here.
- 11:31And similarly for (1-θ), I'm just adding up exponents.
- 11:34That's it.
- 11:36I'm not doing anything special here.
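Written out for the B(2, 2) prior, the multiplication is just:

```latex
\underbrace{\theta^{46}(1-\theta)^{54}}_{\text{likelihood}}
\times
\underbrace{\theta^{2-1}(1-\theta)^{2-1}}_{\text{prior}}
= \theta^{48-1}\,(1-\theta)^{56-1}
```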
- 11:38And then I'm doing the same thing now for different priors.
- 11:40The likelihood remains unchanged.
- 11:42Okay, this likelihood remains unchanged in all the four cases.
- 11:46What changes is this term here.
- 11:48I'm changing the prior specification.
- 11:54So the a and b values are 3 here, 6 here, and 21 here.
- 11:59So you can see that the posterior can be trivially calculated, up to proportionality.
- 12:05What we don't have is the normalizing constant, but as I said, we can always work that out.
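Since adding exponents is all that is involved, the posterior shape parameters can be computed directly. A small sketch in R for the four priors mentioned (a = b = 2, 3, 6, 21) and the 46-out-of-100 data:

```r
a <- c(2, 3, 6, 21)     # prior shape1 values
b <- c(2, 3, 6, 21)     # prior shape2 values
k <- 46; n <- 100       # observed successes and trials
# Conjugate update: the posterior is Beta(a + k, b + n - k)
cbind(prior_a = a, prior_b = b, post_a = a + k, post_b = b + (n - k))
```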
- 12:10So that's really it.
- 12:14I mean that's the whole story in a nutshell.
- 12:16That's how I'm going to use Bayes' rule to derive the posterior distribution at least in this simple case.
- 12:23So we can now try to visualize what's happening when we have a particular prior and a particular likelihood
- 12:32and we get a particular posterior.
- 12:34So this is the code that I'm going to use for this.
- 12:36I won't explain this code.
- 12:38You can look at it later and it's also discussed in the lecture notes.
- 12:41Okay, in the textbook.
- 12:42So you should look at that later on.
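The exact code is in the textbook; a rough sketch of the same idea, assuming the 46-out-of-100 data and a Beta(6, 6) prior as one example, might look like this:

```r
theta <- seq(0, 1, length.out = 500)
a <- 6; b <- 6                      # one example prior
k <- 46; n <- 100                   # observed data
prior <- dbeta(theta, a, b)
lik <- dbinom(k, n, theta)
lik <- lik / (sum(lik) * (theta[2] - theta[1]))   # scale to integrate to ~1
posterior <- dbeta(theta, a + k, b + n - k)
plot(theta, posterior, type = "l", lty = 1, xlab = "theta", ylab = "density")
lines(theta, lik, lty = 2)
lines(theta, prior, lty = 3)
legend("topright", legend = c("posterior", "scaled likelihood", "prior"),
       lty = 1:3)
```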
- 12:43But what I want to show you is the result of this code: what it shows you is that the posterior distribution for θ, which
- 12:51is the solid line here, is going to lie somewhere between the scaled likelihood and the prior.
- 13:08In this particular case,
- 13:09this here is the scaled likelihood,
- 13:12and we've got the prior distribution here.
- 13:18The posterior is going to be a compromise between the prior and the likelihood.
- 13:23Okay, so this is a very important idea that I'm going to unpack for you in great detail in the coming lectures, but this
- 13:30is the first intuition on the relationship between the likelihood and the prior.
- 13:34The prior is actually going to modulate your posterior distribution.
- 13:39The more precise your prior is going to be, the more the posterior distribution will drift towards the prior.
- 13:44And intuitively, this makes a lot of sense too, if you have a lot of knowledge about a particular problem and you get
- 13:51a new data set, that new data is not going to shift your belief much because you have a lot of prior knowledge
- 13:58about the problem.
- 13:59But if you know nothing about your problem and you get some data, that data will shift your belief about that particular
- 14:06problem that you're studying.
- 14:08This happens in day to day life as well.
- 14:10Okay.
- 14:11And this is just another visualization of what I just showed you.
- 14:14We can basically see that the posterior is going to lie somewhere between the prior and the likelihood in this case.
- 14:21Okay, Alright, so we've seen a simple example of how we can derive the posterior given some data in the simple binomial case
- 14:29and what I showed you is that the posterior that we got actually belongs to the same family of distributions that the prior
- 14:37belongs to.
- 14:38How is that the case?
- 14:39How did that happen?
- 14:41Well, if you look at the posteriors here, they all have the same form as the beta distribution
- 14:49kernel.
- 14:50If you go back to the beta distribution, what did it look like?
- 14:54This beta distribution kernel is the form that we're seeing in the posterior.
- 14:59So the posterior is just like the prior, it has the same form.
- 15:04It belongs to the same family of distributions, namely the beta distribution.
- 15:08And that's why we say that this particular example of a binomial with a beta prior is called a conjugate case because
- 15:16the prior and the posterior end up coming from the same family.
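In general, for k successes in n trials with a Beta(a, b) prior on θ, the posterior is:

```latex
\theta \mid k \;\sim\; \mathrm{Beta}(a + k,\; b + n - k)
```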
- 15:20So, this is a very important example which illustrates the key functionality that Bayes' rule gives you for figuring out the
- 15:28posterior.
- 15:29So basically, that's it, you know, from now on, we are going to be just computing this.
- 15:35Either we're going to do this analytically (we're going to do one more example of that), or we're going to do it computationally
- 15:41and I will show you how that will work in the coming lectures.
- 15:45So, in the next lecture, we'll look at another example of a paper-and-pencil analysis like this one, with a conjugate case involving
- 15:52the Poisson distribution and the Gamma distribution.