- 00:01So what we've been looking at in the previous lecture is the Poisson-Gamma conjugate case.
- 00:06And I showed you that we have to somehow figure out what the prior is on the λ parameter for the Poisson likelihood
- 00:14and we know that the Gamma distribution is a reasonable probability density function we can use as a prior for the λ
- 00:22parameter.
- 00:23But the interesting question now is how to figure out what a and b are.
- 00:28So what I showed you last time was that we have some prior knowledge.
- 00:33We are thinking about this hypothetical situation, we have some expert knowledge that tells us that the mean rate of regressions
- 00:40is 3 and the variance is 1.5.
- 00:42We also know that in the Gamma distribution the mean is a/b, where a and b are the parameters, and the variance is
- 00:49a/b^2.
- 00:50So it's a simple algebraic problem that can be easily solved.
- 00:55So, I've got a/b
- 00:57That's the mean of the Gamma equal to 3, and a/b^2 is equal to 1.5.
- 01:04That's the variance that we know from prior research.
- 01:07So all you have to do now is for example, equate a = 3*b, plug it into this equation here.
- 01:14So I've got 3*b/b^2 = 3/b = 1.5.
- 01:19And hopefully you can see that that implies that b has to be 2.
- 01:23So once I know that b is 2, I can plug it in here and a is equal to 6.
- 01:29So that's it.
- 01:30I mean, I've done a principled analysis, you know, to figure out what my prior parameters are gonna be for the
- 01:38Gamma distribution.
- 01:39This is for the λ parameter that I'm trying to work out.
- 01:44So maybe it's a good idea to try this out yourself once.
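As a quick numerical check of this algebra, here is a minimal Python sketch, assuming scipy's shape/scale parameterization of the Gamma (scale = 1/rate):

```python
from scipy.stats import gamma

# Prior knowledge from the lecture: mean rate = 3, variance = 1.5.
prior_mean, prior_var = 3.0, 1.5

# For a Gamma(a, b) with rate b: mean = a/b and variance = a/b^2,
# so b = mean/variance and a = mean * b.
b = prior_mean / prior_var  # 2.0
a = prior_mean * b          # 6.0

# scipy's gamma takes shape a and scale = 1/b (i.e., 1/rate).
prior = gamma(a, scale=1 / b)
print(a, b)                       # 6.0 2.0
print(prior.mean(), prior.var())  # 3.0 1.5
```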
- 01:51Alright, so this is what my prior is now based on my prior knowledge.
- 01:55So this is what the prior looks like.
- 01:56I've got a and b parameters specified and this is going to be my prior on the λ parameter in the Poisson
- 02:04likelihood.
- 02:05Okay, so I'm simply going to multiply now.
- 02:07Okay, the rest of the steps are going to be just simple multiplication and addition.
- 02:13Okay, so what am I doing now?
- 02:15I'm going to figure out the posterior distribution of the λ parameter in the Poisson likelihood.
- 02:22And I will do that by multiplying the likelihood and the prior up to proportionality.
- 02:27Why up to proportionality?
- 02:28Because I'm ignoring the normalizing constants.
- 02:31Okay, alright.
- 02:33So how does this work?
- 02:34So let's assume that we have n data points.
- 02:37So these are n independent data points which I'm representing as a vector.
- 02:41So that's what those angle brackets are supposed to mean.
- 02:43And so I'm writing X as a vector of values.
- 02:48That's why it's in boldface, math boldface.
- 02:50Okay, so I've got n independent data points.
- 02:53So these are the regressive eye movements.
- 02:55Okay, so what I'm now doing is I'm figuring out the likelihood for all those n independent data points
- 03:05using the Poisson distribution.
- 03:10The probability mass function for the Poisson distribution.
- 03:13If you recall, the formula for the Poisson distribution was this thing here, where x is the data.
- 03:21So x is the data here.
- 03:23So all I have to do now to figure out what the probability of this vector of data points is to take each of
- 03:34those data points x1 through xn
- 03:36and plug it into the Poisson likelihood.
- 03:41And multiply out each of those values with each other.
- 03:46Why can I multiply this out?
- 03:47Because they're independent values.
- 03:49So, I've got n independent data points.
- 03:51And for each of the data points, I can figure out the probability of each data point.
- 03:57So I'm just calculating the joint probability of all these data points.
- 04:01And so that's what I've done here.
- 04:02That's it.
- 04:03Now, this notation is very cumbersome because I have to write multiplication and the dots and multiplication.
- 04:09So what I did is I just wrote it out more compactly using product notation: I have a product
- 04:16symbol here, which just says the same thing as here, that for the n data points I'm just plugging in the i-th data point
- 04:23here, and I'm just multiplying those guys out.
- 04:26That's pretty much it.
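A minimal sketch of this product-of-PMFs step in Python; the data values here are hypothetical, chosen so that n = 5 and the sum is 16, matching the totals used later in the lecture:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical counts of regressive eye movements (independent observations).
x = np.array([2, 4, 3, 5, 2])
lam = 3.0  # one candidate value for the lambda parameter

# Joint likelihood of independent data points = product of the
# individual Poisson probability masses.
likelihood = np.prod(poisson.pmf(x, mu=lam))
print(likelihood)
```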
- 04:27And so one thing to notice here is that this multiplication is pretty easy.
- 04:31Because in this term here I've got (λ)^x1, then I've got in the next term, which
- 04:39I haven't shown here, I would have (λ)^x2, then the next term would be (λ)^x3
- 04:45All the way to (λ)^xn
- 04:47Now, what happens if I multiply these n terms out?
- 04:50I will get λ to the power of the sum of all those x, because they're exponents,
- 04:59I'm just adding them up.
- 05:00So that's why it's so simple.
- 05:02That's how I end up with λ to the power of the sum of the x's here in this calculation.
- 05:07So what I'm doing right now, by the way, don't forget what we're actually doing.
- 05:11We are trying to compute the likelihood of the data.
- 05:15You're figuring out the likelihood
- 05:18in terms of the parameter λ,
- 05:21given all the data.
- 05:22So that's how I end up with λ to the power of sum of X.
- 05:26Similarly, for the exponential term: I have exp(-λ) once for the first data point; for the second data
- 05:34point, I will again get exp(-λ), and so on all the way up to the n-th data point.
- 05:39So that means I have n instances of exp(-λ).
- 05:44So what is that going to amount to? That's going to amount to exp(-nλ), because I have n data points.
- 05:51So that's how I end up with this final term here.
- 05:54And in the denominator, of course, I just have the product of all the x!
- 06:01So that's how I end up with the likelihood of multiple data points.
- 06:07This is the closed-form solution for the likelihood of the data in terms of the parameter λ,
- 06:14the likelihood that I have from the Poisson distribution.
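A quick numerical check, under the same hypothetical data as above, that the closed form agrees with the product of the individual Poisson PMFs:

```python
import numpy as np
from scipy.special import factorial
from scipy.stats import poisson

x = np.array([2, 4, 3, 5, 2])  # same hypothetical data as before
lam, n = 3.0, len(x)

# Product of the individual Poisson PMFs ...
direct = np.prod(poisson.pmf(x, mu=lam))

# ... versus the closed form: lambda^(sum x) * exp(-n*lambda) / prod(x_i!)
closed = lam ** x.sum() * np.exp(-n * lam) / np.prod(factorial(x))

print(np.isclose(direct, closed))  # True
```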
- 06:17Okay, so I've worked out what my likelihood is gonna look like for multiple data points. Why do I need it?
- 06:23Because I'm going to plug it in here and multiply it with the prior on λ.
- 06:27Okay, so let's take a look at how we do that now.
- 06:31This is going to be surprisingly easy because here I have my likelihood.
- 06:35This is just what I computed a few seconds ago: exp(-nλ) times λ to the power of the sum of the x's.
- 06:42This is the denominator here and here I've got the Gamma distribution specified in terms of the parameters a and b.
- 06:50I have not specified what they are right now.
- 06:52But I would fix them if I was actually doing the analysis and I will show you where this is going now.
- 06:57Okay.
- 06:58So one thing to notice in this complicated equation is that it has two parts, one is the likelihood, one is the prior.
- 07:05Okay, that's pretty straightforward.
- 07:07I just plugged in the terms of these distributions here.
- 07:12But one interesting thing to notice here is that several of these terms here do not involve λ at all.
- 07:19So, remember I'm trying to get the posterior distribution of λ given the data.
- 07:24But this Gamma of a is going to be a constant number.
- 07:28This b to the power of a is going to be a constant number.
- 07:33This product of the x! is gonna be a constant number.
- 07:37These are all the normalizing constants
- 07:40for the posterior; I can just drop them.
- 07:43So, we're going to just forget about all of those.
- 07:45And so I get the posterior up to proportionality by concentrating only on this term, which is the kernel of the likelihood, and
- 07:55on this term here, which is the kernel of the Gamma distribution.
- 07:59I multiply them out and you will see that it works out so simply.
- 08:04So here is my likelihood times prior.
- 08:07Just the kernels of the distributions.
- 08:10And again, what you notice is that I've got one term, exp(-nλ),
- 08:15that came from the likelihood term, but in the Gamma distribution, I've also got exp(-bλ).
- 08:22So I'm gonna have to sum up these two guys.
- 08:25And similarly I've got in the likelihood, I've got λ to the power of sum of X.
- 08:31And in the Gamma I've got λ to the power of (a-1).
- 08:34Now if I add up the exponents in the λ terms, what do I get?
- 08:40Let's look over here.
- 08:41I get (λ)^(a-1 + Σ(X))
- 08:45So I'm just adding up the exponents in the λ term.
- 08:47Similarly, I'm adding up the exponents in the exponential term.
- 08:52And so what you see is that we started out with this horribly complicated equation but we dropped the normalizing constants
- 08:59and we end up with the posterior up to proportionality by simply doing addition on the exponents.
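Written out compactly, the multiplication just described is (dropping all normalizing constants):

```latex
p(\lambda \mid \mathbf{x})
  \;\propto\; \underbrace{\lambda^{\sum_i x_i} e^{-n\lambda}}_{\text{likelihood kernel}}
  \times \underbrace{\lambda^{a-1} e^{-b\lambda}}_{\text{prior kernel}}
  \;=\; \lambda^{\left(a + \sum_i x_i\right) - 1}\, e^{-(b+n)\lambda}
```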
- 09:07So what we end up with, though, is a form for the posterior distribution of λ that has the same form as the Gamma
- 09:16distribution.
- 09:22So notice that the Gamma distribution, the kernel of the Gamma distribution always has this form.
- 09:29So for some parameter, it's going to be that parameter raised to the power (a-1).
- 09:38So this should have been λ.
- 09:40I'm sorry, I should have written λ here.
- 09:42That should have been corrected: read this as (λ)^(a-1) and read this
- 09:53as exp(-λ * b).
- 09:56Okay, so this form is exactly the form that I'm getting here in this posterior here in terms of the λ variable.
- 10:03So all I have to do now is to represent this posterior kernel in terms of the Gamma distribution.
- 10:13So how do I do that?
- 10:15If you just concentrate on this term here, this is giving me an updated a parameter in the posterior.
- 10:23Because the a parameter needs to look like (λ)^(a-1)
- 10:26So, what I'm seeing here is sum of X plus (a-1)
- 10:31So, the new a parameter for my posterior distribution will be, well, you can call it a*. That would be a plus sum of
- 10:38X.
- 10:39And similarly, I will get an updated b parameter here, which will be whatever the update turns out to be.
- 10:46And then I'll just show you in a second.
- 10:48Okay, so let's look here.
- 10:51So the updated a* parameter for the posterior distribution will be (a + sum of X)
- 10:57And b* is going to end up being (b + n)
- 11:02So what happens now is that I can write the posterior distribution in terms of a* and b*. Again, this should
- 11:10be λ, not θ.
- 11:11I sometimes use λ and sometimes θ.
- 11:14So that's how I ended up confusing this here.
- 11:17So, but the important point here is pretty clear.
- 11:20Okay, so the point is that the posterior distribution of λ is going to have updated a and b parameters. And what are
- 11:27those parameters?
- 11:28The a parameter will be updated to a* which will be (a plus sum of X)
- 11:33a was the prior, the a parameter in the prior that I had used and the updated b parameter will be b*, which will
- 11:41be the parameter b I used in the prior plus the number of data points that I have.
- 11:46So the amazing thing now is that all I need for computing the posterior is to sum up the x values that I've
- 11:55observed in the data and to calculate the number of data points.
- 12:01I know what my prior a and b are for the λ parameter and I end up with the posterior parameters on the Gamma distribution
- 12:08So it's really that straightforward and I get the posterior up to proportionality.
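The whole conjugate update therefore fits in one line of code; a minimal sketch, with a hypothetical helper name:

```python
def gamma_poisson_posterior(a, b, x):
    """Gamma(a, b) prior on the Poisson rate, observed counts x.

    Returns the posterior parameters a* = a + sum(x) and b* = b + n.
    """
    return a + sum(x), b + len(x)

# For example, with the prior worked out earlier and counts summing to 16:
print(gamma_poisson_posterior(6, 2, [2, 4, 3, 5, 2]))  # (22, 7)
```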
- 12:14But as I mentioned earlier, I can always figure that out.
- 12:18The normalizing constant I can figure out later on.
- 12:20So the posterior can now be worked out as I showed you.
- 12:26This is also an example of a conjugate case.
- 12:28So, I'm going to quickly show you an example of how this works.
- 12:31How you would do this: suppose this was my actual data.
- 12:33I have five data points.
- 12:35So I sum up these data points, I get 16 as my sum, n is equal to 5.
- 12:40So I don't actually have to do any computation.
- 12:43I just have to do simple addition here.
- 12:45I know what my a and b are.
- 12:47So I just do the calculation and this is my posterior for the λ parameter given my prior for a
- 12:53and b.
- 12:53Whatever that was.
- 12:54And I can always figure out the posterior now.
- 12:56So this is a great example for understanding how conjugacy gives you very clean posterior distributions that belong
- 13:05to the same family as the prior distribution.
- 13:08And if you want you can figure out the mean of the posterior by computing a* divided by b* etcetera.
- 13:15That's something you could report, you know, in an analysis; that would be your updated mean and variance from the data that
- 13:21you have.
- 13:22So your information has changed given the new data.
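Putting the worked example together in Python (the individual data values are hypothetical; only n = 5 and the sum of 16 come from the lecture):

```python
from scipy.stats import gamma

a, b = 6, 2           # prior parameters worked out earlier
x = [2, 4, 3, 5, 2]   # hypothetical data with n = 5 and sum(x) = 16

a_star, b_star = a + sum(x), b + len(x)   # 22, 7
posterior = gamma(a_star, scale=1 / b_star)

print(a_star, b_star)      # 22 7
print(posterior.mean())    # a*/b* = 22/7 ≈ 3.14
print(posterior.var())     # a*/b*^2 = 22/49 ≈ 0.45
```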
- 13:25Okay.
- 13:26So we saw some examples of conjugate analysis: the Binomial case and the Poisson-Gamma case.
- 13:31And in each case I showed you how to derive the posterior by hand using the likelihood and prior.
- 13:39And so the next thing I'm going to show you is a very, very important, fundamental idea in Bayesian data analysis:
- 13:45that the posterior mean is going to be a compromise between the maximum likelihood estimate and the prior mean. That's coming
- 13:53up next.
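A quick check of that compromise with the numbers from this lecture's example: the posterior mean 22/7 ≈ 3.14 falls between the prior mean 3.0 and the maximum likelihood estimate 16/5 = 3.2.

```python
a, b = 6, 2        # prior parameters from this lecture
sum_x, n = 16, 5   # data totals from the worked example

prior_mean = a / b                       # 3.0
mle = sum_x / n                          # 3.2, the maximum likelihood estimate
posterior_mean = (a + sum_x) / (b + n)   # 22/7 ≈ 3.14

# The posterior mean lies between the prior mean and the MLE.
print(prior_mean <= posterior_mean <= mle)  # True
```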