- 00:01So what we've been looking at in the previous lecture is the Poisson-Gamma conjugate case.
- 00:06And I showed you that we have to somehow figure out what the prior is on the λ parameter for the Poisson likelihood
- 00:14and we know that the Gamma distribution is a reasonable probability density function we can use as a prior for the λ
- 00:22parameter.
- 00:23But the interesting question now is how to figure out what a and b are.
- 00:28So what I showed you last time was that we have some prior knowledge.
- 00:33We are thinking about this hypothetical situation, we have some expert knowledge that tells us that the mean rate of regressions
- 00:40is 3 and the variance is 1.5.
- 00:42We also know that in the Gamma distribution the mean is a/b, where a and b are the parameters, and the variance is
- 00:49a/b^2.
- 00:50So it's a simple algebraic problem that can be easily solved.
- 00:55So, I've got a/b
- 00:57That's the mean of the Gamma equal to 3, and a/b^2 is equal to 1.5.
- 01:04That's the variance that we know from prior research.
- 01:07So all you have to do now is for example, equate a = 3*b, plug it into this equation here.
- 01:14So I've got 3*b/b^2 = 3/b = 1.5.
- 01:19And hopefully you can see that that implies that b has to be 2.
- 01:23So once I know that b is 2, I can plug it in here and a is equal to 6.
- 01:29So that's it.
- 01:30I mean, I've done a principled analysis, you know, to figure out what my prior parameters are gonna be for the
- 01:38Gamma distribution.
- 01:39This is for the λ parameter that I'm trying to work out.
- 01:44So maybe it's a good idea to try this out yourself once.
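As a quick numerical check of this algebra, here is a minimal Python sketch, assuming scipy's shape/scale parameterization of the Gamma (scale = 1/rate):

```python
from scipy.stats import gamma

# Prior knowledge from the lecture: mean rate = 3, variance = 1.5.
prior_mean, prior_var = 3.0, 1.5

# For a Gamma(a, b) with rate b: mean = a/b and variance = a/b^2,
# so b = mean/variance and a = mean * b.
b = prior_mean / prior_var  # 2.0
a = prior_mean * b          # 6.0

# scipy's gamma takes shape a and scale = 1/b (i.e., 1/rate).
prior = gamma(a, scale=1 / b)
print(a, b)                       # 6.0 2.0
print(prior.mean(), prior.var())  # 3.0 1.5
```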
- 01:51Alright, so this is what my prior is now based on my prior knowledge.
- 01:55So this is what the prior looks like.
- 01:56I've got a and b parameters specified and this is going to be my prior on the λ parameter in the Poisson
- 02:04likelihood.
- 02:05Okay, so I'm simply going to multiply now.
- 02:07Okay, the rest of the steps are going to be just simple multiplication and addition.
- 02:13Okay, so what am I doing now?
- 02:15I'm going to figure out the posterior distribution of the λ parameter in the Poisson likelihood.
- 02:22And I will do that by multiplying the likelihood and the prior up to proportionality.
- 02:27Why up to proportionality?
- 02:28Because I'm ignoring the normalizing constants.
- 02:31Okay, alright.
- 02:33So how does this work?
- 02:34So let's assume that we have n data points.
- 02:37So these are n independent data points which I'm representing as a vector.
- 02:41So that's what those angle brackets are supposed to mean.
- 02:43And so I'm writing X as a vector of values.
- 02:48That's why it's in boldface, math boldface.
- 02:50Okay, so I've got n independent data points.
- 02:53So these are the regressive eye movements.
- 02:55Okay, so what I'm now doing is I'm figuring out the likelihood for all those n independent data points
- 03:05using the Poisson distribution.
- 03:10The probability mass function for the Poisson distribution.
- 03:13If you recall, the formula for the Poisson distribution was this thing here, where x is the data.
- 03:21So x is the data here.
- 03:23So all I have to do now to figure out what the probability of this vector of data points is to take each of
- 03:34those data points x1 through xn
- 03:36and plug it into the Poisson likelihood.
- 03:41And multiply out each of those values with each other.
- 03:46Why can I multiply this out?
- 03:47Because they're independent values.
- 03:49So, I've got n independent data points.
- 03:51And for each of the data points, I can figure out the probability of each data point.
- 03:57So I'm just calculating the joint probability of all these data points.
- 04:01And so that's what I've done here.
- 04:02That's it.
- 04:03Now, this notation is very cumbersome because I have to write multiplication and the dots and multiplication.
- 04:09So what I did is I just wrote it out more compactly using product notation: I have a product
- 04:16symbol here, which just says the same thing as here, that for the n data points I'm just plugging in the i-th data point
- 04:23here, and I'm just multiplying those guys out.
- 04:26That's pretty much it.
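A minimal sketch of this product-of-PMFs step in Python; the data values here are hypothetical, chosen so that n = 5 and the sum is 16, matching the totals used later in the lecture:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical counts of regressive eye movements (independent observations).
x = np.array([2, 4, 3, 5, 2])
lam = 3.0  # one candidate value for the lambda parameter

# Joint likelihood of independent data points = product of the
# individual Poisson probability masses.
likelihood = np.prod(poisson.pmf(x, mu=lam))
print(likelihood)
```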
- 04:27And so one thing to notice here is that this multiplication is pretty easy.
- 04:31Because in this term here I've got (λ)^x1, then I've got in the next term, which
- 04:39I haven't shown here, I would have (λ)^x2, then the next term would be (λ)^x3
- 04:45All the way to (λ)^xn
- 04:47Now, what happens if I multiply these n terms out?
- 04:50I will get λ to the power of the sum of all those x, because they're exponents,
- 04:59I'm just adding them up.
- 05:00So that's why it's so simple.
- 05:02That's how I end up with λ to the power of the sum of the x's here in this calculation.
- 05:07So what I'm doing right now, by the way, don't forget what we're actually doing.
- 05:11We are trying to compute the likelihood of the data.
- 05:15You're figuring out the likelihood
- 05:18in terms of the parameter λ,
- 05:21given all the data.
- 05:22So that's how I end up with λ to the power of sum of X.
- 05:26Similarly, for the exponential term: I have exp(-λ) once for the first data point; for the second data
- 05:34point, I will again get exp(-λ), and so on all the way up to the n-th data point.
- 05:39So that means I have n instances of exp(-λ).
- 05:44So what is that going to amount to? That's going to amount to exp(-nλ), because I have n data points.
- 05:51So that's how I end up with this final term here.
- 05:54And in the denominator, of course, I just have the product of all the x!
- 06:01So that's how I end up with the likelihood of multiple data points.
- 06:07This is the closed-form solution for the likelihood of the data in terms of the parameter λ,
- 06:14the likelihood that I have from the Poisson distribution.
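A quick numerical check, under the same hypothetical data as above, that the closed form agrees with the product of the individual Poisson PMFs:

```python
import numpy as np
from scipy.special import factorial
from scipy.stats import poisson

x = np.array([2, 4, 3, 5, 2])  # same hypothetical data as before
lam, n = 3.0, len(x)

# Product of the individual Poisson PMFs ...
direct = np.prod(poisson.pmf(x, mu=lam))

# ... versus the closed form: lambda^(sum x) * exp(-n*lambda) / prod(x_i!)
closed = lam ** x.sum() * np.exp(-n * lam) / np.prod(factorial(x))

print(np.isclose(direct, closed))  # True
```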
- 06:17Okay, so I've worked out what my likelihood is gonna look like for multiple data points. Why do I need it?
- 06:23Because I'm going to plug it in here and multiply it with the prior on λ.
- 06:27Okay, so let's take a look at how we do that now.
- 06:31This is going to be surprisingly easy because here I have my likelihood.
- 06:35This is just what I computed a few seconds ago: exp(-nλ) times λ to the power of the sum of the x's.
- 06:42This is the denominator here and here I've got the Gamma distribution specified in terms of the parameters a and b.
- 06:50I have not specified what they are right now.
- 06:52But I would fix them if I was actually doing the analysis and I will show you where this is going now.
- 06:57Okay.
- 06:58So one thing to notice in this complicated equation is that it has two parts, one is the likelihood, one is the prior.
- 07:05Okay, that's pretty straightforward.
- 07:07I just plugged in the terms of these distributions here.
- 07:12But one interesting thing to notice here is that several of these terms here do not involve λ at all.
- 07:19So, remember I'm trying to get the posterior distribution of λ given the data.
- 07:24But this Gamma of a is going to be a constant number.
- 07:28This b to the power of a is going to be a constant number.
- 07:33This product of the x! is gonna be a constant number.
- 07:37These are all the normalizing constants
- 07:40for the posterior; I can just drop them.
- 07:43So, we're going to just forget about all of those.
- 07:45And so I get the posterior up to proportionality by concentrating only on this term, which is the kernel of the likelihood, and
- 07:55on this term here, which is the kernel of the Gamma distribution.
- 07:59I multiply them out and you will see that it works out so simply.
- 08:04So here is my likelihood times prior.
- 08:07Just the kernels of the distributions.
- 08:10And again, what you notice is that I've got one term, exp(-nλ),
- 08:15that came from the likelihood term, but in the Gamma distribution, I've also got exp(-bλ).
- 08:22So I'm gonna have to sum up these two guys.
- 08:25And similarly I've got in the likelihood, I've got λ to the power of sum of X.
- 08:31And in the Gamma I've got λ to the power of (a-1).
- 08:34Now if I add up the exponents in the λ terms, what do I get?
- 08:40Let's look over here.
- 08:41I get (λ)^(a-1 + Σ(X))
- 08:45So I'm just adding up the exponents in the λ term.
- 08:47Similarly, I'm adding up the exponents in the exponential term.
- 08:52And so what you see is that we started out with this horribly complicated equation but we dropped the normalizing constants
- 08:59and we end up with the posterior up to proportionality by simply doing addition on the exponents.
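Written out compactly, the multiplication just described is (dropping all normalizing constants):

```latex
p(\lambda \mid \mathbf{x})
  \;\propto\; \underbrace{\lambda^{\sum_i x_i} e^{-n\lambda}}_{\text{likelihood kernel}}
  \times \underbrace{\lambda^{a-1} e^{-b\lambda}}_{\text{prior kernel}}
  \;=\; \lambda^{\left(a + \sum_i x_i\right) - 1}\, e^{-(b+n)\lambda}
```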
- 09:07So what we end up with, though, is a form for the posterior distribution of λ that has the same form as the Gamma
- 09:16distribution.
- 09:22So notice that the Gamma distribution, the kernel of the Gamma distribution always has this form.
- 09:29So for some parameter, it's going to be that parameter raised to the power (a-1).
- 09:38So this should have been λ.
- 09:40I'm sorry, I should have written λ here.
- 09:42That should have been corrected: read this as (λ)^(a-1) and read this
- 09:53as exp(-λ * b).
- 09:56Okay, so this form is exactly the form that I'm getting here in this posterior here in terms of the λ variable.
- 10:03So all I have to do now is to represent this posterior kernel in terms of the Gamma distribution.
- 10:13So how do I do that?
- 10:15If you just concentrate on this term here, this is giving me an updated a parameter in the posterior.
- 10:23Because the a parameter needs to look like (λ)^(a-1)
- 10:26So, what I'm seeing here is sum of X plus (a-1)
- 10:31So, the new a parameter for my posterior distribution will be, well, you can call it a*. That would be a plus sum of
- 10:38X.
- 10:39And similarly, I will get an updated b parameter here, which will be whatever the update turns out to be.
- 10:46And then I'll just show you in a second.
- 10:48Okay, so let's look here.
- 10:51So the updated a* parameter for the posterior distribution will be (a + sum of X)
- 10:57And b* is going to end up being (b + n)
- 11:02So what happens now is that I can write the posterior distribution in terms of a* and b*. Again, this should
- 11:10be λ, not θ.
- 11:11I sometimes use λ and sometimes θ.
- 11:14So that's how I ended up confusing this here.
- 11:17So, but the important point here is pretty clear.
- 11:20Okay, so the point is that the posterior distribution of λ is going to have updated a and b parameters. And what are
- 11:27those parameters?
- 11:28The a parameter will be updated to a* which will be (a plus sum of X)
- 11:33a was the prior, the a parameter in the prior that I had used and the updated b parameter will be b*, which will
- 11:41be the parameter b I used in the prior plus the number of data points that I have.
- 11:46So the amazing thing now is that all I need for computing the posterior is to sum up the x values that I've
- 11:55observed in the data and to calculate the number of data points.
- 12:01I know what my prior a and b are for the λ parameter and I end up with the posterior parameters on the Gamma distribution
- 12:08So it's really that straightforward and I get the posterior up to proportionality.
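The whole conjugate update therefore fits in one line of code; a minimal sketch, with a hypothetical helper name:

```python
def gamma_poisson_posterior(a, b, x):
    """Gamma(a, b) prior on the Poisson rate, observed counts x.

    Returns the posterior parameters a* = a + sum(x) and b* = b + n.
    """
    return a + sum(x), b + len(x)

# For example, with the prior worked out earlier and counts summing to 16:
print(gamma_poisson_posterior(6, 2, [2, 4, 3, 5, 2]))  # (22, 7)
```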
- 12:14But as I mentioned earlier, I can always figure that out.
- 12:18The normalizing constant I can figure out later on.
- 12:20So the posterior can now be worked out as I showed you.
- 12:26This is also an example of a conjugate case.
- 12:28So, I'm going to quickly show you an example of how this works.
- 12:31How you would do this: suppose this was my actual data.
- 12:33I have five data points.
- 12:35So I sum up these data points, I get 16 as my sum, n is equal to 5.
- 12:40So I don't actually have to do any computation.
- 12:43I just have to do simple addition here.
- 12:45I know what my a and b are.
- 12:47So I just do the calculation and this is my posterior for the λ parameter given my prior for a
- 12:53and b.
- 12:53Whatever that was.
- 12:54And I can always figure out the posterior now.
- 12:56So this is a great example for understanding how conjugacy gives you very clean posterior distributions that belong
- 13:05to the same family as the prior distribution.
- 13:08And if you want you can figure out the mean of the posterior by computing a* divided by b* etcetera.
- 13:15That's something you could report, you know, in an analysis; that would be your updated mean and variance from the data that
- 13:21you have.
- 13:22So your information has changed given the new data.
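Putting the worked example together in Python (the individual data values are hypothetical; only n = 5 and the sum of 16 come from the lecture):

```python
from scipy.stats import gamma

a, b = 6, 2           # prior parameters worked out earlier
x = [2, 4, 3, 5, 2]   # hypothetical data with n = 5 and sum(x) = 16

a_star, b_star = a + sum(x), b + len(x)   # 22, 7
posterior = gamma(a_star, scale=1 / b_star)

print(a_star, b_star)      # 22 7
print(posterior.mean())    # a*/b* = 22/7 ≈ 3.14
print(posterior.var())     # a*/b*^2 = 22/49 ≈ 0.45
```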
- 13:25Okay.
- 13:26So we saw some examples of conjugate analysis: the Binomial case and the Poisson-Gamma case.
- 13:31And in each case I showed you how to derive the posterior by hand using the likelihood and prior.
- 13:39And so the next thing I'm going to show you is a very, very important, fundamental idea in Bayesian data analysis:
- 13:45that the posterior mean is going to be a compromise between the maximum likelihood estimate and the prior mean. That's coming
- 13:53up next.
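A quick check of that compromise with the numbers from this lecture's example: the posterior mean 22/7 ≈ 3.14 falls between the prior mean 3.0 and the maximum likelihood estimate 16/5 = 3.2.

```python
a, b = 6, 2        # prior parameters from this lecture
sum_x, n = 16, 5   # data totals from the worked example

prior_mean = a / b                       # 3.0
mle = sum_x / n                          # 3.2, the maximum likelihood estimate
posterior_mean = (a + sum_x) / (b + n)   # 22/7 ≈ 3.14

# The posterior mean lies between the prior mean and the MLE.
print(prior_mean <= posterior_mean <= mle)  # True
```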