- 00:00What we've done so far is that we've looked at probability distributions, we looked at a little bit of random variable theory
- 00:08and we've got a good sense of what you can do with the distribution, what kind of questions you can ask from a distribution.
- 00:14And I exemplified that with the dpqr functions for several examples that I showed you.
- 00:20And then what I did was, I showed you an example of analytical Bayes.
- 00:24That means doing the analysis on paper.
- 00:28Without any computer, you can derive the posterior distribution of the mean, the posterior distribution of the parameter,
- 00:34I mean.
- 00:35So what's interesting for us however, is that these analytical examples are very good for developing intuitions about Bayes'.
- 00:43But what we really need in real life, when we have large amounts of data, or very complex data with a complex
- 00:50structure,
- 00:51are computational tools, because we cannot do the analytical analysis any more.
- 00:57We'll have to do this computationally in real life.
- 01:00So that's what the rest of this course is about.
- 01:03Okay, so I will start talking about this now.
- 01:05And so, just to remind you, we started off with Bayes' rule.
- 01:10In the discrete case when there are discrete events,
- 01:12I had Bayes' rule written down as equation one, and then I showed you Bayes' rule written down when you're talking about probability
- 01:21distributions, so probability density functions.
- 01:23And I gave you an example or a couple of examples where we had a single parameter to work with, you know, like in the
- 01:29beta binomial, we had the theta parameter; in the Poisson-Gamma, we had the lambda parameter.
- 01:36So life was easy.
- 01:37And we could do these analyses on paper.
- 01:40But what will happen as I mentioned earlier in real life is that we will have dozens, maybe hundreds of parameters.
- 01:46So theta is no longer a single parameter.
- 01:50It's a vector of parameters.
- 01:51So that's why I'm writing it in boldface theta.
- 01:53So that will be the normal situation.
- 01:55And in that situation the problem is going to be that we will no longer be able to calculate a posterior distribution for
- 02:05a single parameter because there isn't a single parameter.
- 02:08There are many parameters.
- 02:09So we're gonna get the joint distribution now, we're talking about multivariate distributions.
- 02:14We're gonna get a joint distribution for the parameters when we look at the posterior distribution of this bold faced theta
- 02:22here.
- 02:22Okay.
- 02:22So that's where we're going now.
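For reference, Bayes' rule for a vector of parameters, written up to proportionality; the number of parameters k is just illustrative notation here, not from the slides:

```latex
p(\boldsymbol{\theta} \mid \text{data})
  = \frac{p(\text{data} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\text{data})}
  \;\propto\; p(\text{data} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}),
  \qquad \boldsymbol{\theta} = (\theta_1, \dots, \theta_k).
```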
- 02:24And our central focus, you know, in data analysis is going to be trying to interpret the posterior distributions of each
- 02:32of the parameters.
- 02:33That's going to be where all the action is going to be.
- 02:36Okay.
- 02:36So I'm going to explain all this with some examples but before I do that, I want to quickly remind you about what we have
- 02:43done so far and what we have achieved with the Poisson-Gamma conjugate case.
- 02:48What happened there?
- 02:49We had a likelihood function defined for the data that we're getting, discrete counts of regressive eye movements
- 02:57in eye-tracking data.
- 02:58And we chose the Poisson likelihood for that.
- 03:01So that's the likelihood shown here.
- 03:03And we chose a Gamma prior for the lambda parameter in the Poisson likelihood.
- 03:08And we chose some values for A
- 03:11and B.
- 03:12And so what I actually did in the last lecture was that I simply multiplied these two kernels.
- 03:21So these are the full probability density functions for the likelihood and for the prior.
- 03:27But what I pointed out last time was that some of these terms, like this denominator here, this b to the power of a and
- 03:38Gamma of a,
- 03:39these are all going to be constants because these are all fixed numbers.
- 03:43So we can remove these from the picture because they end up being the normalizing constants.
- 03:47So really what we're interested in is the posterior distribution of lambda up to proportionality.
- 03:53And the way we're going to do that is by taking only those terms that involve lambda because lambda is the variable here
- 04:00that we're going to look at.
- 04:01So what I showed you last time, was that all I literally have to do is to multiply this term with the kernel of this
- 04:09prior here.
- 04:11So what would that look like?
- 04:12I wrote it up quickly on my blackboard.
- 04:17And so if you notice what's going on here, is that this term looks very complicated.
- 04:23But it's actually not because you've got this lambda term here and you've got another lambda term here.
- 04:28So what is lambda to the power of the sum of x multiplied with lambda to the power of a minus one?
- 04:37That's an easy addition,
- 04:38because these are just exponents: I'm going to just say, sum of x plus a minus one
- 04:45gives me the result of that calculation.
- 04:47And what about these exponential terms here?
- 04:50These are also easy because I've got the exponential of minus n times lambda, that's the first one here, in the likelihood.
- 04:59And then I've got this term here. That's in the prior, which is the exponential of minus b
- 05:06lambda.
- 05:07And so how would I rewrite that?
- 05:09I just again have to, because these are exponents, I just have to add them up.
- 05:13So I get the exponential of minus n lambda minus b
- 05:18lambda.
- 05:19And so I could simplify this even further by saying the exponential of minus lambda times n plus b.
- 05:27So that's how, by doing these simple additions on the exponents, I got to
- 05:34the point that I simplified the posterior distribution up to proportionality with this term.
- 05:40And what's interesting here, you know, the reason that it's called a Poisson-Gamma conjugate case is that the prior has
- 05:47the form of the Gamma distribution.
- 05:50So the kernel obviously belongs to the Gamma distribution over here, but interestingly the posterior also ends up having
- 05:58the same form as a Gamma distribution.
- 06:01So what I'm looking at here is the kernel of a new Gamma distribution with updated A and B parameters.
- 06:07So what are those updated parameters?
- 06:09If I just look at this.
- 06:11You can see that this looks exactly like a Gamma distribution with a new A parameter which is sum of x plus a.
- 06:19And the b parameter is b plus n.
- 06:21And that's how I came to the conclusion that my updated a and b parameters in the posterior for lambda
- 06:29are these terms here.
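Written out, the multiplication of the two kernels just described looks roughly like this (a reconstruction of the blackboard derivation, with x_1, ..., x_n the observed counts and a, b the prior parameters):

```latex
p(\lambda \mid x) \;\propto\;
  \underbrace{\lambda^{\sum_i x_i}\, e^{-n\lambda}}_{\text{Poisson likelihood kernel}}
  \times
  \underbrace{\lambda^{a-1}\, e^{-b\lambda}}_{\text{Gamma prior kernel}}
  \;=\;
  \lambda^{\sum_i x_i + a - 1}\, e^{-(b+n)\lambda},
```

which is the kernel of a Gamma distribution with the updated parameters a plus the sum of x, and b plus n.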
- 06:31This is what the story was up till now.
- 06:33And we did this all by hand, like we didn't have to use any computing tools for this.
- 06:37And so you can visualize this and it's always useful to come up with concrete examples to understand how this plays out
- 06:44in practice.
- 06:45So I showed you an example where we had a prior with a and b parameters six and two on lambda, and I got
- 06:52some data, independent data.
- 06:55And we computed the posterior last time, and we got a posterior for lambda that was a Gamma distribution with a and b being 20 and 7
- 07:02respectively.
- 07:03And so you can visualize these two: the prior and the posterior can be visualized quite easily.
- 07:10And this code will be of course available to you to play with later on.
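A minimal sketch of what such plotting code could look like, using dgamma (the actual code distributed with the course may differ):

```r
# Overlay the Gamma(6, 2) prior and the Gamma(20, 7) posterior for lambda
lambda <- seq(0, 8, by = 0.01)
plot(lambda, dgamma(lambda, shape = 6, rate = 2), type = "l", col = "red",
     xlab = expression(lambda), ylab = "density", ylim = c(0, 0.7))
lines(lambda, dgamma(lambda, shape = 20, rate = 7))
legend("topright", legend = c("prior: Gamma(6, 2)", "posterior: Gamma(20, 7)"),
       col = c("red", "black"), lty = 1)
```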
- 07:13So you can see that the prior, which is in red here, is much more spread out.
- 07:19It's more to the left and once the data come in the posterior for the lambda parameter gets a bit tighter and it moves
- 07:26to the right a little bit.
- 07:27So that's the effect of the data.
- 07:29The data has updated our belief about this lambda parameter, and the belief about the lambda parameter is expressed in terms
- 07:38of the probability density function,
- 07:40the PDF associated with lambda.
- 07:43So that's the whole big deal about the Bayesian approach.
- 07:47You start with some prior, you get some data, and this data updates your prior and gives you the posterior distribution.
- 07:54That's the key idea here.
- 07:55Now, once you know what the posterior is.
- 07:59So in this case it was Gamma 20, 7.
- 08:02So once you know what the posterior is, you can ask interesting questions about that distribution and that's why I showed
- 08:07you those dpqr functions, because now you can use the qgamma function for the posterior distribution with shape
- 08:15and rate 20 and 7 respectively.
- 08:18So these are the a and b parameters, and you can find out what the range of values is
- 08:24such that I'm 95% sure that the lambda value lies within this range.
- 08:29So this is called a 95% credible interval, discussed in great detail in the textbook, but what this is giving you is
- 08:36one of the big deals about the Bayesian approach: it gives you an uncertainty interval.
- 08:41So you can think about how unsure you are about this parameter after you've seen the data.
- 08:47So this is a very valuable piece of information.
- 08:50The uncertainty.
- 08:51And in fact, you will see in textbooks that Bayesian data analysis is characterized as uncertainty quantification.
- 08:58This is an example of that.
- 09:00We're quantifying the uncertainty about this parameter here.
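Concretely, the analytical interval mentioned here can be obtained with qgamma along these lines (a sketch, not necessarily the exact code shown on the slides):

```r
# 95% credible interval for lambda from the analytical posterior Gamma(20, 7):
# the 2.5% and 97.5% quantiles bracket the middle 95% of the posterior probability
qgamma(c(0.025, 0.975), shape = 20, rate = 7)
# roughly 1.7 and 4.2
```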
- 09:03Okay.
- 09:03But what I want to show you here, is that what I'm doing right now,
- 09:07what I just did here is that I have an analytical form for the posterior on lambda and so I can now compute the quantiles
- 09:15et cetera.
- 09:16But I could easily have done the same thing
- 09:19if I just had samples from the Gamma distribution with a equal to 20 and b equal to 7.
- 09:24If I just had a large number of samples, say, 4000 samples, I could still get the same credible interval approximately.
- 09:33So let me show you how that works.
- 09:35So, suppose I had 4000 samples from a Gamma distribution. Here
- 09:39I'm using the rgamma function.
- 09:41Okay, so the dpqr family strikes again, and here's my posterior specification of a and b.
- 09:47And what I get here is posterior samples of the lambda parameter, and this is just a vector
- 09:54now, okay,
- 09:55of samples coming from this,
- 09:57random samples coming from this Gamma distribution with a particular parameterization.
- 10:02And so what I can now do is I can use the quantile function
- 10:06and figure out the 95% credible interval that I just computed analytically.
- 10:11This is the analytical analysis;
- 10:13this is the analysis computing the same interval using samples from the posterior distribution.
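A sketch of that sampling-based computation (the object name lambda_post is just a placeholder):

```r
# Draw 4000 random samples from the posterior Gamma(a = 20, b = 7)
lambda_post <- rgamma(4000, shape = 20, rate = 7)

# The sample quantiles approximate the analytical 95% credible interval from qgamma
quantile(lambda_post, probs = c(0.025, 0.975))
```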
- 10:21So the reason I'm showing you this is that in real life data analysis, we cannot do this analytical calculation and get a
- 10:28posterior distribution with a particular parameter.
- 10:30We don't know what the exact form of the distribution is.
- 10:33But what we can get, through MCMC sampling, is samples from the posterior distribution.
- 10:40And we can always figure out, you know, the 95% credible interval or any other statistic
- 10:46from the posterior once we have these samples and our focus will always be on these samples which will be delivered to us
- 10:53by software.
- 10:54Okay, so we don't have to do any more analytical work.
- 10:57This is the good news.
- 10:58Okay.
- 11:00So when I say that we will look at the posterior samples from now on, this is what I mean.
- 11:06We have some samples from the posterior distribution and we're gonna do some statistics on those posterior samples,
- 11:11which will be a vector
- 11:13for each parameter.
- 11:14And we can draw inferences about that parameter from the samples.
- 11:20Okay, so that's the point here.
- 11:22Alright, so one little slide I have is that in lecture 2.3, on slide 8,
- 11:28I had accidentally said that the parameters a and b
- 11:31of the Gamma distribution correspond to the shape and scale.
- 11:35What I actually meant was shape and rate.
- 11:37So I have corrected that in the slides, but I just wanted to remind you that there are several different parameterizations
- 11:44of the Gamma distribution.
- 11:46One is in terms of scale and the other in terms of rate. The scale is one over rate.
- 11:51So you can rewrite the distribution in terms of one over lambda instead of lambda.
- 11:56So people do it differently depending on what their needs are for the Gamma distribution.
- 12:01So that's why there's multiple ways to write the Gamma distribution in R and in mathematics, but we are going to use the
- 12:07shape and rate parameters in the discussion that I'm doing about the Gamma.
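One quick way to see the scale-versus-rate point in R: dgamma accepts either parameterization, and rate b gives the same density as scale 1/b (this little check is just an illustration, not from the slides):

```r
# Same Gamma density written with rate = 7 and with scale = 1/7
dgamma(2.5, shape = 20, rate = 7)
dgamma(2.5, shape = 20, scale = 1 / 7)  # identical value
```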
- 12:13Alright, so that's just a small detail that you need to pay attention to.
- 12:17So what I want to talk about now is, I want to come back to the point that our main goal will always be
- 12:23obtaining the posterior distribution or posterior distributions of the parameters that we're interested in.
- 12:29Okay, so there could be just one parameter, like in the toy examples I showed you, or there could be literally hundreds of parameters.
- 12:36We can still get the posterior. It scales up very beautifully, and we'll be getting these posteriors using some sampling method
- 12:43to get the posteriors for each of the parameters and these are the MCMC methods, we don't need to know anything about
- 12:49the details of MCMC sampling in this course because the software takes care of it.
- 12:53But later on, if you want to get into the details and write your own samplers,
- 12:57sometimes I have to write customized samplers, in those cases you would have to learn a little bit, but it's not really
- 13:03that complicated.
- 13:04The book that I mentioned by Lambert will help you there.
- 13:07Okay.
- 13:08Alright.
- 13:08So now let's look at a concrete example.
- 13:10You know, how would I do such a computational data analysis
- 13:13now?
- 13:13I'm not doing analytical work.
- 13:15Now, I'm using a tool, the BRMS package
- 13:19in R. So let's say I have data from a single subject whose only task is to sit at the computer and keep pressing
- 13:25the spacebar.
- 13:26Okay?
- 13:27They're just pressing the spacebar repeatedly and I'm only recording on the computer the amount of time they take before
- 13:33they press the spacebar and release it.
- 13:36So it takes 141 milliseconds in the first trial, then 138 ms and so on.
- 13:43So that's what this RT column contains, the reaction times to this button pressing that we're doing.
- 13:50It's completely a mindless task.
- 13:52It's just a mindless button pressing task.
- 13:55Alright, so the responses are in milliseconds.
- 13:58Okay, so these are the responses we're getting in each trial and we would like to know how long it takes to press a key
- 14:04for this subject, let's say on average, and how much variability there is in this subject's key pressing.
- 14:10So, first of all, of course, you should always look at the data to see what you're going to model.
- 14:16This is the data that we're going to model.
- 14:17It's just a probability distribution.
- 14:20And you see that roughly
- 14:22it's about 180 milliseconds or something like that.
- 14:24And there's a long tail here.
- 14:26This is very interesting that there's a long tail, a few rare data points that are quite long, but most of them are in this
- 14:33range here.
- 14:33So this is what we're going to try to model now.
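A minimal sketch of such a first look at the data, assuming a data frame called df with the response times in a column rt (both names are placeholders):

```r
# Histogram of the single subject's button-press times; note the long right tail
hist(df$rt, breaks = 50, xlab = "response time (ms)",
     main = "Button-press times")
```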
- 14:35Okay, so we're gonna start with a simple model where we're going to assume, and this is of course not a reasonable assumption,
- 14:43but I'll fix this later on,
- 14:44we're going to assume that each of the n data points, where n refers to
- 14:49each row in the data frame,
- 14:50each of those
- 14:51n data points, is coming from a normal distribution with some mean mu and some standard deviation sigma.
- 14:58So in the next lecture, I'm going to unpack this model for you in a Bayesian framework.
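To preview where this is going, a brms call for this simple normal model could look roughly like the sketch below; the data frame name df, the column name rt, and the priors shown are all just placeholders, not the specification the next lecture will use:

```r
library(brms)

# Intercept-only model: each rt is assumed to come from Normal(mu, sigma)
fit <- brm(rt ~ 1,
           data = df,
           family = gaussian(),
           # purely illustrative priors on mu (the intercept) and sigma:
           prior = c(prior(normal(200, 100), class = Intercept),
                     prior(normal(0, 50), class = sigma)),
           chains = 4, iter = 2000)

summary(fit)  # posterior summaries for the intercept (mu) and sigma
```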