- 00:00We've been talking about logistic regression so far.
- 00:03And so now that we understand how the model is built, we're going to first specify some priors, investigate the implications
- 00:12of those priors and then proceed to fit the model.
- 00:16These are the next steps that we have to take. Just to remind you what the dataset looks like:
- 00:19There's the recall data we have for multiple subjects,
- 00:24for different set sizes.
- 00:25We have correct or incorrect responses in the experiment that they did, in which they had to recall a word from a particular
- 00:32list of words.
- 00:33So the answer that they are going to give is either correct or incorrect.
- 00:38And so what we did is that for the different set sizes first we centered the set size as I explained earlier and we're gonna
- 00:44use that centered set size as our predictors.
- 00:47So these values the center values can be negative or positive and they're the mean of this vector of centered, set size will
- 00:54be zero and that represents the mean set size.
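A minimal sketch in R of the centering step just described; the data frame and column names here are assumptions, not necessarily the ones used in the actual course materials:

    # Hypothetical data frame with one row per trial (names are assumed)
    df_recall$c_set_size <- df_recall$set_size - mean(df_recall$set_size)
    # The centered predictor now has mean (approximately) zero,
    # so zero represents the average set size
    mean(df_recall$c_set_size)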
- 01:00So what does the model look like?
- 01:02We are going to model the probabilities in the Bernoulli likelihood.
- 01:06And we are not fitting the model to the 0, 1 responses, even though the software that you use may appear to do that.
- 01:13But internally the model is doing something different.
- 01:17And so what we're going to do is we're going to model the effect of set size on the log odds rather than
- 01:25on the probability.
- 01:26Okay.
- 01:28Alright.
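A rough sketch in R of the generative story just described; the parameter values below are made up purely for illustration:

    # Made-up parameter values, just to illustrate the structure of the model
    alpha <- 1.0    # intercept on the log-odds scale (hypothetical)
    beta  <- -0.2   # effect of centered set size on the log odds (hypothetical)
    c_set_size <- c(-3, -1, 1, 3)        # centered set sizes 2, 4, 6, 8
    eta   <- alpha + beta * c_set_size   # linear predictor on the log-odds scale
    theta <- exp(eta) / (1 + exp(eta))   # back-transform to probabilities
    correct <- rbinom(length(theta), size = 1, prob = theta)  # Bernoulli responses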
- 01:29So how do we proceed?
- 01:32First of all, notice that in this model that I just showed you there is no residual error term like in the linear model;
- 01:39there's no error term.
- 01:40Now, that's because of the way that estimation is done in the classical approach to generalized linear models.
- 01:47But it's not really interesting for us where this comes from.
- 01:52What's interesting for us is that once we have estimated the parameters alpha and beta, we can always work out the probabilities
- 02:03for every possible set size that we are interested in that we had in the experiment.
- 02:09And that was this equation that I derived for you a few minutes ago,
- 02:12in the previous lecture.
- 02:16So
- 02:18to summarize, the basic model assumes a Bernoulli likelihood generating the 0, 1 responses.
- 02:24The model is fit on the log odds scale.
- 02:26So that's why the eta is now the log odds here.
- 02:30The eta is the log odds here, and we've got the same structure as before.
- 02:34What has not changed is that we have an intercept and we have a slope.
- 02:39So we're going to be looking at prior distributions for these and then we're going to have to look at the posterior and posterior
- 02:47predictive distribution and so on.
- 02:49And we can always convert back these estimates on the log odds scale; you can convert them back to the probability
- 02:56scale, as I showed you earlier, and of course I'm going to do that now.
- 03:00Alright, so just as a piece of information for you, there are two useful functions in R. As you know,
- 03:06for every distribution there's the d, p, q, r family; for the logistic distribution, there's also this family.
- 03:12So the qlogis function gives you the logit that I showed earlier, and the plogis function gives you the inverse logit.
- 03:20Of course you could write this out yourself, but these functions are available to you.
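For example:

    qlogis(0.5)              # 0: a probability of 0.5 corresponds to a log odds of 0
    plogis(0)                # 0.5: the inverse logit maps log odds back to probability
    plogis(qlogis(0.8))      # recovers 0.8
    # Written out by hand, these are:
    log(0.8 / (1 - 0.8))     # the logit, same as qlogis(0.8)
    exp(1.386) / (1 + exp(1.386))  # roughly plogis(1.386), i.e., about 0.8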
- 03:25So let's now think on the log odds scale.
- 03:30We want to think about what a reasonable prior for alpha would be for the intercept parameter.
- 03:36So let's start with a wild guess.
- 03:38Normal with mean zero and standard deviation four.
- 03:41Now, a priori I wouldn't know what this actually means on the probability scale.
- 03:47What is the log odds?
- 03:49I don't use it in day-to-day life.
- 03:51So I can't really say but I can plot the prior predictive distribution to see what this implies.
- 03:58So if I look at the alpha parameter, which is defined in log odds space, and I convert it back to probability
- 04:07space using the formula I showed you earlier,
- 04:11I get back a little bit of a surprise.
- 04:14The surprise is that this Normal(0, 4) prior actually implies that my prior expectation is that the probability parameter
- 04:26is going to be either close to zero or close to one, with low likelihood for any of the other values.
- 04:34This is not a very sensible looking prior for any application that I can think of.
- 04:40Especially not the one that we are discussing right now.
- 04:43So we're talking about accuracy as a function of set size.
- 04:47And so I would not expect the accuracy to just be either zero or one.
- 04:51For example, I would expect it to be in the mid ranges or something.
- 04:55So this is not a very great prior for the problem that we're studying.
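You can see this for yourself with a quick simulation; this is just a sketch, not necessarily the exact code from the course materials:

    # Sample from the Normal(0, 4) prior on the log-odds scale
    # and back-transform each sample to the probability scale
    alpha_samples <- rnorm(100000, mean = 0, sd = 4)
    hist(plogis(alpha_samples))  # most of the mass piles up near 0 and 1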
- 05:02So what could be an alternative prior that we can use? What I'm showing you is how I reason, or how we reason in Bayesian
- 05:07statistics about priors: you always do this by graphically plotting out the implications of your prior on the scale that you're
- 05:16interested in, be it milliseconds or the probability scale or whatever,
- 05:21and then try to interpret that in the context of your research problem.
- 05:25Presumably you're an expert in your field and your domain, and you know what a reasonable range of values is going to be.
- 05:31So you use that information to decide what a reasonable prior is.
- 05:36So let's start again with a different prior.
- 05:39Okay, so let's use a prior
- 05:40that is more constrained:
- 05:41mean zero,
- 05:42standard deviation 1.5.
- 05:44And then I'm going to back-transform it, using that formula that I derived, to the probability scale, and this time things look
- 05:50much better.
- 05:51So this looks like a pretty uninformative
- 05:54prior: this alpha, Normal(0, 1.5) on the log
- 05:58odds scale, is a pretty reasonable prior on the probability scale because it allows pretty much all possible values.
- 06:04But importantly, it down-weights the extreme values, one and zero.
- 06:11These values are down-weighted slightly.
- 06:15So perhaps I could down-weight them even more, because it's highly implausible that I would get accuracies of 0 to 0.25; maybe
- 06:230.95 I could still imagine,
- 06:25but beyond that, probably not, not for the large set sizes.
- 06:28So I mean, if I was continuing to work on this problem, I would probably constrain the prior to be even tighter so that
- 06:35it flattens out much more towards the edges.
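The same kind of check for the tighter priors, again just a sketch:

    # Normal(0, 1.5) on the log-odds scale: roughly flat on the probability
    # scale, with the extremes near 0 and 1 slightly down-weighted
    hist(plogis(rnorm(100000, mean = 0, sd = 1.5)))
    # Tightening further, e.g., Normal(0, 1), down-weights the extremes even more
    hist(plogis(rnorm(100000, mean = 0, sd = 1)))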
- 06:38So this kind of flattening out that one can do with the prior is called regularization.
- 06:43And this is one of the most powerful tools that Bayesian methods provide you: in the prior specification, you can modulate the
- 06:52prior, you can specify the prior,
- 06:54so that the totally implausible values, given the domain problem that you're working on, are basically ruled out a priori.
- 07:02That's a very sensible thing to do.
- 07:04And it has huge implications when you start fitting complicated models with hundreds of parameters, where you don't have enough
- 07:11data to get good estimates from the data for the parameters' posteriors.
- 07:17But you do have regularization through the prior, so that you can make sure that the posteriors still look reasonable.
- 07:24This process is called regularization.
- 07:27So I haven't really regularized this prior yet, but it's still better than this one here.
- 07:33This one looks like a pretty crazy prior to use, although you could still use it and nothing bad would happen.
- 07:38So you can try it out and see what happens,
- 07:41and the reason nothing bad would happen is that the data will overwhelm the prior: because there's so much data, there's
- 07:46going to be very little influence of these kinds of vague priors on the posterior distribution for the parameter.
- 07:56And so as I said, you can go even further.
- 07:58Okay, so now you can start putting in more and more informative priors to see what the implications are.
- 08:04But as I said before, the really interesting thing for us is not the alpha parameter, which is the average accuracy,
- 08:12but rather the slope parameter, which tells us the effect of set size on accuracy.
- 08:17So that's where all my attention is focused at the moment.
- 08:21And so what I would do is when I'm actually working on a problem like this, I would work with increasingly informative priors
- 08:28and investigate their prior predictive consequences in the data.
- 08:33So one way to do this, the code is all in the text.
- 08:35So you can look at it later.
- 08:36I don't want to distract you with the details of the code right now.
- 08:40But all I have done is that, for the different priors for beta and the different set sizes,
- 08:46I'm plugging these into the model, you know, into the model that produces the prior
- 08:51predictive distribution.
- 08:53For each of the priors and each of the set sizes,
- 08:58I'm getting back a range of predicted, you know, accuracies.
- 09:02These are the prior predictive accuracies.
- 09:04That's why they're so spread out, because they're agnostic, almost all of these, except this one here, which is very weird, and
- 09:12this one, which is very weird because it's got this high weighting for the zero and one values.
- 09:17That's not a great situation to set up for a prior, because you're already biasing, you know, your prior to go to 0 or 1. Of course,
- 09:26as I said before, in these data the posterior is going to be dominated by the likelihood.
- 09:31So it won't really matter even if you use these.
- 09:34But the other priors that I've got here, they seem to be more reasonable because they're agnostic, which is good because
- 09:41they let the data tell us a bit more about what it has to tell us, and at the same time they produce reasonably
- 09:51flat, uniform distributions.
- 09:56So this is just showing the distribution of the data a priori, through the prior distributions on beta and so
- 10:03on.
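A simplified simulation sketch of how such prior predictive accuracies can be generated; the candidate priors for beta here are illustrative assumptions, and the actual code is in the textbook:

    c_set_size <- c(-3, -1, 1, 3)   # centered set sizes 2, 4, 6, 8
    beta_sds <- c(1, 0.5, 0.1)      # candidate prior standard deviations for beta
    for (s in beta_sds) {
      alpha <- rnorm(1000, mean = 0, sd = 1.5)
      beta  <- rnorm(1000, mean = 0, sd = s)
      # predicted accuracy for each simulated (alpha, beta) pair and each set size
      pred_acc <- sapply(c_set_size, function(x) plogis(alpha + beta * x))
      print(colMeans(pred_acc))     # average prior predictive accuracy per set size
    }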
- 10:04But you can also look at the predicted differences in accuracy between set sizes.
- 10:09So the difference between 4 and 2, 6 and 4, 8 and 6.
- 10:14So these are possible, you know, stepwise differences that you might want to examine for prior specifications on the beta
- 10:21parameter.
- 10:23So this is extremely useful because now, you know, under each of these prior specifications, what the prior distribution
- 10:31on the differences will look like between these two set sizes,
- 10:37these pairs of set sizes.
- 10:39This is very useful because now I know, as a scientist, if I've been working on recall accuracy for the last 20 years, and that's
- 10:46what usually happens to people, right, they're
- 10:47working on one kind of problem for many years, they have a pretty good idea, or they should have a pretty good idea, of what
- 10:54the reasonable range of variation is going to be from one set size to the other.
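The corresponding differences in predicted accuracy between adjacent set sizes can be computed from the same kind of simulation; again, this is only an illustrative sketch:

    alpha <- rnorm(1000, mean = 0, sd = 1.5)
    beta  <- rnorm(1000, mean = 0, sd = 0.1)   # one candidate prior for beta
    acc <- sapply(c(-3, -1, 1, 3), function(x) plogis(alpha + beta * x))
    diff_4_2 <- acc[, 2] - acc[, 1]   # set size 4 minus set size 2
    diff_6_4 <- acc[, 3] - acc[, 2]   # set size 6 minus set size 4
    diff_8_6 <- acc[, 4] - acc[, 3]   # set size 8 minus set size 6
    hist(diff_4_2)   # prior distribution of the 4-versus-2 difference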
- 10:59Okay, so this is called a sensitivity analysis.
- 11:04I've shown you many examples of this before, but I just wanted to show you that you can systematically set up your workflow
- 11:09even before you've collected your data: you know what the implications are of each of your priors, and nothing stops you from
- 11:16fitting every single model with all the priors that you have, one by one, to see what the posterior distributions look like.
- 11:24Okay, so that will tell you something about how much information you're getting from the prior relative to the information
- 11:30you're getting from the data.
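In brms, for instance, such a sensitivity analysis could look roughly like this; the data frame, column names, and the set of candidate priors are assumptions for illustration:

    library(brms)
    beta_sds <- c(1, 0.5, 0.1)   # candidate prior standard deviations for beta
    fits <- lapply(beta_sds, function(s) {
      brm(correct ~ 1 + c_set_size,
          data = df_recall,
          family = bernoulli(link = "logit"),
          prior = c(set_prior("normal(0, 1.5)", class = "Intercept"),
                    set_prior(paste0("normal(0, ", s, ")"), class = "b")))
    })
    # Then compare the posterior for c_set_size across the three fits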
- 11:32Okay, so for now I would say that these priors that I chose, for the alpha parameter,
- 11:39you know, that flat prior on the probability scale, and for the beta parameter a normal prior with mean zero and standard deviation 0.1,
- 11:48so that would be this guy here,
- 11:50this seems like a reasonable set of priors to choose, you know, to fit the model.
- 11:57I could have chosen a more vague prior, or even these priors, and they would still have been okay, as I mentioned earlier.
- 12:05So the next thing that we will do is we will fit the model and then as usual examine the posterior distribution of the parameters
- 12:13and try to draw inferences from it.
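The fit itself might then look something like this in brms; the data frame and column names are assumed, and the priors follow the choices discussed above:

    library(brms)
    fit_recall <- brm(correct ~ 1 + c_set_size,
                      data = df_recall,
                      family = bernoulli(link = "logit"),
                      prior = c(prior(normal(0, 1.5), class = Intercept),
                                prior(normal(0, 0.1), class = b)))
    # Posterior summaries are on the log-odds scale;
    # back-transform to probabilities with plogis()
    posterior_summary(fit_recall)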