- 00:00We've been talking about logistic regression so far.
- 00:03And so now that we understand how the model is built, we're going to first specify some priors, investigate the implications
- 00:12of those priors and then proceed to fit the model.
- 00:16These are the next steps that we have to take. Just to remind you what the dataset looks like:
- 00:19There's the recall data we have for multiple subjects,
- 00:24for different set sizes.
- 00:25We have correct or incorrect responses in the experiment that they did, in which they had to recall a word from a particular
- 00:32list of words.
- 00:33So the answer that they are going to give is either correct or incorrect.
- 00:38And so what we did is that for the different set sizes first we centered the set size as I explained earlier and we're gonna
- 00:44use that centered set size as our predictors.
- 00:47So these values the center values can be negative or positive and they're the mean of this vector of centered, set size will
- 00:54be zero and that represents the mean set size.
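A minimal sketch in R of the centering step just described; the data frame and column names here are assumptions, not necessarily the ones used in the actual course materials:

    # Hypothetical data frame with one row per trial (names are assumed)
    df_recall$c_set_size <- df_recall$set_size - mean(df_recall$set_size)
    # The centered predictor now has mean (approximately) zero,
    # so zero represents the average set size
    mean(df_recall$c_set_size)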
- 01:00So what does the model look like?
- 01:02We are going to model the probabilities in the Bernoulli likelihood.
- 01:06And we are not fitting the model to the 0, 1 responses, even though the software that you use may appear to do that.
- 01:13But internally the model is doing something different.
- 01:17And so what we're going to do is we're going to model the effect of set size on the log odds rather than
- 01:25on the probability.
- 01:26Okay.
- 01:28Alright.
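A rough sketch in R of the generative story just described; the parameter values below are made up purely for illustration:

    # Made-up parameter values, just to illustrate the structure of the model
    alpha <- 1.0    # intercept on the log-odds scale (hypothetical)
    beta  <- -0.2   # effect of centered set size on the log odds (hypothetical)
    c_set_size <- c(-3, -1, 1, 3)        # centered set sizes 2, 4, 6, 8
    eta   <- alpha + beta * c_set_size   # linear predictor on the log-odds scale
    theta <- exp(eta) / (1 + exp(eta))   # back-transform to probabilities
    correct <- rbinom(length(theta), size = 1, prob = theta)  # Bernoulli responses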
- 01:29So how do we proceed?
- 01:32First of all, notice that in this model that I just showed you there is no residual error term like in the linear model;
- 01:39there's no error term.
- 01:40Now, that's because of the way that estimation is done in the classical approach to generalized linear models.
- 01:47But it's not really interesting for us where this comes from.
- 01:52What's interesting for us is that once we have estimated the parameters alpha and beta, we can always work out the probabilities
- 02:03for every possible set size that we are interested in that we had in the experiment.
- 02:09And that was this equation that I derived for you a few minutes ago,
- 02:12in the previous lecture.
- 02:16So
- 02:18to summarize, the basic model assumes a Bernoulli likelihood generating the 0, 1 responses.
- 02:24The model is fit on the log odds scale.
- 02:26So that's why the eta is now the log odds here.
- 02:30The eta is the log odds here, and we've got the same structure as before.
- 02:34What has not changed is that we have an intercept and we have a slope.
- 02:39So we're going to be looking at prior distributions for these and then we're going to have to look at the posterior and posterior
- 02:47predictive distribution and so on.
- 02:49And we can always convert back these estimates on the log odds scale; you can convert them back to the probability
- 02:56scale, as I showed you earlier, and of course I'm going to do that now.
- 03:00Alright, so just as a piece of information for you, there are two useful functions in R. As you know,
- 03:06for every distribution there's the d, p, q, r family; for the logistic distribution, there's also this family.
- 03:12So the qlogis function gives you the logit that I showed earlier, and the plogis function gives you the inverse logit.
- 03:20Of course you could write this out yourself, but these functions are available to you.
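For example:

    qlogis(0.5)              # 0: a probability of 0.5 corresponds to a log odds of 0
    plogis(0)                # 0.5: the inverse logit maps log odds back to probability
    plogis(qlogis(0.8))      # recovers 0.8
    # Written out by hand, these are:
    log(0.8 / (1 - 0.8))     # the logit, same as qlogis(0.8)
    exp(1.386) / (1 + exp(1.386))  # roughly plogis(1.386), i.e., about 0.8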
- 03:25So let's now think on the log odds scale.
- 03:30We want to think about what a reasonable prior for alpha would be for the intercept parameter.
- 03:36So let's start with a wild guess.
- 03:38Normal with mean zero and standard deviation four.
- 03:41Now, a priori I wouldn't know what this actually means on the probability scale.
- 03:47What is the log odds?
- 03:49I don't use it in day-to-day life.
- 03:51So I can't really say but I can plot the prior predictive distribution to see what this implies.
- 03:58So if I look at the alpha parameter, which is defined in log odds space, and I convert it back to probability
- 04:07space using the formula I showed you earlier,
- 04:11I get back a little bit of a surprise.
- 04:14The surprise is that this Normal(0, 4) prior actually implies that my prior expectation is that the probability parameter
- 04:26is going to be either close to zero or close to one, with low likelihood for any of the other values.
- 04:34This is not a very sensible looking prior for any application that I can think of.
- 04:40Especially not the one that we are discussing right now.
- 04:43So we're talking about accuracy as a function of set size.
- 04:47And so I would not expect the accuracy to just be either zero or one.
- 04:51For example, I would expect it to be in the mid ranges or something.
- 04:55So this is not a very great prior for the problem that we're studying.
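You can see this for yourself with a quick simulation; this is just a sketch, not necessarily the exact code from the course materials:

    # Sample from the Normal(0, 4) prior on the log-odds scale
    # and back-transform each sample to the probability scale
    alpha_samples <- rnorm(100000, mean = 0, sd = 4)
    hist(plogis(alpha_samples))  # most of the mass piles up near 0 and 1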
- 05:02So what could be an alternative prior that we can use? What I'm showing you is how I reason, or how we reason in Bayesian
- 05:07statistics about priors: you always do this by graphically plotting out the implications of your prior on the scale that you're
- 05:16interested in, be it milliseconds or the probability scale or whatever,
- 05:21and then try to interpret that in the context of your research problem.
- 05:25Presumably you're an expert in your field and your domain, and you know what a reasonable range of values is going to be.
- 05:31So you use that information to decide what a reasonable prior is.
- 05:36So let's start again with a different prior.
- 05:39Okay, so let's use a prior
- 05:40that is more constrained:
- 05:41mean zero,
- 05:42standard deviation 1.5.
- 05:44And then I'm going to back-transform it, using that formula that I derived, to the probability scale, and this time things look
- 05:50much better.
- 05:51So this looks like a pretty uninformative
- 05:54prior: this alpha, Normal(0, 1.5) on the log
- 05:58odds scale, is a pretty reasonable prior on the probability scale because it allows pretty much all possible values.
- 06:04But importantly, it down-weights the extreme values, one and zero.
- 06:11These values are down-weighted slightly.
- 06:15So perhaps I could down-weight them even more, because it's highly implausible that I would get accuracies of 0 to 0.25; maybe
- 06:230.95 I could still imagine,
- 06:25but beyond that, probably not, not for the large set sizes.
- 06:28So I mean, if I was continuing to work on this problem, I would probably constrain the prior to be even tighter so that
- 06:35it flattens out much more towards the edges.
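The same kind of check for the tighter priors, again just a sketch:

    # Normal(0, 1.5) on the log-odds scale: roughly flat on the probability
    # scale, with the extremes near 0 and 1 slightly down-weighted
    hist(plogis(rnorm(100000, mean = 0, sd = 1.5)))
    # Tightening further, e.g., Normal(0, 1), down-weights the extremes even more
    hist(plogis(rnorm(100000, mean = 0, sd = 1)))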
- 06:38So this kind of flattening out that one can do with the prior is called regularization.
- 06:43And this is one of the most powerful tools that Bayesian methods provide you: in the prior specification, you can modulate the
- 06:52prior, you can specify the prior,
- 06:54so that the totally implausible values, given the domain problem that you're working on, are basically ruled out a priori.
- 07:02That's a very sensible thing to do.
- 07:04And it has huge implications when you start fitting complicated models with hundreds of parameters, where you don't have enough
- 07:11data to get good estimates from the data for the parameters' posteriors.
- 07:17But you do have regularization through the prior, so that you can make sure that the posteriors still look reasonable.
- 07:24This process is called regularization.
- 07:27So I haven't really regularized this prior yet, but it's still better than this one here.
- 07:33This one looks like a pretty crazy prior to use, although you could still use it and nothing bad would happen.
- 07:38So you can try it out and see what happens,
- 07:41and the reason nothing bad would happen is that the data will overwhelm the prior: because there's so much data, there's
- 07:46going to be very little influence of these kinds of vague priors on the posterior distribution for the parameter.
- 07:56And so as I said, you can go even further.
- 07:58Okay, so now you can start putting in more and more informative priors to see what the implications are.
- 08:04But as I said before, the really interesting thing for us is not the alpha parameter, which is the average accuracy,
- 08:12but rather the slope parameter, which tells us the effect of set size on accuracy.
- 08:17So that's where all my attention is focused at the moment.
- 08:21And so what I would do is when I'm actually working on a problem like this, I would work with increasingly informative priors
- 08:28and investigate their prior predictive consequences in the data.
- 08:33So one way to do this, the code is all in the text.
- 08:35So you can look at it later.
- 08:36I don't want to distract you with the details of the code right now.
- 08:40But all I have done is that, for the different priors for beta and the different set sizes,
- 08:46I'm plugging these into the model, you know, into the model that produces the prior
- 08:51predictive distribution.
- 08:53For each of the priors and each of the set sizes,
- 08:58I'm getting back a range of predicted, you know, accuracies.
- 09:02These are the prior predictive accuracies.
- 09:04That's why they're so spread out, because they're agnostic, almost all of these, except this one here, which is very weird, and
- 09:12this one, which is very weird because it's got this high weighting for the zero and one values.
- 09:17That's not a great situation to set up for a prior, because you're already biasing, you know, your prior to go to 0 or 1. Of course,
- 09:26as I said before, in these data the posterior is going to be dominated by the likelihood.
- 09:31So it won't really matter even if you use these.
- 09:34But the other priors that I've got here, they seem to be more reasonable because they're agnostic, which is good because
- 09:41they let the data tell us a bit more about what it has to tell us, and at the same time they produce reasonably
- 09:51flat, uniform distributions.
- 09:56So this is just showing the distribution of the data a priori, through the prior distributions on beta and so
- 10:03on.
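A simplified simulation sketch of how such prior predictive accuracies can be generated; the candidate priors for beta here are illustrative assumptions, and the actual code is in the textbook:

    c_set_size <- c(-3, -1, 1, 3)   # centered set sizes 2, 4, 6, 8
    beta_sds <- c(1, 0.5, 0.1)      # candidate prior standard deviations for beta
    for (s in beta_sds) {
      alpha <- rnorm(1000, mean = 0, sd = 1.5)
      beta  <- rnorm(1000, mean = 0, sd = s)
      # predicted accuracy for each simulated (alpha, beta) pair and each set size
      pred_acc <- sapply(c_set_size, function(x) plogis(alpha + beta * x))
      print(colMeans(pred_acc))     # average prior predictive accuracy per set size
    }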
- 10:04But you can also look at the predicted differences in accuracy between set sizes.
- 10:09So the difference between 4 and 2, 6 and 4, 8 and 6.
- 10:14So these are possible, you know, stepwise differences that you might want to examine for prior specifications on the beta
- 10:21parameter.
- 10:23So this is extremely useful because now, you know, under each of these prior specifications, what the prior distribution
- 10:31on the differences will look like between these two set sizes,
- 10:37these pairs of set sizes.
- 10:39This is very useful because now I know, as a scientist, if I've been working on recall accuracy for the last 20 years, and that's
- 10:46what usually happens to people, right, they're
- 10:47working on one kind of problem for many years, they have a pretty good idea, or they should have a pretty good idea, of what
- 10:54the reasonable range of variation is going to be from one set size to the other.
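The corresponding differences in predicted accuracy between adjacent set sizes can be computed from the same kind of simulation; again, this is only an illustrative sketch:

    alpha <- rnorm(1000, mean = 0, sd = 1.5)
    beta  <- rnorm(1000, mean = 0, sd = 0.1)   # one candidate prior for beta
    acc <- sapply(c(-3, -1, 1, 3), function(x) plogis(alpha + beta * x))
    diff_4_2 <- acc[, 2] - acc[, 1]   # set size 4 minus set size 2
    diff_6_4 <- acc[, 3] - acc[, 2]   # set size 6 minus set size 4
    diff_8_6 <- acc[, 4] - acc[, 3]   # set size 8 minus set size 6
    hist(diff_4_2)   # prior distribution of the 4-versus-2 difference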
- 10:59Okay, so this is called a sensitivity analysis.
- 11:04I've shown you many examples of this before, but I just wanted to show you that you can systematically set up your workflow
- 11:09even before you've collected your data: you know what the implications are of each of your priors, and nothing stops you from
- 11:16fitting every single model with all the priors that you have, one by one, to see what the posterior distributions look like.
- 11:24Okay, so that will tell you something about how much information you're getting from the prior relative to the information
- 11:30you're getting from the data.
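In brms, for instance, such a sensitivity analysis could look roughly like this; the data frame, column names, and the set of candidate priors are assumptions for illustration:

    library(brms)
    beta_sds <- c(1, 0.5, 0.1)   # candidate prior standard deviations for beta
    fits <- lapply(beta_sds, function(s) {
      brm(correct ~ 1 + c_set_size,
          data = df_recall,
          family = bernoulli(link = "logit"),
          prior = c(set_prior("normal(0, 1.5)", class = "Intercept"),
                    set_prior(paste0("normal(0, ", s, ")"), class = "b")))
    })
    # Then compare the posterior for c_set_size across the three fits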
- 11:32Okay, so for now I would say that these priors that I chose, for the alpha parameter,
- 11:39you know, that flat prior on the probability scale, and for the beta parameter a normal prior with mean zero and standard deviation 0.1,
- 11:48so that would be this guy here,
- 11:50this seems like a reasonable set of priors to choose, you know, to fit the model.
- 11:57I could have chosen a more vague prior, or even these priors, and they would still have been okay, as I mentioned earlier.
- 12:05So the next thing that we will do is we will fit the model and then as usual examine the posterior distribution of the parameters
- 12:13and try to draw inferences from it.
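The fit itself might then look something like this in brms; the data frame and column names are assumed, and the priors follow the choices discussed above:

    library(brms)
    fit_recall <- brm(correct ~ 1 + c_set_size,
                      data = df_recall,
                      family = bernoulli(link = "logit"),
                      prior = c(prior(normal(0, 1.5), class = Intercept),
                                prior(normal(0, 0.1), class = b)))
    # Posterior summaries are on the log-odds scale;
    # back-transform to probabilities with plogis()
    posterior_summary(fit_recall)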