- 00:00What we've done so far is fit a very simple linear model,
- 00:04ignoring some of the details of the data:
- 00:07the non-independence in the data, which we treated as independent.
- 00:10A simple model,
- 00:12looking at the effect of centered attentional load on pupil size.
- 00:17I just showed you the code for the model.
- 00:19Now I'm going to show you what you get out of this model and how to interpret what's in this model.
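For reference, a minimal sketch of the kind of brms model being described; the data frame (df_pupil) and column names (p_size, c_load) are assumptions based on the lecture, and the priors shown are placeholders, not necessarily the ones used in the course:

```r
library(brms)

# Simple linear model: pupil size as a function of centered attentional load.
# Names and priors are illustrative assumptions, not the course's exact code.
fit_pupil <- brm(p_size ~ 1 + c_load,
                 data = df_pupil,
                 family = gaussian(),
                 prior = c(prior(normal(1000, 500), class = Intercept),
                           prior(normal(0, 100), class = b, coef = c_load),
                           prior(normal(0, 1000), class = sigma)))
```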
- 00:25So first of all, one easy way to summarize the output from the model is to use this built-in plot function.
- 00:32So the plot function knows that you're looking at a Stan object.
- 00:36The fitted model is, underlyingly, a Stan object, and it uses that information to print out the appropriate plot.
- 00:44As you know, in R there's also a plot function for standard base graphics, but this plot function is working with the
- 00:51brms output.
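A minimal sketch of that call, assuming the fitted object is named fit_pupil:

```r
# plot() dispatches on the fitted brmsfit object: for each parameter it
# shows the marginal posterior density (left) and the sampled chains (right).
plot(fit_pupil)
```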
- 00:53So this is the fit_pupil model that I had fit earlier, and the thing to look at here is that there are
- 00:59the three parameters that we had set up: here is the intercept, here's the slope (and I know that it's the slope because the
- 01:07name of the predictor, c_load, the centered load, is listed here), and here's the sigma parameter.
- 01:13As you can see, the sigma parameter is positive.
- 01:16The posterior distribution ranges over the positive values.
- 01:21And this comes from the fact that I have
- 01:28enough data that the posterior distribution is heavily influenced by the likelihood.
- 01:35So what's going to happen is that the mean of this sigma and the variability of this sigma are going to be not very different
- 01:42from what I would get from a standard linear model in the frequentist setting.
- 01:48Okay, what is more interesting for me is to look at the distribution, the posterior distribution of the slope parameter because
- 01:57the slope parameter is now telling me what the range of variability is going to be in the pupil size when I increase attentional
- 02:05load by one unit.
- 02:07So what is the intercept telling me? The average pupil size.
- 02:12Why?
- 02:13Because I centered the predictor.
- 02:15Once you center the predictor, the intercept has the interpretation that it represents the grand mean.
- 02:21So the mean pupil size independent of any attentional load.
- 02:25You see the advantage of centering now.
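Centering is just a one-line transformation; a sketch, with column names assumed for illustration:

```r
# Subtract the mean load so that the intercept represents the grand mean
# pupil size, independent of any attentional load.
df_pupil$c_load <- df_pupil$load - mean(df_pupil$load)
```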
- 02:27And so what the slope now means is that when you increase attentional load by one unit, the increase in pupil size
- 02:37is represented by this posterior distribution.
- 02:39So, roughly speaking, the pupil size would increase by about 10 to 60 units
- 02:47with every unit increase in attentional load.
- 02:49That's what the model predicts.
- 02:51Whether this is the truth or not is a separate matter.
- 02:54We're just trying to draw inferences from the data we happen to have.
- 02:57So what we learned from the data is that this is the effect of attentional load on pupil size.
- 03:05So what you see on the right-hand side are the four chains.
- 03:08The
- 03:09MCMC sampler is now trying to get samples from these distributions.
- 03:16And what you're seeing is that these four chains are sitting on top of each other.
- 03:21These are called fat hairy caterpillars.
- 03:23And this is a good sign because it shows that the chains are landing on top of each other.
- 03:28And what that means is that each of the chains for each parameter is sampling from the same distribution.
- 03:35If you didn't have these chains sitting on top of each other, that would indicate some problems with convergence.
- 03:42And there are examples of non convergence in the textbook.
- 03:45And of course we explain how to deal with that in the textbook.
- 03:48But right now, I'm only showing you well behaved models that perform nicely.
- 03:52So you can see what happens in the usual case.
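Alongside the visual check of the trace plots, convergence can be checked numerically; a sketch, assuming a brms fit named fit_pupil:

```r
# Numerical convergence checks to complement the trace plots:
# Rhat values close to 1 indicate that the chains agree.
summary(fit_pupil)   # reports Rhat and effective sample sizes per parameter
rhat(fit_pupil)      # Rhat for each parameter individually
```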
- 03:56Okay,
- 03:59so that's one way to summarize the output with a figure like this.
- 04:03Usually what we do in a paper is that we don't show the chains.
- 04:07One doesn't generally show the chains.
- 04:09One just shows the posterior distributions of the parameters and perhaps also a table that summarizes the results.
- 04:16So what are the important things that you can summarize for the reader when you're writing a paper? You can summarize the mean
- 04:21of the posterior for the intercept and the 95% credible interval.
- 04:26This is the lower and upper bound of the 95% credible interval.
- 04:29These are diagnostics about convergence
- 04:32that right now we don't need to worry about, because right now we're only going to fit models that converge.
- 04:39So this is okay.
- 04:41You can look at the details in the textbook later.
- 04:43But what's important in this output is that you get to see what the model tells you about the plausible values of each of
- 04:53the parameters.
- 04:54For example, the average pupil size would be about 701, with a 95% probability of lying between 661 and 740.
- 05:05That's what the data tell you.
- 05:06That's not necessarily the truth.
- 05:07But that's what we learned from these data.
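These are the numbers you would report in a table; a sketch of how to extract them, assuming the fit is named fit_pupil:

```r
# Posterior means with 95% credible intervals (columns Q2.5 and Q97.5)
# for each parameter, suitable for summarizing in a paper.
posterior_summary(fit_pupil)
```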
- 05:10And so one of the great advantages of the Bayesian framework is that you can talk about your uncertainty about the parameter,
- 05:17not just, you know, like in the frequentist world we have: is it significant?
- 05:22Is it not significant?
- 05:23That is a much less interesting issue.
- 05:25Much more interesting is: what is the range of variability of that parameter?
- 05:29Similarly, the thing that we are really interested in scientifically is the effect of attentional load
- 05:35on pupil size.
- 05:36And what this output, this second line, is telling me is that for every unit increase in attentional load there's a 34-unit
- 05:45increase predicted in pupil size.
- 05:49And my uncertainty about that is between 11 and 57 units here.
- 05:55So this is telling me how uncertain I am about this prediction.
- 05:59So this is a very important piece of information because if you have sparse data then your uncertainty will be much larger.
- 06:06So you've learned a lot less from that particular data set.
- 06:10If you happen to have a lot more data, you will get pretty tight
- 06:13intervals; these are called credible intervals, and tighter credible intervals are much more informative about your problem
- 06:19because they're constraining what the plausible values are, given the data.
- 06:23And finally, you have the sigma parameter here, with the posterior mean and the 95% credible interval here.
- 06:30So that's an easy way to summarize, you know, the output. We wrote our own function that produces this
- 06:37short output.
- 06:38It's in the code that will be provided with this course.
- 06:41So you can look at that later.
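A hypothetical helper in the spirit of that short-summary function (the actual function ships with the course code; this is only an assumed sketch):

```r
# Posterior mean plus the 95% credible interval per parameter,
# rounded for a compact table.
short_summary <- function(fit) {
  s <- posterior_summary(fit)
  round(s[, c("Estimate", "Q2.5", "Q97.5")], 2)
}
short_summary(fit_pupil)
```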
- 06:44So one other interesting thing you can do with this model now is that you can study what the model would predict for future
- 06:52data.
- 06:53Remember that in this experiment design we have an attentional load of 0, 1, 2, 3, or 4,
- 07:02right, 0 to 4; that was all there was, so five levels of attentional load.
- 07:08So what we can now do, having fit the model and gotten posterior distributions for each of the parameters, is
- 07:15just generate new data repeatedly using the posterior distributions of the parameters that we have.
- 07:23So what we are getting from the model now is future data: simulated datasets that represent what the model predicts future
- 07:32data would look like.
- 07:33This is a very useful thing to do because it gives you a good understanding of how realistic your model is relative to the
- 07:39data that you have.
- 07:41So if you get predicted data that is completely divergent or very far away from your observed data, that's an indication
- 07:48that there's something wrong in your model specification.
- 07:50You need to go back and fix it.
- 07:51There are many examples of such wrong model specifications in the textbook.
- 07:56But right now, I just want to show you how this would work.
- 07:58The code for all this is of course available to you.
- 08:01And so you can look at it later.
- 08:03So I don't want to distract you with the code right now.
- 08:05What I want to show you is that the observed data is this density plot, this black one here.
- 08:10These are the actual data points.
- 08:12And the blue lines that you're seeing,
- 08:14these are the simulated datasets from the model given the posterior distributions.
- 08:19So the posterior distributions are informed by the data now, and the prior of course.
- 08:23And so we're getting posterior predictive distributions of the data:
- 08:28future datasets given the posterior distributions of the parameters. This is for load zero, and you can now check
- 08:35every possible load level.
- 08:38How am I doing this?
- 08:39I'm putting different load levels into the model.
- 08:43And I'm just generating data now, because I have posteriors for the parameters, so I can just plug in different load levels
- 08:49and I'm going to get predictions from the model.
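A sketch of this kind of posterior predictive check in brms; the grouped plot type comes from bayesplot, and the grouping column name "load" is an assumption:

```r
# Overlay densities of simulated datasets (blue) on the observed data
# (black); ndraws sets how many datasets to simulate.
pp_check(fit_pupil, type = "dens_overlay", ndraws = 50)

# One panel per load level, as in the plots described in the lecture.
pp_check(fit_pupil, type = "dens_overlay_grouped", group = "load", ndraws = 50)
```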
- 08:51So for load zero, load one, you can see that there are some really wild data here.
- 08:57There are some really sharp distributions in the simulated datasets, but most of them are generally spread out, which is quite
- 09:03realistic.
- 09:03They look quite a lot like the observed data here.
- 09:06Also, we have these occasional outliers, these strange, extremely tight distributions, but most of them are spread
- 09:12out. Again, for load two and load three it's the same story, and load four as well.
- 09:18There might be something going on in the data.
- 09:21There might be actually a finite mixture happening here.
- 09:25There could be two separate distributions that we might be looking at but we are ignoring all that.
- 09:30We're just treating this data as if it's coming from a simple linear model.
- 09:34That's what we often do.
- 09:35We simplify our assumptions even though there might be a more sophisticated model that would be better justified by
- 09:42the data that you have.
- 09:43This is something that we discuss in more advanced chapters.
- 09:46But for now, as a start, this is a good enough model, and we will of course improve on it in future iterations.
- 09:56So that's a simple example of what you can do with regression modeling.
- 10:01You can add one predictor.
- 10:03Or you can add more than one predictor.
- 10:05You can have many more predictors.
- 10:06Of course there are complications associated with that.
- 10:09But for now let's think about the situation where you add just one predictor and you can use that predictor to figure out
- 10:16the effect of that variable on your dependent variable.
- 10:19That's a very standard statistical modeling technique that we use quite frequently in many different areas of science.
- 10:27Okay, so in the next lecture, what I'm going to do is I'm going to present another model now as an example.
- 10:35But this time, instead of using the normal likelihood, I'm going to use the log-normal likelihood.
- 10:41The log-normal has some small complications and subtleties that I want to illustrate with this example.
- 10:48So that's what we will look at next.