- 00:00What we've done so far is that we have looked at simple regression models with a single predictor.
- 00:07Not even a predictor.
- 00:08It was an intercept that we were looking at.
- 00:10This was the button pressing data.
- 00:12And I showed you how you can assume a log-normal likelihood function to model how this data might have been generated.
- 00:20So now what we're gonna do is we're going to build on that very simple statistical model and keep adding more and more structure
- 00:27to it.
- 00:28That's basically what the rest of this course is all about.
- 00:32So as an example, I want to think about a particular situation where we are doing a psychological experiment, and what we
- 00:42are asking the participants to do is to track between zero and five objects on the screen as they move around. Then, at
- 00:51the end of the trial, they have to tell us which of the items on the screen were the ones that
- 00:59we'd asked them to keep track of as targets.
- 01:02Right?
- 01:07So to give you an idea of what this looks like graphically: at the beginning of the trial a subject will see a fixation cross, then they will see a bunch of points on the screen,
- 01:13and some of the points will be distinctive,
- 01:16in the sense that they're the ones that the participant has to track.
- 01:19And then these points start moving around on the screen, and at the end of the trial the subject is asked to select the targets,
- 01:27that is,
- 01:28the points that they were asked to track.
- 01:30So this is the basic issue that we are studying here.
- 01:33And the cognitive process that one is studying here is attentional load.
- 01:39So how accurate are you at keeping track of the targets as attentional load increases? Load is increased by increasing
- 01:48the number of dots on the screen.
- 01:50So the research question that we want to answer here is how attentional load affects pupil size.
- 01:58So we are recording the size of the pupil as you're forced to pay more and more attention to the task.
- 02:04We know from research in psychology that if you overload a participant and make them concentrate more on a particular
- 02:13task, their pupil size will increase;
- 02:15this is called arousal.
- 02:17And so this attentional increase is something that you can model with
- 02:23this kind of experimental design.
- 02:30So what is a good statistical model for this?
- 02:33We can start simple.
- 02:34So let's start with a very simple model where we assume a normal likelihood and the dependent variable here is the pupil
- 02:42size that we are looking at.
- 02:44n refers to the nth row in the data frame.
- 02:48So we will always be working with data frames, and the data frames will be indexed with the position in the row, which
- 02:56I'm going to call small n.
- 02:57So small n goes from one to capital N, and capital
- 03:00N is the total number of data points in that data frame.
- 03:04So the model that we are going to think about is pupil size is coming from a normal distribution with some mean and some
- 03:11standard deviation.
- 03:12So this you have seen before of course.
- 03:14But what's new is that we're going to have a predictor in the mean component of the normal distribution specification here.
- 03:23So instead of just having a mean here, as you have seen before, we have an intercept as well as something that we call a
- 03:31slope, the beta parameter, whose contribution is determined by the load that the experiment forces the participant to experience.
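In symbols, the model being described is roughly the following (with n indexing the rows of the data frame):

  p_size_n ~ Normal(alpha + beta * load_n, sigma),  for n = 1, ..., N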
- 03:43So let's take a look at what will happen now when we fit such a model.
- 03:49Notice a very important assumption I'm making here, which is often glossed over, but this is an extremely important assumption.
- 03:56The assumption is that every data point in the data frame is independent of every other data point.
- 04:02So, in the frequentist world we would say that we have independent and identically distributed data.
- 04:07Now, whether this assumption is actually satisfied or not in this data is a separate question.
- 04:12We're starting with a simple model.
- 04:14So let's proceed with this simple assumption and then we'll elaborate on this model later on.
- 04:18Okay, alright, so let's first ask ourselves what kind of prior specifications can we give to the parameters of interest in
- 04:28this model.
- 04:28What are the parameters we have?
- 04:29The alpha parameter,
- 04:30the beta parameter and the sigma parameter.
- 04:33These are the three parameters in this model.
- 04:35Very simple model.
- 04:36But we do need to define priors on these parameters.
- 04:39So how do we decide on the priors?
- 04:41So let's first think about the intercept.
- 04:44With the intercept I need to have some knowledge about what pupil sizes can be.
- 04:49Well, pupil size can't be negative.
- 04:51Whatever the units are (we don't know what the units are here), it is some positive number, right?
- 04:56Pupil size cannot be negative but we can do better than just speculating about this.
- 05:00We can look up some pilot data.
- 05:02There's lots of data out there on the internet, and you may have done some pilot research before actually doing the experiment.
- 05:08So you can examine some pilot data.
- 05:11And what I'm doing here is looking at the summary statistics of the pupil sizes.
- 05:18So I see that the minimum is 852 units and the maximum is 868, whatever this unit is.
- 05:24So this suggests something to me.
- 05:26And what does it suggest?
- 05:27It suggests that I could use a prior that would look like this: for the intercept parameter, I could
- 05:34choose a normal distribution prior with mean 1000 and a standard deviation of 500.
- 05:42So what are we expressing when we specify this as a prior?
- 05:45We are actually saying something.
- 05:46We are expressing a belief about what we think are plausible values of this parameter.
- 05:53And so how do I quantify that belief?
- 05:55You can do that with the qnorm function.
- 05:57That's why the dpqr family of functions is so useful.
- 06:00So you can just take the qnorm function and pull out the 95% interval, you know, that covers 95% of the area of the
- 06:08curve.
- 06:08And this tells you what your prior belief, before seeing any new data, is for the average pupil
- 06:17size, right?
- 06:18That's what this parameter is going to represent:
- 06:20the average pupil size. And this prior now represents a reasonable range.
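A quick R sketch of this computation:

# 95% interval implied by the Normal(1000, 500) prior on the intercept
qnorm(c(0.025, 0.975), mean = 1000, sd = 500)
# roughly 20 and 1980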
- 06:26I don't know.
- 06:26Maybe this is too liberal, right?
- 06:28One could change this prior to something more informative, but we can think about those situations later.
- 06:33Now for sigma we could start by using an uninformative prior. As I have explained earlier, the posterior is going to
- 06:43be a compromise between the prior specification and the likelihood.
- 06:47And so if you have a very vague prior on the parameter, you will tend to get the same posterior distribution as you would
- 06:54if you were looking at the maximum likelihood estimate.
- 06:59So the mean of the posterior would be very close to the maximum likelihood estimate for a vague prior.
- 07:05So that was a demonstration
- 07:06I made a while back in a previous lecture.
- 07:11So what we're gonna do for the sigma parameter, because sigma cannot be negative, is truncate the normal distribution
- 07:17at zero.
- 07:18So we're going to stipulate that you cannot get a value less than zero.
- 07:22So what we have is a truncated prior distribution.
- 07:25That's what I've got here, normal distribution mean zero standard deviation 1000 truncated at zero.
- 07:32Okay, so you can plot this now with dnorm or something to see what it looks like.
- 07:36But you can also extract, using the extraDistr package, the 95% quantiles of this distribution.
- 07:44And you see that the range is pretty reasonable.
- 07:46It's quite liberal actually, it's going from 31 to 2241 which allows for a lot of variability.
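A minimal R sketch, using the qtnorm function from the extraDistr package:

library(extraDistr)
# 95% interval of a Normal(0, 1000) prior truncated at zero (a = 0 is the lower bound)
qtnorm(c(0.025, 0.975), mean = 0, sd = 1000, a = 0)
# roughly 31 and 2241, the range quoted above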
- 07:53But as I said, this is an uninformative prior, and we can work with this; nothing bad is gonna happen,
- 08:00you know, if you use such an uninformative prior.
- 08:03Now the last parameter of interest, which is the most important one is the beta parameter.
- 08:07This is the parameter that represents the effect of attentional load on pupil size and so I start with this kind of ballpark
- 08:15assumption about a beta parameter coming from a normal distribution.
- 08:20This is the prior now,
- 08:20with mean zero and standard deviation 100.
- 08:24What this expresses is that the attentional load can either increase or decrease pupil size.
- 08:31Of course this is not realistic.
- 08:31I mean, we know from research in psychology that if you increase attentional load, there will be a tendency for
- 08:38pupil size to increase.
- 08:40But we want to remain agnostic.
- 08:42We want to leave open the possibility that you could go in the opposite direction: as attentional load increases, pupil size
- 08:49could in principle decrease.
- 08:51So that's what this uninformative prior is expressing.
- 08:54And you can quantify the 95% credible interval again as before, which expresses your prior belief about what plausible effects
- 09:03you could expect for increasing attentional load by one unit, for example.
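Again, a quick R sketch:

# 95% interval implied by the Normal(0, 100) prior on the slope beta
qnorm(c(0.025, 0.975), mean = 0, sd = 100)
# roughly -196 and 196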
- 09:08So what does the data look like?
- 09:10Well, what I'm going to do is I'm going to show you this df_pupil data.
- 09:14This is in the bcogsci package which comes with the textbook.
- 09:17And so the first thing I'm going to do before we look at the data is to center the predictor.
- 09:22This centering is a very important thing that we are almost always going to do when we do regression
- 09:29modeling.
- 09:29It has important properties whose details I cannot discuss here because they're pretty involved technically.
- 09:35But in the textbook, we unpack this idea of centering in a lot of detail.
- 09:39So it's an optional extra you can look at if you're interested. What centering means
- 09:46is that I take the vector of the different loads.
- 09:50The different loads are the number of dots people are going to see on the screen.
- 09:55These are discrete numbers.
- 09:56So what I do is I take the mean of that vector
- 10:00and subtract it from the load vector.
- 10:05What that does is it transforms the vector of loads, which are all positive values (2, 4, 6, 8, and so on), into either negative or
- 10:14positive values.
- 10:15The values will be negative just in case the load is less than the mean load in the entire experiment; they'll be positive
- 10:22if it's larger; and they'll be zero,
- 10:24of course, if it's exactly the same value as the mean of the load vector.
- 10:29So centering is a very useful thing to do for various reasons.
- 10:33As I said, this has to do with regression
- 10:36modeling in general, not just Bayesian but frequentist as well.
- 10:39So I won't really discuss that detail too much here.
- 10:43But all I'm doing is just showing you how you would create this vector called centered load.
- 10:48And this is the vector that I will use when doing the modeling.
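A minimal R sketch of the centering step (assuming, as in bcogsci, that the data frame is df_pupil and the predictor column is load; the name c_load for the centered vector is just a convention):

library(bcogsci)
data("df_pupil")
# Center the predictor: subtract the mean load from each load value
df_pupil$c_load <- df_pupil$load - mean(df_pupil$load)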
- 10:54Okay so let's just quickly take a look at this data frame before I move on.
- 10:57I've got a subject column.
- 10:59I've got a trial column.
- 11:00This tells me the trial ID.
- 11:02And notice that I have multiple measurements from the same subject: subject 701 is giving me multiple data
- 11:09points for, for example, load zero.
- 11:13I've got at least two data points from subject 701.
- 11:16So I have repeated measurements on this subject.
- 11:20Even though I'm assuming in the toy model that I started with today that we have independent and identically distributed
- 11:27data.
- 11:28That's not the case obviously.
- 11:30But let's start simple and then we can elaborate on this model later.
- 11:34Okay so we have a predictor load which is centered.
- 11:38And now we're going to start to look at the effect of centered load on pupil size.
- 11:43So let's see what happens.
- 11:44So the first thing we're going to do is to run the model through the brm function.
- 11:51So if you've ever done any linear modeling, this first line should be familiar to you: we're just modeling pupil size as
- 11:58a function of an intercept,
- 11:59that's the alpha,
- 12:01and then a slope, centered load.
- 12:03And this is the data frame that I just created here.
- 12:08I've defined the likelihood that I'm assuming, the normal likelihood that I specified here.
- 12:14So generally you should always specify which likelihood you're using, just to be explicit, and the Gaussian is just a synonym
- 12:21for the normal likelihood.
- 12:23And finally here's the prior specification.
- 12:26I've defined the priors
- 12:27I just showed you a few minutes ago.
- 12:29These are the priors that I just specified. Notice that for the sigma parameter, the brm function, which is a front end
- 12:37for Stan, automatically knows that sigma has to be a truncated normal.
- 12:42If you specify a normal distribution here or some other distribution here, sigma will always be set to be positive because
- 12:50you cannot have a sigma less than zero.
- 12:55So that's something that's internally dealt with for you.
- 12:55This is good in a way because it's convenient, but it's important to know what the syntax is actually
- 13:03doing.
- 13:04So, under the hood, it's writing Stan code and setting up the parameters, constraining them so that they have the
- 13:10appropriate range.
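A hedged R sketch of the brm call being described (the column names p_size and c_load are assumptions carried over from above, not the course's verbatim code):

library(brms)
fit_pupil <- brm(
  p_size ~ 1 + c_load,  # intercept (alpha) plus slope (beta) for centered load
  data = df_pupil,
  family = gaussian(),  # the normal likelihood
  prior = c(
    prior(normal(1000, 500), class = Intercept),
    prior(normal(0, 1000), class = sigma),  # brms truncates this at zero internally
    prior(normal(0, 100), class = b, coef = c_load)
  )
)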
- 13:13So we have just fit the model.
- 13:16And now in the next lecture, I'm going to show you what you can get out of this model.
- 13:20You're going to get the posterior distributions of the three parameters and you can look at the posterior predictive distributions
- 13:27to see what you learn from this model and what the model predicts for future data, which is a very important use of statistical
- 13:35models in general: predicting the future.