- 00:00What we've done so far is that we have looked at simple regression models with a single predictor.
- 00:07Not even a predictor.
- 00:08It was an intercept that we were looking at.
- 00:10This was the button pressing data.
- 00:12And I showed you how you can assume a log-normal likelihood function to model how this data might have been generated.
- 00:20So now what we're gonna do is we're going to build on that very simple statistical model and keep adding more and more structure
- 00:27to it.
- 00:28That's basically what the rest of this course is all about.
- 00:32So as an example, I want to think about a particular situation where we are doing a psychological experiment, and what we
- 00:42are asking the participants to do is to track between zero and five objects on the screen as they move around. Then, at
- 00:51the end of the trial, they have to tell us which of the items on the screen were the ones that
- 00:59we'd asked them to keep track of as targets.
- 01:02Right?
- 01:07So to give you an idea of what this looks like graphically: at the beginning of the trial a subject will see a fixation cross, then they will see a bunch of points on the screen,
- 01:13and some of the points will be distinctive,
- 01:16in the sense that they're the ones that the participant has to track.
- 01:19And then these points start moving around on the screen, and at the end of the trial the subject is asked to select the targets,
- 01:27that is,
- 01:28the points that they were asked to track.
- 01:30So this is the basic issue that we are studying here.
- 01:33And the cognitive process that one is studying here is attentional load.
- 01:39So how accurate are you at keeping track of the targets as attentional load increases? Load is increased by increasing
- 01:48the number of dots on the screen.
- 01:50So the research question that we want to answer here is how attentional load affects pupil size.
- 01:58So we are recording the size of the pupil as you're forced to pay more and more attention to the task.
- 02:04We know from research in psychology that if you overload a participant and make them concentrate more on a particular
- 02:13task, their pupil size will increase;
- 02:15this is called arousal.
- 02:17And so this attentional increase is something that you can model with
- 02:23this kind of experimental design.
- 02:30So what is a good statistical model for this?
- 02:33We can start simple.
- 02:34So let's start with a very simple model where we assume a normal likelihood and the dependent variable here is the pupil
- 02:42size that we are looking at.
- 02:44n refers to the nth row in the data frame.
- 02:48So we will always be working with data frames, and the data frames will be indexed with the position in the row, which
- 02:56I'm going to call small n.
- 02:57So small n goes from one to capital N, and capital
- 03:00N is the total number of data points in that data frame.
- 03:04So the model that we are going to think about is pupil size is coming from a normal distribution with some mean and some
- 03:11standard deviation.
- 03:12So this you have seen before of course.
- 03:14But what's new is that we're going to have a predictor in the mean component of the normal distribution specification here.
- 03:23So instead of just having a mean here, as you have seen before, we have an intercept as well as something that we call a
- 03:31slope, the beta parameter, whose contribution is determined by the load that the experiment forces the participant to experience.
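In symbols, the model being described is roughly the following (with n indexing the rows of the data frame):

  p_size_n ~ Normal(alpha + beta * load_n, sigma),  for n = 1, ..., N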
- 03:43So let's take a look at what will happen now when we fit such a model.
- 03:49Notice a very important assumption I'm making here, which is often glossed over, but this is an extremely important assumption.
- 03:56The assumption is that every data point in the data frame is independent of every other data point.
- 04:02So, in the frequentist world we would say that we have independent and identically distributed data.
- 04:07Now, whether this assumption is actually satisfied or not in this data is a separate question.
- 04:12We're starting with a simple model.
- 04:14So let's proceed with this simple assumption and then we'll elaborate on this model later on.
- 04:18Okay, alright, so let's first ask ourselves what kind of prior specifications can we give to the parameters of interest in
- 04:28this model.
- 04:28What are the parameters we have?
- 04:29The alpha parameter,
- 04:30the beta parameter and the sigma parameter.
- 04:33These are the three parameters in this model.
- 04:35Very simple model.
- 04:36But we do need to define priors on these parameters.
- 04:39So how do we decide on the priors?
- 04:41So let's first think about the intercept.
- 04:44With the intercept I need to have some knowledge about what pupil sizes can be.
- 04:49Well, pupil size can't be negative.
- 04:51Whatever the units are (we don't know what the units are here), it is some positive number, right?
- 04:56Pupil size cannot be negative but we can do better than just speculating about this.
- 05:00We can look up some pilot data.
- 05:02There's lots of data out there on the internet, and you may have done some pilot research before actually doing the experiment.
- 05:08So you can examine some pilot data.
- 05:11And what I'm doing here is looking at the summary statistics of the pupil sizes.
- 05:18So I see that the minimum is 852 units and the maximum is 868, whatever this unit is.
- 05:24So this suggests something to me.
- 05:26And what does it suggest?
- 05:27It suggests that I could use a prior that would look like this: for the intercept parameter, I could
- 05:34choose a normal distribution prior with mean 1000 and a standard deviation of 500.
- 05:42So what are we expressing when we specify this as a prior?
- 05:45We are actually saying something.
- 05:46We are expressing a belief about what we think are plausible values of this parameter.
- 05:53And so how do I quantify that belief?
- 05:55You can do that with the qnorm function.
- 05:57That's why the dpqr family of functions is so useful.
- 06:00So you can just take the qnorm function and pull out the 95% interval, you know, that covers 95% of the area of the
- 06:08curve.
- 06:08And this tells you what your prior belief, before seeing any new data, is for the average pupil
- 06:17size, right?
- 06:18That's what this parameter is going to represent:
- 06:20the average pupil size. And this prior now represents a reasonable range.
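A quick R sketch of this computation:

# 95% interval implied by the Normal(1000, 500) prior on the intercept
qnorm(c(0.025, 0.975), mean = 1000, sd = 500)
# roughly 20 and 1980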
- 06:26I don't know.
- 06:26Maybe this is too liberal, right?
- 06:28One could change this prior to something more informative, but we can think about those situations later.
- 06:33Now for sigma we could start by using an uninformative prior. As I have explained earlier, the posterior is going to
- 06:43be a compromise between the prior specification and the likelihood.
- 06:47And so if you have a very vague prior on the parameter, you will tend to get the same posterior distribution as you would
- 06:54if you were looking at the maximum likelihood estimate.
- 06:59So the mean of the posterior would be very close to the maximum likelihood estimate for a vague prior.
- 07:05So that was a demonstration
- 07:06I made a while back in a previous lecture.
- 07:11So what we're gonna do for the sigma parameter, because sigma cannot be negative, is truncate the normal distribution
- 07:17at zero.
- 07:18So we're going to stipulate that you cannot get a value less than zero.
- 07:22So what we have is a truncated prior distribution.
- 07:25That's what I've got here, normal distribution mean zero standard deviation 1000 truncated at zero.
- 07:32Okay, so you can plot this now with dnorm or something to see what it looks like.
- 07:36But you can also extract, using the extraDistr package, the 95% quantiles of this distribution.
- 07:44And you see that the range is pretty reasonable.
- 07:46It's quite liberal actually, it's going from 31 to 2241 which allows for a lot of variability.
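A minimal R sketch, using the qtnorm function from the extraDistr package:

library(extraDistr)
# 95% interval of a Normal(0, 1000) prior truncated at zero (a = 0 is the lower bound)
qtnorm(c(0.025, 0.975), mean = 0, sd = 1000, a = 0)
# roughly 31 and 2241, the range quoted above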
- 07:53But as I said, this is an uninformative prior, and we can work with this; nothing bad is gonna happen,
- 08:00you know, if you use such an uninformative prior.
- 08:03Now the last parameter of interest, which is the most important one is the beta parameter.
- 08:07This is the parameter that represents the effect of attentional load on pupil size and so I start with this kind of ballpark
- 08:15assumption about a beta parameter coming from a normal distribution.
- 08:20This is the prior now,
- 08:20with mean zero and standard deviation 100.
- 08:24What this expresses is that the attentional load can either increase or decrease pupil size.
- 08:31Of course this is not realistic.
- 08:31I mean, we know from research in psychology that if you increase attentional load, there will be a tendency for
- 08:38pupil size to increase.
- 08:40But we want to remain agnostic.
- 08:42We want to leave open the possibility that you could go in the opposite direction: as attentional load increases, pupil size
- 08:49could in principle decrease.
- 08:51So that's what this uninformative prior is expressing.
- 08:54And you can quantify the 95% credible interval again as before, which expresses your prior belief about what plausible effects
- 09:03you could expect for increasing attentional load by one unit, for example.
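Again, a quick R sketch:

# 95% interval implied by the Normal(0, 100) prior on the slope beta
qnorm(c(0.025, 0.975), mean = 0, sd = 100)
# roughly -196 and 196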
- 09:08So what does the data look like?
- 09:10Well, what I'm going to do is I'm going to show you this df_pupil data.
- 09:14This is in the bcogsci package which comes with the textbook.
- 09:17And so the first thing I'm going to do before we look at the data is to center the predictor.
- 09:22This centering is a very important thing that we are almost always going to do when we do regression
- 09:29modeling.
- 09:29It has important properties whose details I cannot discuss here because they're pretty involved technically.
- 09:35But in the textbook, we unpack this idea of centering in a lot of detail.
- 09:39So it's an optional extra you can look at if you're interested. What centering means
- 09:46is that I take the vector of the different loads.
- 09:50The different loads are the number of dots people are going to see on the screen.
- 09:55These are discrete numbers.
- 09:56So what I do is I take the mean of that vector
- 10:00and subtract it from the load vector.
- 10:05What that does is it transforms the vector of loads, which are all positive values (2, 4, 6, 8, and so on), into either negative or
- 10:14positive values.
- 10:15The values will be negative just in case the load is less than the mean load in the entire experiment; they'll be positive
- 10:22if it's larger; and they'll be zero,
- 10:24of course, if it's exactly the same value as the mean of the load vector.
- 10:29So centering is a very useful thing to do for various reasons.
- 10:33As I said, this has to do with regression
- 10:36modeling in general, not just Bayesian but frequentist as well.
- 10:39So I won't really discuss that detail too much here.
- 10:43But all I'm doing is just showing you how you would create this vector called centered load.
- 10:48And this is the vector that I will use when doing the modeling.
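A minimal R sketch of the centering step (assuming, as in bcogsci, that the data frame is df_pupil and the predictor column is load; the name c_load for the centered vector is just a convention):

library(bcogsci)
data("df_pupil")
# Center the predictor: subtract the mean load from each load value
df_pupil$c_load <- df_pupil$load - mean(df_pupil$load)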
- 10:54Okay so let's just quickly take a look at this data frame before I move on.
- 10:57I've got a subject column.
- 10:59I've got a trial column.
- 11:00This tells me the trial ID.
- 11:02And notice that I have multiple measurements from the same subject: subject 701 is giving me multiple data
- 11:09points for, for example, load zero.
- 11:13I've got at least two data points from subject 701.
- 11:16So I have repeated measurements on this subject.
- 11:20Even though I'm assuming in the toy model that I started with today that we have independent and identically distributed
- 11:27data.
- 11:28That's not the case obviously.
- 11:30But let's start simple and then we can elaborate on this model later.
- 11:34Okay so we have a predictor load which is centered.
- 11:38And now we're going to start to look at the effect of centered load on pupil size.
- 11:43So let's see what happens.
- 11:44So the first thing we're going to do is to run the model through the brm function.
- 11:51So if you've ever done any linear modeling, this first line should be familiar to you: we're just modeling pupil size as
- 11:58a function of an intercept,
- 11:59that's the alpha,
- 12:01and then a slope, centered load.
- 12:03And this is the data frame that I just created here.
- 12:08I've defined the likelihood that I'm assuming, the normal likelihood that I specified here.
- 12:14So generally you should always specify which likelihood you're using, just to be explicit, and the Gaussian is just a synonym
- 12:21for the normal likelihood.
- 12:23And finally here's the prior specification.
- 12:26I've defined the priors
- 12:27I just showed you a few minutes ago.
- 12:29These are the priors that I just specified. Notice that for the sigma parameter, the brm function, which is a front end
- 12:37for Stan, automatically knows that sigma has to be a truncated normal.
- 12:42If you specify a normal distribution here or some other distribution here, sigma will always be set to be positive because
- 12:50you cannot have a sigma less than zero.
- 12:55So that's something that's internally dealt with for you.
- 12:55This is good in a way because it's convenient, but it's important to know what the syntax is actually
- 13:03doing.
- 13:04So, under the hood, it's writing Stan code and setting up the parameters, constraining them so that they have the
- 13:10appropriate range.
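A hedged R sketch of the brm call being described (the column names p_size and c_load are assumptions carried over from above, not the course's verbatim code):

library(brms)
fit_pupil <- brm(
  p_size ~ 1 + c_load,  # intercept (alpha) plus slope (beta) for centered load
  data = df_pupil,
  family = gaussian(),  # the normal likelihood
  prior = c(
    prior(normal(1000, 500), class = Intercept),
    prior(normal(0, 1000), class = sigma),  # brms truncates this at zero internally
    prior(normal(0, 100), class = b, coef = c_load)
  )
)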
- 13:13So we have just fit the model.
- 13:16And now in the next lecture, I'm going to show you what you can get out of this model.
- 13:20You're going to get the posterior distributions of the three parameters and you can look at the posterior predictive distributions
- 13:27to see what you learn from this model and what the model predicts for future data, which is a very important use of statistical
- 13:35models in general: predicting the future.