- 00:00It's time now to look at a very interesting class of regression models.
- 00:04These are called generalized linear models and a very important case that we use a lot in science is logistic regression
- 00:13And this also has a hierarchical version called generalized linear mixed model.
- 00:17But today I will only introduce the generalized linear model with the special case of the logistic regression example.
- 00:26So let's start with an example as usual.
- 00:30So imagine doing an experiment like the following.
- 00:33What we want to do is we show our participants different words, one after another in a sequence.
- 00:41So for example: dog, had, (what's that third word?), run, and hug.
- 00:46And so then they see a blank screen and then they are prompted to respond to what was the third word that you saw in
- 00:56that list.
- 00:57So they have to give a response.
- 00:58Now that could be either correct or incorrect.
- 01:02So they have to type in that response; that's what that box is there for.
- 01:06So that's what the experiment is about.
- 01:09And so what we're going to do is we're going to increase the set size.
- 01:12The number of words that people have to see on the screen.
- 01:15And we want to know whether their accuracy of recall is going to be affected by the number of words that they're gonna see
- 01:22You can imagine what's going to happen.
- 01:24As you increase the number of words accuracy is gonna tank.
- 01:29It's gonna go down.
- 01:29Okay.
- 01:30So let's take a look at this data.
- 01:31This is also in our bcogsci package:
- 01:33this df_recall data set.
- 01:36So if you look at this data set, it's important to note that I have repeated measurements from each subject here.
- 01:43Okay, So again, I'm ignoring that detail, I'm gonna treat these data as independent and identically distributed.
- 01:50I'm going to fix that problem at the end of this set of lectures, but for now I'm ignoring the independence assumption
- 01:58here, and my response, you know, the dependent variable, is now no longer millisecond reading times or pupil sizes or anything
- 02:06like that.
- 02:07These are 0/1 responses.
- 02:10What distribution corresponds to a 0/1 response?
- 02:13It's the Bernoulli.
- 02:15We studied that at the beginning.
- 02:17So here we have the 0/1 response, so an obvious thing to use here would be the Bernoulli likelihood.
- 02:24And the generalization of that is of course the binomial.
- 02:27So that's why you might sometimes see in software either Bernoulli or binomial.
- 02:32So don't be confused by that.
- 02:33It just refers to the number of trials.
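As a small aside, here is a minimal R illustration of that point: the Bernoulli is simply the binomial with a single trial, which is why software lets you specify either one.

```r
## Bernoulli(theta) is just a Binomial with a single trial (size = 1):
dbinom(1, size = 1, prob = 0.7)  # P(success) = 0.7
dbinom(0, size = 1, prob = 0.7)  # P(failure) = 0.3
```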
- 02:35Right, okay, so we have some other information in this data frame that tells us which session it was, which block, and so on,
- 02:43and what the set size was.
- 02:45This is the interesting thing for us.
- 02:46So there are different numbers of words that we are exposing our participants to, right? And so again, as before,
- 02:54what we're gonna do is we're going to center our set size.
- 02:57This is a vector that contains numbers like 2,4,6 and 8.
- 03:01We're going to center it.
- 03:03So that's either negative or positive.
- 03:04But with a mean of zero.
- 03:06That's what centering is doing for us.
- 03:07And the reason we're doing this is that we want the intercept to have the interpretation that it represents the grand mean.
- 03:16The grand mean of what?
- 03:17I will just explain that in a minute.
- 03:19But in this model it will again represent the grand mean that's an interesting and useful thing to do statistically.
- 03:27Okay, so let's look at what we have in this dataset.
- 03:31So just to quickly confirm that we indeed have set sizes 2,4,6 and 8.
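For reference, a rough sketch of the data preparation being described; the column name set_size is my assumption about the df_recall data frame, so check it against the actual data.

```r
library(bcogsci)
library(dplyr)

data("df_recall")  # recall accuracy data (one row per trial)

## center set size so that the intercept will represent the grand mean
## (the column name set_size is an assumption)
df_recall <- df_recall %>%
  mutate(c_set_size = set_size - mean(set_size))

## confirm that the set sizes are 2, 4, 6, and 8
sort(unique(df_recall$set_size))
```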
- 03:37So we're going to look at the accuracy change as set size increases; theoretically, what I expect from common sense
- 03:45is that my accuracy will go down as the set size increases.
- 03:49Alright.
- 03:50So
- 03:53Yeah, notice also that I have multiple measurements for each set size.
- 03:59I've got 23 measurements here for each of these set sizes.
- 04:02So I've repeated measurements on this.
- 04:05So a simple model that we can start with which is technically not exactly right for this situation, but it's almost right
- 04:13is the generalized linear model with a so called link function.
- 04:20Okay, so the way this works is as follows: we're going to model the 0/1 responses,
- 04:25the correct/incorrect responses (or rather, the incorrect/correct responses), as a Bernoulli distribution.
- 04:31Remember that?
- 04:32The Bernoulli distribution has only one trial and there's some probability of success represented by this theta parameter.
- 04:39So we are getting these ones and zeros from subjects coming from some Bernoulli distribution.
- 04:45And this index n that I've put under theta represents the row id, so every zero and one in the data frame, let's go back,
- 04:54every zero and one in the data frame is being described by some Bernoulli distribution, and there is some theta associated
- 05:02with that row that is generating that data.
- 05:06There's a generative process that I'm describing.
- 05:09You know when I write down this model here.
- 05:11Okay, so this model looks pretty straightforward but what we're gonna do is we're gonna put a little twist to this situation
- 05:18The twist is that we're gonna take those theta sub n's for each of those rows and take the odds with those thetas, and then
- 05:29take the log of that.
- 05:30So this term inside the brackets,
- 05:33it's called the odds, right?
- 05:34So if the probability of success is 50%, for example, the odds are 1 to 1, right?
- 05:43That's how we speak so often in day to day life.
- 05:46People will say, what are the odds that it's raining today, when what they actually mean is, what is the probability
- 05:51that it's raining today. In statistics, odds has a different meaning than probability.
- 05:55Okay, so it takes the probability value theta and converts it into a ratio in this way: the ratio of the probability of success
- 06:03divided by the probability of failure.
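As a small worked example of the conversion just described, from probability to odds to log odds:

```r
theta <- 0.5
theta / (1 - theta)        # odds of 1 (i.e., 1 to 1) at a 50% probability
log(theta / (1 - theta))   # log odds of 0

theta <- 0.9
log(theta / (1 - theta))   # about 2.2 on the log-odds scale
```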
- 06:05Okay, so we're going to work on this log odds scale, and we're going to fit a linear model on this scale. The reason
- 06:15is that we want to stay with the linear modeling framework because it's so convenient.
- 06:20It has such generalizability.
- 06:22If I have multiple predictors, I can plug them in and I can do a lot of things now.
- 06:27But with this Bernoulli generative process, I'm kind of limited by the nature of the 0/1 outputs, right?
- 06:34The discrete outcomes.
- 06:35So I want to convert it to some kind of continuous outcome and this is the transformation that I will do to achieve that
- 06:44to achieve a continuous outcome.
- 06:45So my dependent variable will now become the log odds.
- 06:53Okay, And I'm going to fit a model that's going to let me just write it out.
- 07:01What's going to happen is that I'm gonna have log... I'm just going to drop the subscript, just for convenience now, and I'm
- 07:10going to say that I'm going to fit a model not on 0/1 responses, but I'm going to fit the model on the log odds
- 07:18scale.
- 07:20So I will have some predictor.
- 07:21The story remains exactly the same.
- 07:24That's the beauty of the generalized linear model: that we stay with the regression modeling framework.
- 07:28We don't have to depart from this general framework that is so well established. But now we've got our predictor.
- 07:34So call it, I don't know, set size; in this case let's say it was centered set size.
- 07:41We have centered set size.
- 07:45And so basically this is the standard predictor as before.
- 07:50Some vector of values.
- 07:51And we have an intercept and slope now, but the intercept and slope are going to be on the log odds scale.
- 08:00So I will unpack the implications of all this in a minute.
- 08:04Okay, so that's the model.
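For reference, the model just described can be written compactly as follows (with n indexing the rows of the data frame):

```latex
\begin{aligned}
\text{correct}_n &\sim \mathrm{Bernoulli}(\theta_n)\\
\log\!\left(\frac{\theta_n}{1-\theta_n}\right) &= \alpha + \beta \cdot \text{c\_set\_size}_n
\end{aligned}
```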
- 08:07And so first of all, let's try to understand what it means to convert a value of probability from the probability scale to
- 08:17the log odds scale. Graphical visualization is the key to everything.
- 08:22Whenever you're trying to understand statistical concepts, try to visualize the ideas graphically.
- 08:27So that's what I'm going to do here.
- 08:29I have created a vector here going from close to zero to close to one.
- 08:33So this is all the possible values, discrete values of probabilities.
- 08:38This is x.
- 08:39Then what I did is I converted it to the log odds scale.
- 08:43Okay.
- 08:43And so then I just created a data frame that has the probabilities theta
- 08:48and what I'm calling eta which is the log odds transformation of the probabilities.
- 08:55Okay, so you can plot all this.
- 08:57I'm going to show you the plots.
- 08:58I just wanted to provide the code so that you can see how it's produced.
- 09:03It's basically very straightforward.
- 09:04I'm just using the qplot function from ggplot2 to generate on the x axis the thetas, and on the y axis I
- 09:12have the etas. And so what this gives me, right, I'll show you, is this function here.
- 09:20So I put in the thetas here on the X axis, on the y axis, I get the log odds equivalent.
- 09:25And you can see that now I've got a continuous distribution here.
- 09:30That I can now use in my generalized linear model.
- 09:35So this is a very convenient approach that I can take to model this.
- 09:39And what's even more interesting is, so this is called the logit link function.
- 09:46It's a function that takes as input the probability and returns the log odds.
- 09:50Okay, that's called the logit link.
- 09:53The inverse of the logit link is called the logistic function, and it takes as input the eta values.
- 10:00That means these log odds and gives you back the probabilities.
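A sketch of the kind of code being talked about here; R already provides the logit and its inverse as qlogis() and plogis(), and the lecture's actual plotting code may differ in the details.

```r
library(ggplot2)

theta <- seq(0.001, 0.999, by = 0.001)  # probabilities, excluding exactly 0 and 1
eta   <- log(theta / (1 - theta))       # logit link: probability -> log odds
## equivalently: eta <- qlogis(theta)

dat <- data.frame(theta = theta, eta = eta)

## the logit link function: probabilities on x, log odds on y
ggplot(dat, aes(x = theta, y = eta)) + geom_line()

## the logistic (inverse logit) maps the log odds back to probabilities
all.equal(plogis(eta), theta)  # TRUE, up to numerical tolerance
```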
- 10:05So that's actually pretty easy to do if you can imagine how this would work.
- 10:09So let me show you how I can figure out what the probabilities would be in this kind of model that I've written out here.
- 10:17Let me write the model out again, just to be clear.
- 10:21So the model I actually talked about was that I've got log of theta over one minus theta,
- 10:30which is equal to, on the right hand side, some terms: alpha plus beta, blah blah blah,
- 10:36which I will just write as mu. I could just expand that to alpha plus beta times centered set size.
- 10:46So this is just an abbreviation, just to make my life easier as I write out the story.
- 10:51Okay, so now how would I figure out... the problem to be solved is: solve for theta.
- 10:59How do I do that? Algebra to the rescue.
- 11:02Right, exponentiate both sides to get rid of the log. If I exponentiate the log, right, I get theta over one minus theta,
- 11:12right bracket missing here.
- 11:14And then I exponentiate the right hand side as well.
- 11:18So then I get exponent of mu, so this guy disappears.
- 11:22So on the left hand side I've got theta over one minus theta.
- 11:27And on the right hand side I've got exponent of mu right, what do I do now I'm trying to solve for theta.
- 11:33Okay, so let me multiply both sides with one minus theta to get rid of that annoying denominator.
- 11:38And so I end up with theta on this side and I end up with one minus theta times exponent of mu on the right
- 11:46hand side.
- 11:47So how do I proceed?
- 11:48Now I'm still trying to solve for theta.
- 11:50Okay, so let me solve, let me open up the brackets on the right hand side.
- 11:54So I got exponent of mu right?
- 11:57It's just simple math.
- 11:59Now it's not much.
- 12:01Not very exciting, right?
- 12:04But the result might surprise you.
- 12:06So I think I did this correctly.
- 12:08Yes.
- 12:08And so on the left hand side, I've got theta.
- 12:10On the right hand side, I've got exponent of mu minus theta
- 12:13exponent of mu. I want to get all the theta terms on one side.
- 12:18So how do I do that?
- 12:20I add theta exponent of mu to both sides.
- 12:24So I get theta plus theta
- 12:27exponent of mu. You can see where this is going now, right?
- 12:31Because if I end up here now, what do I do?
- 12:35I still want to solve for theta.
- 12:36So what do I do? I take the common term theta out of this picture,
- 12:40so I've got theta
- 12:41times one plus exponent of mu, and the bracket is missing again.
- 12:47And then I've got exponent of mu here.
- 12:50And lo and behold, I can solve for theta by just saying exponent of theta...
- 12:57oh sorry, exponent of mu, I'm sorry about that, I confused this.
- 13:03Exponent of mu divided by one plus exponent of mu. So what does this mean?
- 13:10The practical implications of this are huge.
- 13:14Because once I have estimated my parameters on the right hand side of the linear model,
- 13:20these alpha and beta parameters,
- 13:21once I've estimated these guys, right, I have mu for every possible set size.
- 13:27I can compute mu, and therefore I can go back to the probability space and investigate the implications of my model predictions
- 13:35in the probability space by using this equation.
- 13:39So this is just the inverse, you know, of the link function.
- 13:44And that's this guy here, right?
- 13:46You plug in the etas and you get back the thetas, right.
- 13:51That's the basic story here, right.
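To make the back-transformation concrete, a small numerical check with made-up parameter values (alpha and beta here are hypothetical, not estimates from the data); plogis() is R's built-in inverse logit.

```r
alpha <- 1.0    # hypothetical intercept on the log-odds scale
beta  <- -0.2   # hypothetical slope for centered set size

c_set_size <- c(-3, -1, 1, 3)     # centered versions of set sizes 2, 4, 6, 8
mu <- alpha + beta * c_set_size   # predictions on the log-odds scale

exp(mu) / (1 + exp(mu))           # back to the probability scale
plogis(mu)                        # the same thing, using the built-in inverse logit
```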
- 13:52And this code is only there for your convenience to reproduce the plots.
- 13:57That's in case you want to understand how to produce these plots.
- 14:00Okay, Alright.
- 14:02So now that I've explained the basic idea behind this logistic regression model, I'm going to now decide on priors and then
- 14:10start fitting the model.
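Looking ahead, a minimal sketch of what fitting this model in brms might look like; the formula and the priors shown are placeholders and assumptions on my part (the actual priors are discussed next), not the lecture's own code.

```r
library(brms)

## logistic regression: Bernoulli likelihood with a logit link;
## the priors below are placeholders, to be replaced after the prior discussion
fit_recall <- brm(correct ~ 1 + c_set_size,
                  data = df_recall,
                  family = bernoulli(link = "logit"),
                  prior = c(prior(normal(0, 4), class = Intercept),
                            prior(normal(0, 1), class = b)))

## the intercept and slope are on the log-odds scale; plogis() converts back
posterior_summary(fit_recall)
```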