- 00:00It's time now to look at a very interesting class of regression models.
- 00:04These are called generalized linear models and a very important case that we use a lot in science is logistic regression
- 00:13And this also has a hierarchical version called generalized linear mixed model.
- 00:17But today I will only introduce the generalized linear model with the special case of the logistic regression example.
- 00:26So let's start with an example as usual.
- 00:30So imagine doing an experiment like the following.
- 00:33What we want to do is we show our participants different words, one after another in a sequence.
- 00:41So for example: dog, had, (what's that third word?), run, and hug.
- 00:46And so then they see a blank screen and then they are prompted to respond to what was the third word that you saw in
- 00:56that list.
- 00:57So they have to give a response.
- 00:58Now that could be either correct or incorrect.
- 01:02So they have to type in that response; that's what that box is there for.
- 01:06So that's what the experiment is about.
- 01:09And so what we're going to do is we're going to increase the set size.
- 01:12The number of words that people have to see on the screen.
- 01:15And we want to know whether their accuracy of recall is going to be affected by the number of words that they're gonna see
- 01:22You can imagine what's going to happen.
- 01:24As you increase the number of words accuracy is gonna tank.
- 01:29It's gonna go down.
- 01:29Okay.
- 01:30So let's take a look at this data.
- 01:31This is also in our bcogsci package:
- 01:33this df_recall data set.
- 01:36So if you look at this data set, it's important to note that I have repeated measurements from each subject here.
- 01:43Okay, So again, I'm ignoring that detail, I'm gonna treat these data as independent and identically distributed.
- 01:50I'm going to fix that problem at the end of this set of lectures, but for now I'm ignoring the independence assumption
- 01:58here, and my response, you know, the dependent variable, is now no longer millisecond reading times or pupil sizes or anything
- 02:06like that.
- 02:07These are 0/1 responses.
- 02:10What distribution corresponds to a 0/1 response?
- 02:13It's the Bernoulli.
- 02:15We studied that at the beginning.
- 02:17So here we have the 0/1 response, so an obvious thing to use here would be the Bernoulli likelihood.
- 02:24And the generalization of that is of course the binomial.
- 02:27So that's why you might sometimes see in software either Bernoulli or binomial.
- 02:32So don't be confused by that.
- 02:33It just refers to the number of trials.
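As a small aside, here is a minimal R illustration of that point: the Bernoulli is simply the binomial with a single trial, which is why software lets you specify either one.

```r
## Bernoulli(theta) is just a Binomial with a single trial (size = 1):
dbinom(1, size = 1, prob = 0.7)  # P(success) = 0.7
dbinom(0, size = 1, prob = 0.7)  # P(failure) = 0.3
```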
- 02:35Right, okay, so we have some other information in this data frame that tells us which session it was, which block, and so on,
- 02:43and what the set size was.
- 02:45This is the interesting thing for us.
- 02:46So there are different numbers of words that we are exposing our participants to, right? And so again, as before,
- 02:54what we're gonna do is we're going to center our set size.
- 02:57This is a vector that contains numbers like 2,4,6 and 8.
- 03:01We're going to center it.
- 03:03So that's either negative or positive.
- 03:04But with a mean of zero.
- 03:06That's what centering is doing for us.
- 03:07And the reason we're doing this is that we want the intercept to have the interpretation that it represents the grand mean.
- 03:16The grand mean of what?
- 03:17I will just explain that in a minute.
- 03:19But in this model it will again represent the grand mean that's an interesting and useful thing to do statistically.
- 03:27Okay, so let's look at what we have in this dataset.
- 03:31So just to quickly confirm that we indeed have set sizes 2,4,6 and 8.
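For reference, a rough sketch of the data preparation being described; the column name set_size is my assumption about the df_recall data frame, so check it against the actual data.

```r
library(bcogsci)
library(dplyr)

data("df_recall")  # recall accuracy data (one row per trial)

## center set size so that the intercept will represent the grand mean
## (the column name set_size is an assumption)
df_recall <- df_recall %>%
  mutate(c_set_size = set_size - mean(set_size))

## confirm that the set sizes are 2, 4, 6, and 8
sort(unique(df_recall$set_size))
```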
- 03:37So we're going to look at the accuracy change as set size increases; theoretically, what I expect from common sense
- 03:45is that my accuracy will go down as the set size increases.
- 03:49Alright.
- 03:50So
- 03:53Yeah, notice also that I have multiple measurements for each set size.
- 03:59I've got 23 measurements here for each of these set sizes.
- 04:02So I've repeated measurements on this.
- 04:05So a simple model that we can start with which is technically not exactly right for this situation, but it's almost right
- 04:13is the generalized linear model with a so called link function.
- 04:20Okay, so the way this works is as follows: we're going to model the 0/1 responses,
- 04:25the correct/incorrect responses (or rather, the incorrect/correct responses), as a Bernoulli distribution.
- 04:31Remember that?
- 04:32The Bernoulli distribution has only one trial and there's some probability of success represented by this theta parameter.
- 04:39So we are getting these ones and zeros from subjects coming from some Bernoulli distribution.
- 04:45And this index n that I've put under theta represents the row id, so every zero and one in the data frame, let's go back,
- 04:54every zero and one in the data frame is being described by some Bernoulli distribution, and there is some theta associated
- 05:02with that row that is generating that data.
- 05:06There's a generative process that I'm describing.
- 05:09You know when I write down this model here.
- 05:11Okay, so this model looks pretty straightforward but what we're gonna do is we're gonna put a little twist to this situation
- 05:18The twist is that we're gonna take those theta sub n's for each of those rows and take the odds with those thetas, and then
- 05:29take the log of that.
- 05:30So this term inside the brackets,
- 05:33it's called the odds, right?
- 05:34So if the probability of success is 50%, for example, the odds are 1 to 1, right?
- 05:43That's how we speak so often in day to day life.
- 05:46People will say, what are the odds that it's raining today, when what they actually mean is, what is the probability
- 05:51that it's raining today. In statistics, odds has a different meaning than probability.
- 05:55Okay, so it takes the probability value theta and converts it into a ratio in this way: the ratio of the probability of success
- 06:03divided by the probability of failure.
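As a small worked example of the conversion just described, from probability to odds to log odds:

```r
theta <- 0.5
theta / (1 - theta)        # odds of 1 (i.e., 1 to 1) at a 50% probability
log(theta / (1 - theta))   # log odds of 0

theta <- 0.9
log(theta / (1 - theta))   # about 2.2 on the log-odds scale
```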
- 06:05Okay, so we're going to work on this log odds scale, and we're going to fit a linear model on this scale. The reason
- 06:15is that we want to stay with the linear modeling framework because it's so convenient.
- 06:20It has such generalizability.
- 06:22If I have multiple predictors, I can plug them in and I can do a lot of things now.
- 06:27But with this Bernoulli generative process, I'm kind of limited by the nature of the 0/1 outputs, right?
- 06:34The discrete outcomes.
- 06:35So I want to convert it to some kind of continuous outcome and this is the transformation that I will do to achieve that
- 06:44to achieve a continuous outcome.
- 06:45So my dependent variable will now become the log odds.
- 06:53Okay, And I'm going to fit a model that's going to let me just write it out.
- 07:01What's going to happen is that I'm gonna have log... I'm just going to drop the subscript, just for convenience now, and I'm
- 07:10going to say that I'm going to fit a model not on 0/1 responses, but I'm going to fit the model on the log odds
- 07:18scale.
- 07:20So I will have some predictor.
- 07:21The story remains exactly the same.
- 07:24That's the beauty of the generalized linear model: that we stay with the regression modeling framework.
- 07:28We don't have to depart from this general framework that is so well established. But now we've got our predictor.
- 07:34So call it, I don't know, set size; in this case let's say it was centered set size.
- 07:41We have centered set size.
- 07:45And so basically this is the standard predictor as before.
- 07:50Some vector of values.
- 07:51And we have an intercept and slope now, but the intercept and slope are going to be on the log odds scale.
- 08:00So I will unpack the implications of all this in a minute.
- 08:04Okay, so that's the model.
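For reference, the model just described can be written compactly as follows (with n indexing the rows of the data frame):

```latex
\begin{aligned}
\text{correct}_n &\sim \mathrm{Bernoulli}(\theta_n)\\
\log\!\left(\frac{\theta_n}{1-\theta_n}\right) &= \alpha + \beta \cdot \text{c\_set\_size}_n
\end{aligned}
```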
- 08:07And so first of all, let's try to understand what it means to convert a value of probability from the probability scale to
- 08:17the log odds scale. Graphical visualization is the key to everything.
- 08:22Whenever you're trying to understand statistical concepts, try to visualize the ideas graphically.
- 08:27So that's what I'm going to do here.
- 08:29I have created a vector here going from close to zero to close to one.
- 08:33So this is all the possible values, discrete values of probabilities.
- 08:38This is x.
- 08:39Then what I did is I converted it to the log odds scale.
- 08:43Okay.
- 08:43And so then I just created a data frame that has the probabilities theta
- 08:48and what I'm calling eta which is the log odds transformation of the probabilities.
- 08:55Okay, so you can plot all this.
- 08:57I'm going to show you the plots.
- 08:58I just wanted to provide the code so that you can see how it's produced.
- 09:03It's basically very straightforward.
- 09:04I'm just using the qplot function from ggplot2 to generate on the x axis the thetas, and on the y axis I
- 09:12have the etas. And so what this gives me, right, I'll show you, is this function here.
- 09:20So I put in the thetas here on the X axis, on the y axis, I get the log odds equivalent.
- 09:25And you can see that now I've got a continuous distribution here.
- 09:30That I can now use in my generalized linear model.
- 09:35So this is a very convenient approach that I can take to model this.
- 09:39And what's even more interesting is, so this is called the logit link function.
- 09:46It's a function that takes as input the probability and returns the log odds.
- 09:50Okay, that's called the logit link.
- 09:53The inverse of the logit link is called the logistic function, and it takes as input the eta values.
- 10:00That means these log odds and gives you back the probabilities.
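A sketch of the kind of code being talked about here; R already provides the logit and its inverse as qlogis() and plogis(), and the lecture's actual plotting code may differ in the details.

```r
library(ggplot2)

theta <- seq(0.001, 0.999, by = 0.001)  # probabilities, excluding exactly 0 and 1
eta   <- log(theta / (1 - theta))       # logit link: probability -> log odds
## equivalently: eta <- qlogis(theta)

dat <- data.frame(theta = theta, eta = eta)

## the logit link function: probabilities on x, log odds on y
ggplot(dat, aes(x = theta, y = eta)) + geom_line()

## the logistic (inverse logit) maps the log odds back to probabilities
all.equal(plogis(eta), theta)  # TRUE, up to numerical tolerance
```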
- 10:05So that's actually pretty easy to do if you can imagine how this would work.
- 10:09So let me show you how I can figure out what the probabilities would be in this kind of model that I've written out here.
- 10:17Let me write the model out again, just to be clear.
- 10:21So the model I actually talked about was that I've got log of theta over one minus theta,
- 10:30which is equal to, on the right hand side, some terms: alpha plus beta, blah blah blah,
- 10:36which I will just write as mu. I could just expand that to alpha plus beta times centered set size.
- 10:46So this is just an abbreviation, just to make my life easier as I write out the story.
- 10:51Okay, so now how would I figure out... the problem to be solved is: solve for theta.
- 10:59How do I do that? Algebra to the rescue.
- 11:02Right, exponentiate both sides to get rid of the log. If I exponentiate the log, right, I get theta over one minus theta,
- 11:12right bracket missing here.
- 11:14And then I exponentiate the right hand side as well.
- 11:18So then I get exponent of mu, so this guy disappears.
- 11:22So on the left hand side I've got theta over one minus theta.
- 11:27And on the right hand side I've got exponent of mu right, what do I do now I'm trying to solve for theta.
- 11:33Okay, so let me multiply both sides with one minus theta to get rid of that annoying denominator.
- 11:38And so I end up with theta on this side and I end up with one minus theta times exponent of mu on the right
- 11:46hand side.
- 11:47So how do I proceed?
- 11:48Now I'm still trying to solve for theta.
- 11:50Okay, so let me solve, let me open up the brackets on the right hand side.
- 11:54So I got exponent of mu right?
- 11:57It's just simple math.
- 11:59Now it's not much.
- 12:01Not very exciting, right?
- 12:04But the result might surprise you.
- 12:06So I think I did this correctly.
- 12:08Yes.
- 12:08And so on the left hand side, I've got theta.
- 12:10On the right hand side, I've got exponent of mu minus theta
- 12:13exponent of mu. I want to get all the theta terms on one side.
- 12:18So how do I do that?
- 12:20I add theta exponent of mu to both sides.
- 12:24So I get theta plus theta
- 12:27exponent of mu. You can see where this is going now, right?
- 12:31Because if I end up here now, what do I do?
- 12:35I still want to solve for theta.
- 12:36So what do I do? I take the common term theta out of this picture,
- 12:40so I've got theta
- 12:41times one plus exponent of mu, and the bracket is missing again.
- 12:47And then I've got exponent of mu here.
- 12:50And lo and behold, I can solve for theta by just saying exponent of theta...
- 12:57oh sorry, exponent of mu, I'm sorry about that, I confused this.
- 13:03Exponent of mu divided by one plus exponent of mu. So what does this mean?
- 13:10The practical implications of this are huge.
- 13:14Because once I have estimated my parameters on the right hand side of the linear model,
- 13:20these alpha and beta parameters,
- 13:21once I've estimated these guys, right, I have mu for every possible set size.
- 13:27I can compute mu, and therefore I can go back to the probability space and investigate the implications of my model predictions
- 13:35in the probability space by using this equation.
- 13:39So this is just the inverse, you know, of the link function.
- 13:44And that's this guy here, right?
- 13:46You plug in the etas and you get back the thetas, right.
- 13:51That's the basic story here, right.
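To make the back-transformation concrete, a small numerical check with made-up parameter values (alpha and beta here are hypothetical, not estimates from the data); plogis() is R's built-in inverse logit.

```r
alpha <- 1.0    # hypothetical intercept on the log-odds scale
beta  <- -0.2   # hypothetical slope for centered set size

c_set_size <- c(-3, -1, 1, 3)     # centered versions of set sizes 2, 4, 6, 8
mu <- alpha + beta * c_set_size   # predictions on the log-odds scale

exp(mu) / (1 + exp(mu))           # back to the probability scale
plogis(mu)                        # the same thing, using the built-in inverse logit
```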
- 13:52And this code is only there for your convenience to reproduce the plots.
- 13:57That's in case you want to understand how to produce these plots.
- 14:00Okay, Alright.
- 14:02So now that I've explained the basic idea behind this logistic regression model, I'm going to now decide on priors and then
- 14:10start fitting the model.
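Looking ahead, a minimal sketch of what fitting this model in brms might look like; the formula and the priors shown are placeholders and assumptions on my part (the actual priors are discussed next), not the lecture's own code.

```r
library(brms)

## logistic regression: Bernoulli likelihood with a logit link;
## the priors below are placeholders, to be replaced after the prior discussion
fit_recall <- brm(correct ~ 1 + c_set_size,
                  data = df_recall,
                  family = bernoulli(link = "logit"),
                  prior = c(prior(normal(0, 4), class = Intercept),
                            prior(normal(0, 1), class = b)))

## the intercept and slope are on the log-odds scale; plogis() converts back
posterior_summary(fit_recall)
```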