- 00:00We have now looked at the bivariate distribution in the discrete case.
- 00:05What I'm going to do next is to talk about the bivariate distribution in the continuous case.
- 00:11So, as you can see from the title, I'm talking about bivariate and multivariate distributions, but I'm only discussing bivariate
- 00:18distributions here because they're easier to conceptualize and to draw graphically.
- 00:22But the ideas that I'm presenting here will generalize to any number of random variables.
- 00:27Okay.
- 00:27And later on, when we do Bayesian data analysis, we'll be working very intensively with multivariate distributions.
- 00:35So they will become our bread and butter activity.
- 00:37Okay, so let's think about bivariate distributions in the continuous case.
- 00:44So now imagine a situation where you have two random variables just like I showed last time in the discrete case, but this
- 00:50time they're coming from some normal distribution.
- 00:53So just to be concrete, let me say that they come from a standard normal distribution with mean zero and standard deviation
- 00:59one.
- 00:59Okay, let's also assume that there is some correlation between these two.
- 01:04So, for a real life example, which doesn't involve the standard normal, you can think of height and weight, right?
- 01:10If you measure each person's height and weight, these will tend to be positively correlated with each other, right?
- 01:16Because the taller a person is, the heavier they might be.
- 01:21Okay, so of course there will be variation, but in general there might be a positive tendency, you know, a positive
- 01:27correlation.
- 01:28So that's what I mean by a correlation here, informally.
- 01:31And so what I'm saying here is that you've got two random variables, in this case
- 01:36both standard normal, and they have a positive or negative or some correlation between them, which
- 01:42I'm calling ρ_XY.
- 01:44So I generally put a subscript on the correlation for the random variables in question: when I'm talking about the correlation
- 01:51between two random variables, I will reference them with the subscript,
- 01:56so that it's clear which random variables I'm talking about when I talk about their correlation,
- 02:03because there can be more than two.
- 02:05That's why this is necessary.
- 02:06But sometimes it's clear from context.
- 02:08And so sometimes you'll just see ρ without the subscript.
- 02:12But that doesn't matter, because you know from context which correlation we're talking about.
- 02:17So in this case, in the example I'm considering here, we are going to describe a bivariate distribution now.
- 02:28So we're going to describe a probability density function for a bivariate distribution,
- 02:34taking into account the means of the two random variables,
- 02:38the standard deviations of the two random variables, just like in the standard normal case that we saw earlier.
- 02:44But the new thing now is that we potentially have a correlation between the two random variables,
- 02:50so we need to include that in the probability density function equation to describe the relationship, the
- 02:58way that these data are going to be generated with that particular correlation. Now, what we do in statistics for such bivariate
- 03:06distributions is that we describe the standard deviations and the correlation in a very special matrix form, which
- 03:14is called the variance-covariance matrix.
- 03:17Okay, so in this particular case, because we have two random variables, right,
- 03:21we have a two by two variance-covariance matrix.
- 03:24If we have three random variables we will have a three by three and so on.
- 03:28The dimensions of the matrix will of course depend on the number of random variables you have, but today I'm
- 03:35only considering the bivariate case, just to keep the story tractable.
- 03:39Okay, so we have a two by two variance-covariance matrix.
- 03:41And what does this look like?
- 03:42So I will show you. The variance-covariance matrix is generally written in statistics with a big sigma, Σ.
- 03:48Okay, so of course this is confusing, because it is very similar to the summation symbol, but from context you will
- 03:55know that we're not talking about summation here, but about the variance-covariance matrix.
- 04:01So, in the bivariate case, the variance-covariance matrix has a very specific form.
- 04:08The diagonals of the variance-covariance matrix will contain the variances, not the standard deviations.
- 04:15The variances of each of those two random variables.
- 04:18So, the square of the standard deviation. And the off-diagonals,
- 04:22okay, so this is one off-diagonal, this is the other off-diagonal,
- 04:26the off-diagonals contain the so-called covariance between the two random variables.
- 04:32Now, if you've never heard about covariance, intuitively you can think of it like this.
- 04:37So if there's a positive correlation, then when one random variable increases in magnitude, the other one also increases in magnitude;
- 04:44that would be a positive correlation,
- 04:46and the covariance would be positive, right?
- 04:51When one increases, the other also increases; and you can imagine the situation where it's the opposite and the correlation is
- 04:56negative.
- 04:57So the definition of covariance is written here; you don't really need to know more than this, actually, for the purposes of
- 05:04our course here.
- 05:05But of course there's more detail in the lecture notes.
- 05:07And you can look up textbooks also, of course, which explain more details, but this is what we need to know for our current
- 05:13purposes.
- 05:14Okay, so the off-diagonals contain the covariance, which is defined as the correlation multiplied by the two standard deviations
- 05:21of the two random variables.
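In symbols (the slide's equation is not reproduced in the transcript; this is the standard form being described):

```latex
\operatorname{Cov}(X,Y) = \rho_{XY}\,\sigma_X \sigma_Y,
\qquad
\Sigma =
\begin{pmatrix}
\sigma_X^2 & \rho_{XY}\,\sigma_X \sigma_Y \\
\rho_{XY}\,\sigma_X \sigma_Y & \sigma_Y^2
\end{pmatrix}
```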
- 05:23Okay, so once we know these three numbers, the variance of X, the variance of Y (or rather the standard deviations), and
- 05:33then the correlation, we can write down this variance-covariance matrix.
- 05:36So this is very useful because now we can describe completely how these two random variables are jointly distributed.
- 05:46Remember the discrete case I showed you earlier; there, we had discrete values.
- 05:50Now we have continuous values, but we can still talk about the joint distribution of continuous values.
- 05:55And the way we write this in statistics is that if you have two random variables X and Y, we're going to say that these
- 06:02two random variables have as their joint PDF a two-dimensional normal distribution.
- 06:08That's what the subscript under the N means. The means of those two distributions
- 06:14I'm just assuming to be zero, because I'm talking about the standard normal in my example, and some variance-covariance matrix.
- 06:20So the form of this matrix will be as I just described earlier.
- 06:24If it's the standard normal, what would sigma squared X and sigma squared Y
- 06:29be?
- 06:29These will be one, right?
- 06:31Because the standard deviation is one.
- 06:32And so whatever the correlation is, that is what you would get in the off-diagonals here for the standard normal case that I'm
- 06:39discussing.
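Written out for this standard normal example (unit variances, so the off-diagonals are just the correlation):

```latex
\begin{pmatrix} X \\ Y \end{pmatrix}
\sim
\mathcal{N}_2\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
\begin{pmatrix} 1 & \rho_{XY} \\ \rho_{XY} & 1 \end{pmatrix}
\right)
```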
- 06:40So this is called the joint pdf of these two random variables.
- 06:45And you will often see it written like this, with f again for the PDF, and with subscripts which specify
- 06:52which random variables we're talking about. These uppercase Xs and Ys refer to the abstract objects, the
- 06:59random variables, and the lowercase x and y refer to specific data that you might get.
- 07:06Okay, so I will always make this distinction between capital X and lowercase x.
- 07:13Capital X means the abstract random variable, and lowercase x means a particular data point or data set that we have.
- 07:20Okay, alright.
- 07:22So this is the joint probability density function of this particular example.
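The equation itself is on the slide rather than in the transcript; for reference, the standard bivariate normal density (with general means and standard deviations) is:

```latex
f_{X,Y}(x,y) =
\frac{1}{2\pi\,\sigma_X \sigma_Y \sqrt{1-\rho^2}}
\exp\!\left(
-\frac{1}{2(1-\rho^2)}
\left[
\frac{(x-\mu_X)^2}{\sigma_X^2}
- \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y}
+ \frac{(y-\mu_Y)^2}{\sigma_Y^2}
\right]
\right)
```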
- 07:28What does it look like?
- 07:29So, it's very important, I have said this before, to get a graphical intuition for all of these abstract mathematical ideas.
- 07:37They feel very dry and unintuitive if you look at them as equations, but it's much easier to visualize them as figures,
- 07:46as graphics.
- 07:47And this will help you understand what's going on here.
- 07:50So, I'm going to show you all this.
- 07:51So one important property that this joint probability density function has to have, for it to be a proper probability density
- 07:59function,
- 07:59is that the area under the curve has to integrate to one.
- 08:03Now, this joint probability density function is going to be
- 08:09a kind of cone, not a cube, but a cone.
- 08:14And I'm going to show you the shape in a few seconds.
- 08:18But for this cone, the area under the curve contains the probabilities of all possible outcomes.
- 08:25So, if you think about all the possible outcomes in X and Y, the total area under the curve has to integrate to one;
- 08:32otherwise it's not a proper probability density function.
- 08:36Right now, I'm just showing you the formal story, but I'm going to show you the graphical
- 08:44intuition; then you'll see that these ideas are not actually very complex.
- 08:47They do look complex when you look at these equations, but they're not really.
- 08:51So, what we're saying here, this statement, in English is just saying that, given this joint probability density function,
- 09:00the total area under the curve,
- 09:02summing up over X and Y, is going to be one.
- 09:05That's what I just said a few seconds ago.
- 09:07And we'll just visualize this in a second.
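That statement in symbols:

```latex
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,\mathrm{d}y\,\mathrm{d}x = 1
```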
- 09:11And I could also write the cumulative distribution function now.
- 09:20I could ask: what is the joint probability of observing a value like u for the X
- 09:25random variable and v for the Y random variable, or something less than that?
- 09:32In this three-dimensional space, I can also ask that.
- 09:34And that probability can also be computed
- 09:37using the CDF by just carrying out the integral.
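In symbols, the joint CDF being described (not shown verbatim in the transcript) is:

```latex
F_{X,Y}(u,v) = P(X \le u,\, Y \le v)
= \int_{-\infty}^{v} \int_{-\infty}^{u} f_{X,Y}(x,y)\,\mathrm{d}x\,\mathrm{d}y
```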
- 09:41So, remember, when we were doing the CDF earlier, we were summing up things in one dimension, because we had a univariate
- 09:47distribution.
- 09:48Now we have a bivariate distribution.
- 09:50So that's why there are two integrals.
- 09:52There's conceptually nothing new going on here.
- 09:55All that's changed
- 09:56is that the number of variables has changed.
- 09:58Okay, so luckily we don't have to do any of this math
- 10:01when we're actually doing analysis.
- 10:04It's just important to understand what it means to have a joint distribution.
- 10:10Okay, so,
- 10:14just like in the discrete case that I showed you earlier,
- 10:18we can also compute the marginal distributions.
- 10:21So, we can figure out the marginal distribution of the random variable X by summing over the Y variable.
- 10:29That's what I had done earlier with the discrete case in the previous lecture.
- 10:33And similarly, you can do the same thing for the marginal distribution of Y.
- 10:39So there's nothing new here, because all we've done is replace the summation symbol in the discrete case with
- 10:45the integral.
- 10:46Nothing more.
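In symbols, marginalizing by integrating out the other variable:

```latex
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,\mathrm{d}y,
\qquad
f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,\mathrm{d}x
```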
- 10:48So, now, for the visualization.
- 10:50So, this should help you understand what we're talking about when we're talking about this example of two standard normal
- 10:57variables, right, X and Y.
- 11:00These are perhaps correlated,
- 11:03or perhaps not.
- 11:04So, what I'm showing you now on the right-hand side here is the cone that I was talking about earlier.
- 11:11This cone here describes the joint probability density function of a bivariate distribution of the type
- 11:18I'm discussing.
- 11:19This picture here shows you the contour plot from above.
- 11:22So, it's like a geographical plot, showing you the density of the points that are making up this cone here.
- 11:29So, you're looking at this cone from above here,
- 11:32and what I'm showing you in the lower part of this plot is the joint cumulative distribution function
- 11:39of this random variable.
- 11:41So this joint cumulative distribution function, which will go up to one, by the way,
- 11:45is going to tell me the probability of finding some value like u for X and v for Y,
- 11:52or some value less than that.
- 11:53Just like in the standard case that we learned about earlier.
- 11:57So it's all generalizing.
- 11:58Another interesting thing you should notice here is that the correlation that I've specified here is zero.
- 12:05And what that means is that this cone is going to be perfectly symmetrical.
- 12:13And the reason for that is that there's absolutely no relationship between X and Y,
- 12:17because correlation is zero here.
- 12:19When correlation is zero, you will see this characteristic spreading of the data points around the center,
- 12:26around the means of the two random variables; in this case
- 12:29the means are zero and zero.
- 12:31So you see a consistent spreading of the data around this point, and there's no correlation here.
- 12:37But what would happen if correlation were positive?
- 12:40What would this contour plot look like?
- 12:42Just think about that before you look at the next part of my lecture.
- 12:47Maybe you want to pause the lecture and just think about it for a second
- 12:51what would this contour plot look like if there was a positive correlation between X and Y?
- 12:58So what would happen if it's a negative correlation? A negative correlation would mean that when X is going up,
- 13:06Y will be going down.
- 13:08So the contour plot will look like this.
- 13:11There will be this characteristic angling of the contour plot
- 13:17when you've got a negative correlation, and the shape of this cone will also shift.
- 13:22You can imagine looking at this contour plot from the side, and you will see the cone looking like this.
- 13:27This is the cumulative distribution function here.
- 13:30Now, what would happen,
- 13:31the question that I asked you, what would happen if the correlation is positive?
- 13:36What would this contour plot look like?
- 13:39The contour plot is going to shift in its directionality;
- 13:44it's going to get squeezed in this positive direction.
- 13:46And what this means is that when X is increasing, Y is also increasing;
- 13:50you see, the covariance is positive now,
- 13:53and the correlation is of course positive.
- 13:55And so the shape of this contour plot of this
- 14:00joint pdf will also change.
- 14:02So that's basically the main point that I wanted to get across to you here.
- 14:07In the continuous case, just like in the discrete case, we've got marginal and conditional distributions that we can compute,
- 14:14and we've got the joint probability density function, which, in the case of the normal distributions that
- 14:21we will work with so frequently, will be described in terms of the means and the variance-covariance matrix.
- 14:28These are very important ideas that we will need, especially when we are working with hierarchical models.
- 14:36Alright, so one thing I want to show you now is something very cool.
- 14:40You can actually get a very good intuition for what a bivariate distribution will look like by just
- 14:46simulating data.
- 14:47This is why I taught you all about those rnorm functions and so on.
- 14:51You need this functionality to be able to generate simulated data, to develop intuitions about the
- 14:58problem you're working on.
- 14:59So what I first do here is I've created a variance-covariance matrix.
- 15:03This is a two by two matrix.
- 15:05And what's happening in this matrix is that in the first row, first column, I've got 5 squared, which is the variance
- 15:13of the first random variable.
- 15:14I just decided on something.
- 15:16And here I've got 10 squared which is the variance of the second random variable.
- 15:23And on the off-diagonals I've got the covariances, and I'm assuming a correlation of 0.6.
- 15:29So what does this actually mean?
- 15:30So I just want you to take a look at how I would write this out if I wanted to, you
- 15:39know, explain this mathematically.
- 15:41So I've got two by two variance
- 15:43covariance matrix.
- 15:45And so I've got five squared here, which is the variance for the first random variable, and I've got 10 squared here, and I'm
- 15:53assuming a correlation of 0.6 here.
- 15:56I'm just assuming this. Why am I assuming these numbers? Because I just want to generate some simulated data.
- 16:01So I have to choose some parameter values to do that.
- 16:05So sigma X is five
- 16:07And sigma
- 16:08Y is 10.
- 16:10In real life data analysis
- 16:11of course you do not have the luxury of knowing what these parameters are.
- 16:15The whole game is about estimating these parameters.
- 16:17But we'll get to that soon.
- 16:18Right now
- 16:19we're trying to understand how to generate data,
- 16:21simulate data.
- 16:22So on the off diagonal I'm going to write this correlation.
- 16:26Rho times sigma X times sigma
- 16:28Y, what would that be?
- 16:30It would be 0.6 times
- 16:345 times 10.
- 16:37So this number would be the same number here and here.
- 16:41So I wrote the formula on the top and the actual numbers on the bottom.
- 16:44So this would be my variance-covariance matrix, which I'm writing as big sigma.
- 16:50This is the big sigma here.
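Putting the formula (top) and the actual numbers (bottom) together, as described:

```latex
\Sigma =
\begin{pmatrix}
\sigma_X^2 & \rho\,\sigma_X \sigma_Y \\
\rho\,\sigma_X \sigma_Y & \sigma_Y^2
\end{pmatrix}
=
\begin{pmatrix}
5^2 & 0.6 \cdot 5 \cdot 10 \\
0.6 \cdot 5 \cdot 10 & 10^2
\end{pmatrix}
=
\begin{pmatrix}
25 & 30 \\
30 & 100
\end{pmatrix}
```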
- 16:52And what I'm showing you here is that I created that in R. And then what I do is I use the MASS library, which
- 16:58contains a multivariate rnorm function.
- 17:02If you remember, in the univariate case we had the rnorm function for generating simulated data.
- 17:07We can simulate fake data in a multivariate situation, in this case a bivariate situation.
- 17:14And what I'm doing now is that I'm using the MASS library to run this function, the mvrnorm function.
- 17:21I'm generating 100 data points,
- 17:24so a set of 100 paired data points;
- 17:26that means a total of 200 numbers, from a distribution with means zero and zero.
- 17:33So this is how I'm specifying how many dimensions I have in this distribution, it's a bivariate distribution.
- 17:40If I had written 0, 0, and another mean here, for example 0, that is, if I had written three zeros here,
- 17:46then I would be talking about a distribution with three random variables.
- 17:50That would be a multivariate distribution, and then the sigma would have to change
- 17:53also, right? The sigma is a two by two variance-covariance matrix.
- 17:58Why?
- 17:58Because I have two random variables right now, but if I had three, then I would have to write a three by three variance-covariance
- 18:03matrix.
- 18:04So coming back to this case of two random variables: I specify my means, I specify my sigmas, and I
- 18:14strongly advise you to play with this a little bit.
- 18:16Change the means, change the sigmas and the correlations and see what happens.
- 18:20So what I do now is I save the results of the simulation in this matrix u and what I've got here
- 18:28is 100 rows and two columns.
- 18:32Why do I have two columns?
- 18:33Because I have two random variables.
- 18:34I've generated random data from the random variable X here and from Y here.
- 18:40And so what's cool here, is that I specified a correlation of 0.6 between these two.
- 18:47If you just look at these three data points, you're not really clear on what's going on,
- 18:52like what the correlation looks like. But if I plot these data points, these are 100 data points from the X
- 18:58random variable and 100 from the Y random variable.
- 19:00You see this positive correlation here.
- 19:02So, if you just fool around with this code a bit and change this plus 0.6 to minus 0.6
- 19:10and run the code again,
- 19:11run all this code again,
- 19:12you will find that the data are now going to have this negative angling: you're seeing
- 19:19a positive angling here,
- 19:20and a negative angling here.
- 19:22If you, on the other hand, set this correlation to zero, you should try this out, try it out at home:
- 19:29set the correlations here,
- 19:31these two entries here on the off-diagonals, to zero.
- 19:34What you will then get when you generate data is a blob, a symmetric blob,
- 19:40which is basically just showing you that there is no correlation between the two random variables.
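A minimal sketch of the simulation being described, assuming the parameter values from the lecture (standard deviations 5 and 10, correlation 0.6, means 0); the variable names here are illustrative, not necessarily the ones on the slide:

```r
library(MASS)  # provides mvrnorm() for multivariate normal simulation

rho     <- 0.6
sigma_x <- 5
sigma_y <- 10

## Variance-covariance matrix: variances on the diagonal,
## covariance rho * sigma_x * sigma_y on the off-diagonals.
Sigma <- matrix(c(sigma_x^2,               rho * sigma_x * sigma_y,
                  rho * sigma_x * sigma_y, sigma_y^2),
                nrow = 2, byrow = TRUE)

## Generate 100 (x, y) pairs from a bivariate normal with means 0, 0.
u <- mvrnorm(n = 100, mu = c(0, 0), Sigma = Sigma)

## With rho = 0.6 the plotted cloud shows a positive angling;
## try rho <- -0.6 (negative angling) or rho <- 0 (symmetric blob).
plot(u[, 1], u[, 2], xlab = "X", ylab = "Y")
cor(u[, 1], u[, 2])  # sample correlation, roughly 0.6
```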
- 19:50So, usually, when I teach this material that I have just presented, on random variables and distributions and so
- 19:58on,
- 19:58somebody always complains to me: why are you teaching us all this theoretical nonsense?
- 20:06Why can't we just do data analysis
- 20:07right away?
- 20:08And in fact, that's how I learned data analysis too: as a graduate student at Ohio State, I was just thrown into the middle
- 20:16of things, just given the data and told what commands to run to analyze my data.
- 20:21Now, the problem with doing that kind of Mickey Mouse data analysis is that you have no idea what's going on behind the formulas
- 20:31that you're using in your R code or whatever.
- 20:34What I'm trying to do is I'm trying to make sure that you fully understand what the assumptions are of all the models that
- 20:41we're going to build.
- 20:42Because later on, when we build more complex models, the assumptions will pile up.
- 20:47And you want to be sure that you understand what you have assumed is producing the data right.
- 20:54Often these assumptions are not reasonable.
- 20:57And you will see later on in the book that the story can become incredibly complicated, and there you have to be very clear
- 21:05about what multivariate distributions you're assuming and what generative process you're assuming for the data.
- 21:11That is why it is so important to know what a probability mass function is, what a probability density function is, what
- 21:18a marginal distribution is,
- 21:19what a conditional distribution is,
- 21:21and how these DPQR
- 21:23functions (in R: dnorm, pnorm, qnorm, rnorm and their relatives) work.
- 21:24We need those because when we are going to start thinking about prior distributions, which I will explain very soon
- 21:31when we start visualizing prior distributions to try to understand what we think plausible values will be for the parameters,
- 21:38we need to be able to use these DPQR
- 21:40functions to work out what we assume about the priors.
- 21:45So this is a very important skill that I hope to convey in this course and that's why I made you suffer through all this
- 21:51technical detail.
- 21:53This is the preparation that we need to fully understand what we're doing when we're actually carrying out data analysis.
- 22:00So in my work as a psycholinguist, I repeatedly see published papers where even a simple one-sample t-test is not done correctly.
- 22:08This happens even today and it's going to happen forever.
- 22:11And the reason for that is that the foundations are very shaky. Among people in psychology and in linguistics, for example,
- 22:19it happens quite often that the foundational ideas are not there, because people just were not willing to spend one
- 22:26week thinking about probability density functions and probability mass functions.
- 22:30And then you pay the price for that shaky foundation down the road.
- 22:34This is a very expensive price to pay if you're trying to do science.
- 22:39So why not just spend a week and figure out all these basic ideas.
- 22:44There's really not much to it.
- 22:45It just involves simple addition, maybe division at one point, and that's it.
- 22:49And some graphical intuition is all you need.
- 22:51And once you understand these issues, it will be much easier to understand how Bayesian modeling works, and even how frequentist
- 22:58modeling works.
- 22:59I mean all of statistics is based on the ideas that I just presented to you.
- 23:04So what we're going to do next, after finishing this hard first week, is to get our hands dirty with Bayesian
- 23:14modeling.
- 23:14I'm going to show you some really cool, simple Bayesian models that you can do on a piece of paper, without any computer.
- 23:22And then we will move on to much more complex models involving computational tools.