- 00:00Until now, we've been looking at this simple model of reaction times using a normal likelihood.
- 00:07But as I showed you in the very beginning, when I plotted the data, there's a skew to the right in the data.
- 00:12And when you have this kind of positive-valued data with no data points below zero,
- 00:20and with this skew to one side, often a better likelihood function to use is the log-normal likelihood function.
- 00:30So let's take a look at that and see what that brings to the table.
- 00:36So first of all, what is a log-normal likelihood?
- 00:39This is just another probability density function, or another type of random variable if you want to think of it in those terms.
- 00:46So the idea here is that if some data is log-normally distributed, then what that means is that the log of that data
- 00:54is normally distributed.
- 00:56So what about the reaction times that we're talking about?
- 00:59If I assert, or if I hypothesize, that these are log-normally distributed, then taking the log of the reaction times would
- 01:07give me a normal distribution.
- 01:09So basically the relationship between the log-normal and the normal is stated here.
- 01:15If y is log normally distributed, then what you could do is you could take the log of the observed data.
- 01:23y is always in milliseconds.
- 01:25So you could take log milliseconds and that will be normally distributed then.
- 01:30That's the important thing here.
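In symbols, the relationship being stated is the following (my reconstruction of the slide's notation):

```latex
y \sim \mathrm{LogNormal}(\mu, \sigma)
\quad \Longleftrightarrow \quad
\log(y) \sim \mathrm{Normal}(\mu, \sigma)
```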
- 01:31So one important change that this entails is that the mu and sigma parameters are now on the log millisecond scale.
- 01:44We are now talking about these parameters on the log millisecond scale and the observed data is on the millisecond scale.
- 01:52And what this kind of distribution will do is that it will generate a typical skewed data set.
- 02:00The reaction times will have this typical skew that you actually observe in reaction time data.
- 02:05So here's an example of generating some simulated data from the log-normal.
- 02:15So how do I do this?
- 02:15I define some parameters mu and sigma on the log scale.
- 02:19Okay, so six log milliseconds, what would that be in milliseconds?
- 02:24Just take the exponential of six.
- 02:26That would be the reaction time in milliseconds.
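You can check this directly in R:

```r
exp(6)  # 403.4288: 6 on the log-ms scale is roughly 403 ms
# (strictly speaking, exp(mu) is the median of a log-normal on the ms scale)
```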
- 02:33So I'm generating 500,000 samples.
- 02:38Half a million samples, using a function called the rlnorm function.
- 02:43So you see again the dpqr functions in action.
- 02:46I can generate simulated data from a log-normal.
- 02:49I specify the mu and sigma and the number of data points.
- 02:52And I get simulated data here.
- 02:53And this is what it looks like.
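A minimal sketch of that simulation in R; mu = 6 is stated in the video, but sigma = 0.5 is an assumed value for illustration:

```r
set.seed(42)
# Half a million samples; the parameters are on the log-ms scale:
rt_sim <- rlnorm(500000, meanlog = 6, sdlog = 0.5)
# On the millisecond scale the data are right-skewed and all positive:
hist(rt_sim, breaks = 200, xlab = "Reaction time (ms)",
     main = "Simulated log-normal RTs")
# Taking the log recovers a normal distribution:
hist(log(rt_sim), breaks = 200, xlab = "log milliseconds",
     main = "log of the same samples")
```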
- 02:55So now this log-normal distribution seems to be a pretty reasonable way to talk about reaction times.
- 03:02It has this characteristic skew, the data are all positive.
- 03:05It's all good.
- 03:07So that's why, whenever one does analysis with things like reaction times, which are positively skewed,
- 03:15one generally uses a log-normal likelihood, because it's a much more reasonable representation of the generative process.
- 03:25Alright.
- 03:25So let's start by refitting the model with a log-normal likelihood
- 03:31this time. We have the same parameters.
- 03:34But these parameters now have to be defined on the log scale.
- 03:38The priors have to be defined on the log scale.
- 03:40This is a very common mistake that people make.
- 03:43They use a log-normal likelihood but they forget that the parameters are no longer on the millisecond scale.
- 03:49So a common mistake is to use a uniform prior going from 0 to 60,000.
- 03:58On the log scale, that's a gigantic range.
- 04:02Okay, try to figure out what the exponential of 60,000 would be.
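To see just how gigantic, a quick check in R:

```r
exp(11)     # ~59,874 ms: already about a minute
exp(20)     # ~4.85e8 ms: several days
exp(60000)  # Inf: overflows double precision entirely
```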
- 04:05Okay. So that's why it's very important to figure out what prior you should use on the log scale.
- 04:12So we start with very vague priors: for mu, a uniform going from 0 to 11,
- 04:19and for sigma we'll use a uniform going from 0 to 1.
- 04:23That's a pretty vague prior.
- 04:25An uninformative prior for the log-normal here.
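Putting this together, the model being described is the following, with rt_n the n-th reaction time in milliseconds (the notation is my reconstruction):

```latex
\begin{aligned}
rt_n &\sim \mathrm{LogNormal}(\mu, \sigma) \\
\mu &\sim \mathrm{Uniform}(0, 11) \\
\sigma &\sim \mathrm{Uniform}(0, 1)
\end{aligned}
```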
- 04:29So you can of course verify what I just said by generating the prior predictive distribution.
- 04:35So how would I do that?
- 04:37So here's some code for this; in the textbook, we provide this normal predictive distribution function.
- 04:43All that is doing is very simple.
- 04:45It's just generating samples.
- 04:48We just get some samples from a uniform, for the particular prior that we have for mu and the prior we have for sigma.
- 04:55So, there is a vector of samples.
- 04:57How many?
- 04:581000 samples here.
- 04:59For each of these samples, I just plug them into this function, which takes these samples and simply generates some observed data.
- 05:11It's a very simple function that produces this data.
- 05:14And then you will get this data and you can look at the properties of this data.
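Here's a minimal sketch of what such a function could look like; the function name, its signature, and the number of observations per data set (n_obs = 100) are my assumptions rather than the textbook's exact code:

```r
lognormal_prior_predictive <- function(mu_samples, sigma_samples, n_obs) {
  # For each draw (mu, sigma) from the priors, simulate one data set
  # of n_obs reaction times from the log-normal likelihood:
  lapply(seq_along(mu_samples), function(i) {
    rlnorm(n_obs, meanlog = mu_samples[i], sdlog = sigma_samples[i])
  })
}

# 1000 samples from the priors, as in the video:
n_samples <- 1000
mu_samples <- runif(n_samples, min = 0, max = 11)
sigma_samples <- runif(n_samples, min = 0, max = 1)
# n_obs = 100 is an arbitrary choice for illustration:
prior_pred <- lognormal_prior_predictive(mu_samples, sigma_samples, n_obs = 100)
```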
- 05:19So, let's look at the properties of this prior predicted data under this prior specification that I have.
- 05:24So what I've done here is that I'm showing you some statistics of the prior predictive distributions.
- 05:29I've generated multiple data sets.
- 05:31And I'm going to look at the distribution of the mean reading times of those simulated data sets.
- 05:39So the distribution of the mean reading times is pretty broad, as you can see.
- 05:45Why?
- 05:46Because the prior specifications are so broad.
- 05:49And so we're getting a broad range of variation, very large values.
- 05:52If you look at these response times in milliseconds, these are gigantic
- 05:57mean reading times that we're getting from the predicted data.
- 06:01So the prior specification is completely absurd.
- 06:04You know, even though it's not really gonna matter in terms of looking at the posterior because the posterior will not be
- 06:11heavily affected by the prior specification because we have sufficient data, but that's not always the case.
- 06:17We sometimes have very sparse data; in that case, the prior specification can be very important and can have important implications
- 06:25for the interpretation of the data.
- 06:27So similarly, if I look at the median reading times in the simulated data sets, there's pretty broad variation, and the minimum and maximum
- 06:34reading times also vary very heavily across the data sets.
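Continuing the sketch above, the statistics being discussed could be computed like this:

```r
means   <- sapply(prior_pred, mean)
medians <- sapply(prior_pred, median)
mins    <- sapply(prior_pred, min)
maxs    <- sapply(prior_pred, max)
summary(means)  # the mean RTs vary over an absurdly wide range
# Plot on the log scale, since the raw values span many orders of magnitude:
hist(log(means), breaks = 50, xlab = "log(mean RT in ms)",
     main = "Prior predictive means")
```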
- 06:38So the priors are not super great, but that's fine.
- 06:42It's a good starting point.
- 06:43It's always fine to try out different priors to see what happens and which priors make more sense
- 06:49given the problem that you're working on. What one can do is come up with more reasonable priors for this kind of
- 06:55log-normal likelihood, and that's what I will talk about next.