This video belongs to the openHPI course Introduction to Bayesian Data Analysis.
- 00:00Alright.
- 00:01So now let's take a look at how we can evaluate this log-normal model with these relatively informative priors.
- 00:09So what we fit in the previous lecture was a simple linear model with reaction time as a function of some intercept, which
- 00:20is the mu parameter.
- 00:21We had the space bar data, which records reaction times in a button-pressing task, and we have a log-normal likelihood with
- 00:29these priors.
- 00:30A normal distribution with mean 6 and
- 00:33standard deviation 1.5 for the mu parameter, the intercept, and a truncated normal with mean 0 and standard deviation 1
- 00:42for sigma.
- 00:43The truncation is done by brms internally because brms
- 00:47as a package knows that sigma cannot have negative values.
- 00:50So the truncation is done internally.
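Since the model in this lecture is fit in R with brms, the following Python snippet is only an illustrative sketch of what that zero-truncated Normal(0, 1) prior for sigma looks like, using simple rejection sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

def truncated_normal(mean, sd, size, rng):
    """Sample from a normal distribution truncated below at zero,
    using rejection sampling: draw, keep only non-negative values."""
    out = np.empty(0)
    while out.size < size:
        draws = rng.normal(mean, sd, size)
        out = np.concatenate([out, draws[draws >= 0]])
    return out[:size]

# Prior draws for sigma: all non-negative by construction,
# which is exactly the constraint brms enforces internally.
sigma_prior = truncated_normal(0.0, 1.0, 10_000, rng)
```

For a mean-zero prior like this one, taking the absolute value of normal draws would give the same half-normal distribution; rejection sampling also handles truncated normals with a nonzero mean.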
- 00:53So once you have the posterior distribution for the mu parameter for example, typically in a scientific research problem
- 01:01this is the parameter that we are primarily interested in.
- 01:04The sigma is usually a nuisance variable for us.
- 01:07So often there is not much focus on the sigma variable.
- 01:11But what we're interested in is the posterior distribution of the mu parameter given the data.
- 01:16So how would I summarize it?
- 01:18Well, of course you could summarize the posterior on the log scale, but it makes more sense to actually
- 01:27look at the posterior on the millisecond scale here.
- 01:30So you can interpret it in terms that you understand.
- 01:32And so one way that you can do this is to basically just exponentiate the posterior distribution.
- 01:44This entire distribution is just exponentiated, and since exp(mu) is the median of a log-normal, this gives you the posterior median on the millisecond scale.
- 01:52And so what you can then do is you can calculate statistics on this distribution and report the mean and
- 02:02the 95% credible interval.
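This summary step can be sketched as follows. The Python snippet below uses hypothetical posterior draws of mu; in practice these would be extracted from the fitted brms model in R, and the made-up values 5.12 and 0.0075 are chosen only to land roughly in the range discussed in the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior draws of mu on the log scale; with brms these
# would come from the fitted model rather than being simulated here.
mu_draws = rng.normal(5.12, 0.0075, 4000)

# Exponentiate every draw to move to the millisecond scale;
# exp(mu) is the median of the log-normal distribution.
ms_draws = np.exp(mu_draws)

# Summarize the exponentiated draws: mean and 95% credible interval.
mean_ms = ms_draws.mean()
lower, upper = np.quantile(ms_draws, [0.025, 0.975])
print(f"mean = {mean_ms:.0f} ms, 95% CrI = [{lower:.0f}, {upper:.0f}] ms")
```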
- 02:04So this information I would report in the paper if I'm summarizing the results of this analysis, and this tells
- 02:10me pretty much everything I need to know: given the data and given my priors, my predicted or expected reading time is
- 02:18167 milliseconds.
- 02:20And I'm 95% sure, given these particular data, that this range is between 164 and 169 milliseconds. Of course,
- 02:31this credible interval does not mean that these are the true values of this mu parameter.
- 02:36Because these values are conditioned on the data that we happen to get.
- 02:40Maybe we've got skewed data, maybe we've got biased data from a subject who was particularly tired or not representative
- 02:48of their normal behavior on some other day.
- 02:52They might not have slept the previous night for example.
- 02:55So it could be biased data.
- 02:57And so if the data are biased, then your estimates are not going to represent the reality.
- 03:02So when you report these credible intervals
- 03:05and these means, you should be very clear about the fact that they're conditioned on the data that you have.
- 03:12And whether those data reflect reality or not, who knows?
- 03:16So it's tempting to generalize from these data, from these posteriors to reveal the unknown reality out there.
- 03:24But the reality is by definition unknown.
- 03:26We just don't know what the true mu is.
- 03:29We're trying to get at some estimates of these given some priors and given the data that we happen to have.
- 03:36Okay, alright.
- 03:37But we can still look at the posterior predictive distributions to see if the future data produced
- 03:45by this model are reasonable given the data that we actually have.
- 03:49So this is just a sanity check which tells us that yes, the model is producing reasonably distributed data given
- 03:58what we've seen in this particular data set.
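The posterior predictive step itself is easy to sketch: for each posterior draw of (mu, sigma), simulate one full dataset from the log-normal likelihood. A hypothetical Python version, with made-up posterior draws and a made-up sample size standing in for the brms output:

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 361      # hypothetical number of button presses in the data
n_draws = 1000

# Hypothetical joint posterior draws of (mu, sigma); with brms these
# would be extracted from the fitted model in R.
mu_draws = rng.normal(5.12, 0.0075, n_draws)
sigma_draws = np.abs(rng.normal(0.13, 0.005, n_draws))

# One simulated dataset per posterior draw: together these form the
# posterior predictive distribution of the log-normal model.
pred = np.array([rng.lognormal(m, s, n_obs)
                 for m, s in zip(mu_draws, sigma_draws)])
print(pred.shape)  # (1000, 361)
```

Each row of `pred` is one plausible future dataset; plotting these against the observed data is the graphical sanity check described above.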
- 04:01One other question that one often asks is, well, I had this normal likelihood earlier and then I switched to the log-normal
- 04:09likelihood.
- 04:10Did I get an improvement in the fit in some way?
- 04:14So you could ask whether one likelihood is better than the other likelihood.
- 04:20So, that's an interesting question to ask too.
- 04:22And you can use the posterior predictive data to check that.
- 04:25Okay, so here is the distribution of the minimum reading times produced by the normal likelihood model that we
- 04:35fit earlier, and this vertical bar is the observed minimum reading time.
- 04:40That's of course a point value in the data.
- 04:42And so what we notice is that the normal model is actually generating minimum reading times that are too small here.
- 04:54The distribution is too far away from the vertical bar that we've got here. In the log-normal model, we see a somewhat
- 05:01better distribution of minimum reading times
- 05:05compared to the observed data.
- 05:06So in that sense, subjectively, just looking at these figures
- 05:10intuitively, without any quantified statistics or anything here,
- 05:13I'm just using a graphical check to decide whether the log-normal model does better than the normal model.
- 05:21And the answer seems to be yes.
- 05:23It's producing minimum reading times that are pretty close to the actual observed minimum reading time.
- 05:29Now one can also check whether the two models, the normal likelihood model and the log-normal likelihood model, whether
- 05:37they reflect the maximum observed reading time in the data.
- 05:41So the maximum reading time observed is about 400 and something milliseconds.
- 05:47But under the normal model, the maximum reading times generated are much too short.
- 05:52So it's kind of missing something important.
- 05:55The normal model is missing something important about the observed data.
- 06:01And interestingly, the log-normal model is also underestimating the maximum reading time.
- 06:07So if these data are representative of some kind of systematic behavior of this subject, then both models are actually
- 06:15failing to capture that.
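This kind of graphical check can be made concrete with a test statistic: compute the minimum (or maximum) of each simulated dataset and compare that distribution to the observed value. A hedged Python sketch, where simulated observed data and moment-matched models stand in for the real data and the real fitted posteriors:

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_draws = 361, 1000

# Hypothetical observed reading times: right-skewed, like real data.
observed = rng.lognormal(5.12, 0.13, n_obs)

def predictive_stat(sampler, stat, n_draws, n_obs):
    """Distribution of a test statistic over simulated datasets."""
    return np.array([stat(sampler(n_obs)) for _ in range(n_draws)])

# Moment-matched stand-ins for the fitted normal and log-normal models.
m, s = observed.mean(), observed.std()
lm, ls = np.log(observed).mean(), np.log(observed).std()

mins_normal = predictive_stat(lambda n: rng.normal(m, s, n), np.min,
                              n_draws, n_obs)
mins_lognormal = predictive_stat(lambda n: rng.lognormal(lm, ls, n), np.min,
                                 n_draws, n_obs)

# The normal model tends to generate minima that are too small; running
# the same check with np.max probes the maximum reading time instead.
print(observed.min(), mins_normal.mean(), mins_lognormal.mean())
```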
- 06:17So often models will be imperfect in this sense.
- 06:20Even the log-normal model does not capture all aspects of the data.
- 06:26Often that's okay. Models are always imperfect.
- 06:29That's why it's called a model.
- 06:31It's not the actual reality.
- 06:32So they will miss some aspects of reality.
- 06:34You just need to know what those aspects are and what the model is missing about the data.
- 06:39So in this particular case, you know, what could be happening
- 06:42is that what we might be looking at in the data is not just one distribution, but a mixture of distributions.
- 06:53So what could be happening is that when we look at the data, we see the skew in the data.
- 07:02What could lie behind that single distribution that we're seeing is a mixture of two distributions that look like one skewed
- 07:10distribution.
- 07:10So there could be a mixture process that's producing this data.
- 07:14As you will see in the later lectures in the textbook, you can actually define a generative process where you assume that
- 07:23there are a few rare slow reading times, which could model this long tail here of very rare but very slow reading
- 07:33times.
- 07:33These could be attentional timeouts that the subject experiences when getting bored pressing the button.
- 07:39You know, they could on rare occasions produce very slow reading times, but most of the data could be coming from a
- 07:45different distribution.
- 07:46So you can actually define these kinds of finite mixture models that can model this kind of data.
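A finite mixture like this is easy to simulate. The Python sketch below (all component parameters are invented for illustration) mixes a dominant fast log-normal component with a rare, slow "attentional timeout" component, producing data that look like a single right-skewed distribution with a long tail:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# Hypothetical two-component mixture: most button presses come from a
# fast log-normal component, but a small proportion are slow "timeout"
# responses drawn from a component with a larger mean.
p_slow = 0.1
is_slow = rng.random(n) < p_slow
rt = np.where(is_slow,
              rng.lognormal(5.9, 0.2, n),   # rare, slow component
              rng.lognormal(5.1, 0.1, n))   # dominant, fast component

# The heavy right tail pulls the mean above the median.
print(np.median(rt), rt.mean(), rt.max())
```

Fitting such a model means estimating the mixing proportion and the component parameters jointly, which is exactly what the finite mixture models in the later lectures do.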
- 07:52So this is an elaboration of the linear model.
- 07:55And this also illustrates the great flexibility of the Bayesian approach.
- 07:58You can really build elaborate process models that reflect the underlying generative process and it doesn't have to be a
- 08:06simple generative process of the type that we are studying here.
- 08:09That's one of the beauties of the framework that we're looking at here.
- 08:14All right.
- 08:15So what's happened so far: we've looked at two simple examples of a simple linear model using two different likelihoods, the
- 08:22normal and the log-normal.
- 08:24And what we learned to do was to generate prior predictive and posterior predictive data using the
- 08:30brms package.
- 08:31We can do that or we can do it in R ourselves.
- 08:34And so these two approaches to understanding the model are very important.
- 08:39They will teach us about what the model predicts and what the underlying assumptions of the model are,
- 08:45and whether they produce reasonable data.
- 08:50And so what we tend to say is that these are telling us about the descriptive adequacy of the model.
- 08:57And so we can ask both what happens before we've seen the data.
- 09:01That's the prior predictive distribution.
- 09:03And what happens after we've seen the data.
- 09:05That's the posterior predictive distribution.
- 09:07And this is usually part of the workflow of doing a Bayesian data analysis.
- 09:11You do these prior and posterior predictive checks,
- 09:14you do a sensitivity analysis, and that constitutes a complete analysis of a data set.
- 09:20Usually you may only report the final analysis in the paper, but underlying it is all this investigation that one should
- 09:28do to check whether the model makes any sense at all.
- 09:32Alright.
- 09:33So what's going to happen next is that we are going to now elaborate on this simple linear model in several ways.
- 09:40First we're going to add a predictor.
- 09:42So instead of just having an intercept, we're going to have an intercept and a slope.
- 09:46This then becomes a regression model; you can also have multiple regression models now.
- 09:51So this improves the flexibility of this linear modeling approach and allows us to ask all kinds of questions about complex
- 09:58data sets.
- 10:00Another example I will show you is of using logistic regression.
- 10:04This will be familiar to people doing machine learning, where you've got a 0/1 response,
- 10:10where we're trying to model this kind of Bernoulli process
- 10:15using a linear modeling framework.
- 10:17So we're going to use logistic regression for that.
- 10:19And finally, I'm going to show you just a glimpse of what a linear mixed model or a hierarchical model looks like
- 10:26where you start adding information about individual participants,
- 10:31the variability due to individual participants into the model.
- 10:35This is a very sophisticated framework that allows you to study individual differences, for example, and it allows you to
- 10:43analyze repeated measures data, that is, dependent data.
- 10:46This is a very important framework and it's a bread and butter framework for many fields like psychology and linguistics.
- 10:53So that's coming up next.