I would like to create some graphs showing draws from a model's posterior distribution. I recently discovered the ggdist
package and can see it has incredible functionality for graphing all kinds of distributions. I want to create a graph akin to the graphs in John. Kruschke's Doing Bayesian Data Analysis, that show both the highest posterior density interval and the posterior density falling within a region of practical equivalence.
You can see that on this graph in black is an annotation to the graph that displays (1) the upper and lower bounds of a 90% hdi, (2) how much posterior density falls within and either side of a pre-defined ROPE (here defined as between 0.45 and 0.55).
Below is R code for a simple interaction model based on the iris dataset, using the brm()
function from the brms
package. It shows the posterior for estimated difference between three categorical versions of Sepal Length (I named the variable SepalLength_cat
) at each of three levels of Species
. I used the compare_levels()
function from the marginaleffects
package to obtain these.
library(tidyverse)
library(brms)
library(ggdist)
library(tidybayes)
library(marginaleffects)
library(modelr)
iris %>%
reframe(range = range(Sepal.Length),
names = c("min", "max"))
glimpse(iris)
iris %>%
mutate(SepalLength_cat = factor(case_when(Sepal.Length >= 4.3 & Sepal.Length < 5 ~ "cat1",
Sepal.Length >= 5.1 & Sepal.Length < 6.4 ~ "cat2",
TRUE ~ "cat3"))) -> irisAlt
mod <- brm(formula = Sepal.Width ~ Species*SepalLength_cat,
data = irisAlt)
data_grid(data = irisAlt,
Species = c("setosa", "versicolor", "virginica"),
SepalLength_cat = c("cat1", "cat2", "cat3")) %>%
add_epred_draws(mod,
re_formula = NA,
seed = 1234) -> modPred
# 4000 predicted values of Sepal.Width per 3 x 3 combination of SepalLength_cat and Species
nrow(modPred); head(modPred)
# compare levels
modPred %>%
compare_levels(variable = .epred,
by = SepalLength_cat) %>%
mutate(SepalLength_cat = factor(x = SepalLength_cat,
levels = c("cat2 - cat1",
"cat3 - cat1",
"cat3 - cat2")),
Species = factor(Species)) -> compCat
Now I use the stat_histinterval()
and stat_spike()
functions from the ggdist
package to create histograms for each cell in this 3 x 3 matrix of posterior distributions. Notice in the stat_spike
call I have used the at =
argument to specify a ROPE between -0.25 and 0.25 and spikes for the median and the quantile interval.
compCat %>%
ggplot(mapping = aes(x = .epred,
y = SepalLength_cat)) +
stat_histinterval(mapping = aes(fill = after_stat(level)),
point_interval = mode_hdci,
.width = c(.5,.8,.95)) +
stat_spike(mapping = aes(linetype = after_stat(at)),
at = c(-0.25, 0, 0.25,
"median",
"qi interval" = qi)) +
scale_fill_brewer(na.value = "gray95") +
scale_thickness_shared() + # makes sure spike same height as dist
facet_grid(.~Species) +
theme_minimal()
Here is the result
The help for the stat_spike()
function is very long and I cannot see if there is any way, either within this function or any other in ggdist
to
(a) Annotate the graph with labels estimating posterior density within a range of values (b) colour code the ROPE spikes to be different colour to the qi and median spikes.
Any help much appreciated