Introduction

This tutorial walks you through mixed effects regression in R. It has two parts: 1) the linear mixed effects model and 2) the generalized linear mixed effects model.

Mixed models

First, we introduce some terminology. By experiment, we broadly refer to a study whose objective is to assess the effect of one or more factors on a measurable attribute, called the outcome. The units for which the outcome is measured are called experimental units. The factors that affect the outcome are called explanatory variables. If the study aims to compare the effects of specific treatments, then the treatments' effects are modeled as fixed effect terms. Suppose that \(\beta\) is the regression parameter that measures the effect of a specific treatment with respect to a reference treatment (or control). Then the effect of the treatment can be verified by testing \(H_0: \beta=0\) versus \(H_a: \beta\ne 0\). On the other hand, if the levels of a factor/explanatory variable are randomly chosen for the experiment, then the factor's effect is measured via a random effect term, and the presence of the effect is verified by testing \(H_0: \sigma^2_r=0\) versus \(H_a: \sigma^2_r>0\), where \(\sigma^2_r\) is the variance of the random effect across the levels of the factor. A mixed effects model contains both fixed effects and random effects, and this combination can manifest in several ways. We shall examine different mixed effects models in our sleep study example.

Data description

The sleep study dataset (Belenky et al., 2003) is available in the lme4 package. The response variable, Reaction, is the subjects' reaction time to a stimulus as recorded by the researchers. The subjects (recorded as Subject) on which the experiment was conducted are assumed to be random. The number of days (Days) the subjects underwent sleep deprivation is believed to exert a fixed effect. Below is the structure of the sleepstudy data.

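The code below assumes that the lme4, ggplot2, and dplyr packages are loaded (dplyr supplies the %>% pipe and the mutate()/filter() verbs used in the plotting code):

library(lme4)    # sleepstudy data and mixed model fitting
library(ggplot2) # plotting
library(dplyr)   # %>% pipe, mutate(), filter()
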
str(sleepstudy)
## 'data.frame':    180 obs. of  3 variables:
##  $ Reaction: num  250 259 251 321 357 ...
##  $ Days    : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...

Here we show the first few rows of the data frame. The subsequent lines of code confirm that there are 180 rows and three columns in the data frame, and that the number of subjects (the experimental units) is 18.

head(sleepstudy)
##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308
## 3 250.8006    2     308
## 4 321.4398    3     308
## 5 356.8519    4     308
## 6 414.6901    5     308
dim(sleepstudy)
## [1] 180   3
length(unique(sleepstudy$Subject))
## [1] 18

In the following plot of the sleep study data, we use a different color for each subject.

ggplot(sleepstudy,aes(x=Days, y=Reaction, group=Subject, color=Subject)) + 
    geom_point(size=1) + theme_classic()

Models and notations

The variables in our model are: \(Y_{i, j}\), the reaction time of the \(i\)th subject on the \(j\)th day; \(\beta\), a measure of the effect of days; and \(c_i\), the subject-specific random effect. We assume that the number of days exerts a fixed effect while the subjects' effect is random. In each model below, \(e_{i, j}\) is the residual term, assumed to be independent of \(c_i\) and of the number of days.

Model 1

In model-1, only the intercept is random: \(Y_{i, j} = \alpha + c_{i} + \beta j + e_{i, j}\). More specifically, we assume that for this model, 1) the \(e_{i, j}\)'s are independent and follow \({\rm N}(0,\sigma^2_e )\), 2) \(c_1, c_2, \dots, c_n\) are independent and follow \({\rm N}(0,\sigma^2_c)\), 3) \(e_{i, j}\) and \(c_i\) are independent, and 4) \(e_{i, j}\), \(c_i\), and the fixed-effects covariates are all independent.

In model-2, \(Y_{i, j} = \alpha + c_{i} + (\beta+d_{i})j + e_{i, j}\), where both the intercept and the slope vary by subject. That means both are random, and the effect of the number of days changes with the subject. The random component of the slope is denoted by \(d_i\), and in this model \(c_i\) and \(d_i\) are allowed to be correlated (i.e., \(\rho_{c, d}\ne 0\)). Model-2 and model-3 are the same, except that in the latter model the random intercept and slope are assumed to be uncorrelated (i.e., \(\rho_{c, d} = 0\)).

Comments on the variance component estimation

To analyze the data using the mixed effects model, we will utilize the R package lme4. Generally, the restricted maximum likelihood (REML) method is recommended for model fitting. However, REML likelihoods are not comparable across models, so model selection criteria such as AIC and BIC should not be used with REML fits. Therefore, when it is necessary to compare different models, fit them by maximum likelihood; once the best model is selected, it can be refitted with REML for more accurate variance component estimates and confidence intervals. To avoid the REML method, set REML=FALSE in the lmer function; the default in lmer is REML=TRUE.

Here is the code for fitting model-1.

model1 <- lmer(Reaction ~ Days + (1|Subject), sleepstudy,REML=F)
summary(model1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Reaction ~ Days + (1 | Subject)
##    Data: sleepstudy
## 
##      AIC      BIC   logLik deviance df.resid 
##   1802.1   1814.9   -897.0   1794.1      176 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2347 -0.5544  0.0155  0.5257  4.2648 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1296.9   36.01   
##  Residual              954.5   30.90   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 251.4051     9.5062   26.45
## Days         10.4673     0.8017   13.06
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.380

Output

The summary of the model gives us the AIC, BIC, and log-likelihood scores, the deviance, degrees of freedom, scaled residuals, and parameter estimates. First, the estimated standard deviation of the random intercept term is 36.01, and that of the residuals is 30.90. Next, the fixed effect of days is measured via the regression coefficient, and the estimated regression coefficient is 10.47.
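
One common summary of these two variance components (not shown in the lmer output) is the intraclass correlation, the proportion of total variance attributable to subjects; a minimal sketch using lme4's VarCorr():

# Intraclass correlation: subject variance / (subject + residual variance)
vc <- as.data.frame(VarCorr(model1))
vc$vcov[1] / sum(vc$vcov)  # approximately 1296.9 / (1296.9 + 954.5) = 0.58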

Prediction under model 1

Next we plot the predicted responses \(\widehat{Y}_{i, j} = \widehat{\alpha}+\widehat{c}_{i} + j\widehat{\beta}\). Each line corresponds to one subject.

ggplot(NULL) +
  geom_point(data = sleepstudy,
             aes(x = Days, y = Reaction, group = Subject, color = Subject)) +
  geom_line(data = sleepstudy %>% mutate(pred_dist = fitted(model1)),
            aes(x = Days, y = pred_dist, group = Subject, color = Subject)) +
  theme_classic()

Model 2

model2 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy,REML=F)
summary(model2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
##      AIC      BIC   logLik deviance df.resid 
##   1763.9   1783.1   -876.0   1751.9      174 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9416 -0.4656  0.0289  0.4636  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 565.48   23.780       
##           Days         32.68    5.717   0.08
##  Residual             654.95   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.632  37.907
## Days          10.467      1.502   6.968
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

Output

In this output, we have some extra parameters: the random slope variance and the correlation between the two random effect terms. The estimated variance of \(d_i\) is 32.68, and the estimated correlation between \(c_i\) and \(d_i\) is 0.08.
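
These quantities can also be pulled programmatically from the fitted object; a short sketch using lme4's VarCorr():

# Variance components and the intercept-slope correlation for model 2
VarCorr(model2)
attr(VarCorr(model2)$Subject, "correlation")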

Prediction under model 2

The following plot shows the raw data and the prediction line for each subject, \(\widehat{Y}_{i,j} = \widehat{\alpha} + \widehat{c}_{i} + (\widehat{\beta}+\widehat{d}_{i})j\).

ggplot(NULL) +
  geom_point(data = sleepstudy,
             aes(x = Days, y = Reaction, group = Subject, color = Subject)) +
  geom_line(data = sleepstudy %>% mutate(pred_dist = fitted(model2)),
            aes(x = Days, y = pred_dist, group = Subject, color = Subject)) +
  theme_classic()

Model 3

model3 <- lmer(Reaction ~ Days + (Days||Subject), sleepstudy,REML=F)
summary(model3)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Reaction ~ Days + ((1 | Subject) + (0 + Days | Subject))
##    Data: sleepstudy
## 
##      AIC      BIC   logLik deviance df.resid 
##     1762     1778     -876     1752      175 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9535 -0.4673  0.0239  0.4625  5.1883 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  Subject   (Intercept) 584.27   24.172  
##  Subject.1 Days         33.63    5.799  
##  Residual              653.12   25.556  
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.708   37.48
## Days          10.467      1.519    6.89
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.194

Output

The output for model 3 is in the same format as model 2 but without the correlation term.

Prediction under model 3

The following plot shows the raw data and the prediction line for each subject; the predicted value is \(\widehat{Y}_{i, j} = \widehat{\alpha} + \widehat{c}_{i} + (\widehat{\beta}+\widehat{d}_{i})j\).

ggplot(NULL) +
  geom_point(data = sleepstudy,
             aes(x = Days, y = Reaction, group = Subject, color = Subject)) +
  geom_line(data = sleepstudy %>% mutate(pred_dist = fitted(model3)),
            aes(x = Days, y = pred_dist, group = Subject, color = Subject)) +
  theme_classic()

Here we compare the AIC, BIC, and log-likelihood scores for each of the models.

print(c(AIC(model1), AIC(model2), AIC(model3)))
## [1] 1802.079 1763.939 1762.003
print(c(BIC(model1), BIC(model2), BIC(model3)))
## [1] 1814.850 1783.097 1777.968
print(c(logLik(model1), logLik(model2), logLik(model3)))
## [1] -897.0393 -875.9697 -876.0016
anova(model1,model2,model3)
## Data: sleepstudy
## Models:
## model1: Reaction ~ Days + (1 | Subject)
## model3: Reaction ~ Days + ((1 | Subject) + (0 + Days | Subject))
## model2: Reaction ~ Days + (Days | Subject)
##        npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)    
## model1    4 1802.1 1814.8 -897.04   1794.1                          
## model3    5 1762.0 1778.0 -876.00   1752.0 42.0754  1  8.782e-11 ***
## model2    6 1763.9 1783.1 -875.97   1751.9  0.0639  1     0.8004    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By both the minimum-AIC and minimum-BIC criteria the uncorrelated model (model 3) is preferred, while model 2 has the (marginally) largest log-likelihood. The likelihood ratio test in the anova output shows no significant improvement of model 2 over model 3 (p = 0.80), so we will go with model 3 as the preferred model.

Confidence intervals for the parameters

First, as mentioned previously, we refit model-3 with the REML=T option.

model3 <- lmer(Reaction ~ Days + (Days||Subject), sleepstudy,REML=T)

Confidence intervals in LME4

The lme4 package contains a confint.merMod() method that computes confidence intervals (CIs) for the fixed and random effects. The function signature with all of its parameters is shown below.

# S3 method for merMod credit: https://www.rdocumentation.org/packages/lme4/versions/1.1-29/topics/confint.merMod
confint(object, parm, level = 0.95,
        method = c("profile", "Wald", "boot"), zeta,
        nsim = 500,
        boot.type = c("perc","basic","norm"),
        FUN = NULL, quiet = FALSE,
        oldNames = TRUE, ...)

By default, the confidence level is set to 0.95 and CIs are calculated by the profile likelihood method. Below we obtain the confidence intervals for our model-3. In the output, .sig01 refers to the standard deviation of \(c_i\), .sig02 refers to the standard deviation of the \(d_i\) term, .sigma refers to the standard deviation of \(e_{i, j}\), and the fixed effect terms are listed by name.

confint.merMod(model3)
## Computing profile confidence intervals ...
##                  2.5 %     97.5 %
## .sig01       15.258649  37.786473
## .sig02        3.964074   8.769159
## .sigma       22.880555  28.787598
## (Intercept) 237.572148 265.238062
## Days          7.334067  13.600505

Bootstrap Confidence intervals

Next, we use the same function to calculate the bootstrap confidence intervals for all parameters. The default number of bootstrap samples is 500.

confint.merMod(model3, method = 'boot')
## Computing bootstrap confidence intervals ...
##                  2.5 %     97.5 %
## .sig01       12.022686  35.910672
## .sig02        3.518198   8.580755
## .sigma       22.589035  28.501600
## (Intercept) 238.855363 263.857089
## Days          7.429879  13.109670
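
The nsim argument shown in the signature above controls the number of bootstrap simulations; raising it yields more stable interval endpoints at extra computational cost, for example:

# Use more bootstrap samples than the default nsim = 500
confint.merMod(model3, method = 'boot', nsim = 2000)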

Extracting the design matrix

In matrix notation, model-3 is \(Y= X \beta+Zu+e\). Each component of the model can be extracted as follows:

Xi=getME(model3,"X")
Zi=getME(model3,"Z")
Ui=getME(model3,"u")
beta=getME(model3,"beta")

Bayesian Methods

The linear mixed effects model can also be analyzed using the Bayesian approach, which applies to both small and large sample sizes. We will use the brms package, whose syntax is similar to that of lme4. In the Bayesian framework all parameters are treated as random, so we place a prior distribution on each parameter. Following Bayes' theorem, the prior information is combined with the likelihood of the observed data to produce the posterior distribution of the parameters; in other words, the information in the observed data updates our background knowledge about the model parameters. To illustrate the Bayesian method, we will use the same sleep study data.
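
In symbols, for parameters \(\theta\) and data \(D\), Bayes' theorem gives

\[p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta),\]

that is, the posterior is proportional to the likelihood times the prior.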

Model 1

Our first model assumes \(Y_{i, j} = \beta_0 + \beta_{1}j + c_{i} + e_{i, j},\) where \(Y_{i, j}\) is the reaction time of the \(i^{th}\) subject at \(j\) days, \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope, \(c_i\) is the random intercept for subject \(i\), and \(e_{i, j}\) is the error term.

Prior

The priors we use are \(\beta_0 \sim N(251, 10)\), \(\beta_1 \sim N(10, 10)\), \(\sigma_e \sim N(31, 10)I(\sigma_e>0)\), and \(\tau_0 \sim N(37, 10)I(\tau_0>0)\). We use the estimates from the classical method to choose the parameters of these prior distributions.

First, load in the rstan and brms packages, then build our base model.

require(rstan)
require(brms)

# Fit a base model using lmer
lmm <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)

# Pull base model estimates from lmer model
b0 <- summary(lmm)$coefficients[1,1]
b1 <- summary(lmm)$coefficients[2,1]  # slope (Days) estimate; [1,2] would be the intercept's SE
se <- summary(lmm)$sigma
te <- attr(summary(lmm)$varcor$Subject, "stddev")

The BRMS package

As mentioned before, this package does not fit the models itself but uses Stan on the back end. Stan is a probabilistic programming language with a C++ backend. By default, Stan samples using the No-U-Turn sampler (NUTS) of Hoffman and Gelman (2014), an adaptive extension of the static Hamiltonian Monte Carlo (HMC) sampler.

Alternatives to these algorithms, not available in the brms package, include random-walk Metropolis, Gibbs-sampling, and slice-sampling. The documentation for this package concludes that its choice of samplers converges much more quickly, especially for high-dimensional models, regardless of whether the priors are conjugate. Read the brms documentation overview for more information.

The brm() function

This function from the brms package fits generalized linear and non-linear multivariate multilevel models. To set priors, we use the prior() function, which accepts a distribution with its parameters as well as a class argument designating which parameter the prior is associated with. Unfortunately, the brm function does not handle parameters passed as variables inside prior() well, so one must manually enter the prior values into the prior() function.

Default arguments for brm()

Chains: The number of Markov chains to use. The advantage of using more chains is that we can be more confident in understanding the posterior distribution, as each chain has a different starting point. If running in parallel using the "cluster" option, this package limits the number of chains to the number of cores the system has. The default choice of the number of chains is 4.

Iter: The total number of iterations per chain. Some datasets may require more iterations to converge - meaning it takes longer to reach the equilibrium where further iterations do not improve the model fitting. Therefore, a higher number will cause the model to take longer to fit; see a comparison below. The default number of iterations is 2,000.

Warmup: Also known as burn-in. This refers to the number of iterations at the beginning of an MCMC run to be thrown away. The warmup period is a hyper-parameter that is meant to give the Markov Chain time to reach its equilibrium distribution. The idea here is that a "bad" randomized starting point may over-sample regions that aren't representative of the equilibrium distribution. Defaults to floor(iter/2).

Thin: Thinning consists of keeping only every k-th draw of the Markov chain and discarding the rest. This reduces the autocorrelation among stored samples and the memory required when the number of iterations, iter, is large. Defaults to 1, meaning all values are kept. Choose a value > 1 to save memory and computation time if iter is large.

Cores: Number of CPU cores to utilize for building chains. As chains can be created independently of one another, parallelization can be used to speed up the process. You can see how many cores are available on your machine by using the detectCores() function from the package parallel.
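
For example:

# Number of logical CPU cores available on this machine
library(parallel)
detectCores()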

Family: Defaults to Gaussian. Since we are building a linear mixed model, we will use the default choice. More information on specifying the family of distributions can be found in this manual's generalized mixed model section.

priors <- c(prior(normal(251, 10), class = Intercept), # intercept prior
            prior(normal(10, 10), class = b), # slope prior
            prior(normal(31, 10), class = sigma), # population sd
            prior(normal(37, 10), class = sd) # tau0, group sd
            )

model1 <- brm(Reaction ~ Days + (1 | Subject),
             data = sleepstudy,
             prior = priors,
             chains = 4,
             iter = 10000,
             # warmup = ,
             thin = 1, 
             cores = 4,
             family = gaussian()
             )

summary(model1)

Notes:

  • Within prior() calls, sigma refers to the population-level standard deviation and sd refers to the group-level standard deviation.
  • The brms package adjusts the sd and sigma prior distributions to be truncated below at 0.
  • To compare these results with the output from a model built with the lme4 package, please see the frequentist section of this manual above.
  • More iterations take longer to fit. If a model fails to converge, you will see a warning and may need to adjust the number of iterations or other fitting options. Experiment with iterations until you reach an acceptable output.
  • If you want to compare fitting times for different iterations, refer to the "iteration_comparison" section of the code accompanying this markdown file.


Summary of the output

With our model built, we can now get estimates of the parameters. Below we walk through each portion of the summary() output, one item at a time.

summary(model1)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Reaction ~ Days + (1 | Subject) 
##    Data: sleepstudy (Number of observations: 180) 
##   Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup draws = 20000
## 
## Group-Level Effects: 
## ~Subject (Number of levels: 18) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)    42.55      6.20    31.54    55.86 1.00     5591     8862
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   227.13      8.51   210.01   243.31 1.00     4199     7736
## Days         10.46      0.81     8.87    12.08 1.00    25629    13763
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    31.24      1.73    28.11    34.84 1.00    19942    14829
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The first five rows of the summary output describe our model and model inputs defined in the brm call.

Group-Level Effects

If there is an interest in accessing a specific estimate from the model object, it can be obtained by using the $ operator on the summary object of the model. For example: summary(model1)$random$Subject returns Subject-specific (or group level) random effects:

summary(model1)$random$Subject
##               Estimate Est.Error l-95% CI u-95% CI     Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 42.55292  6.200867 31.53863 55.85676 1.000412 5591.303 8861.941

Moreover, the $ notation can be chained onto the previous command to access individual parts of the output.

Our estimate of the subject-specific random-effect standard deviation is 42.55, which differs from the estimate \(\widehat{\tau}_0 = 37\) under the classical method. However, the classical estimate is still within the Bayesian credible interval (31.54, 55.86).

Rhat (\(\hat R\)) for a given parameter estimate measures whether the samples obtained are representative of the actual distribution. Mathematically, it is the square root of the estimated total variance over within-chain variance. Its formula is

\[\widehat R=\sqrt{\frac{\widehat {\rm var}^+ (\theta|{\rm Data})} {W}},\]

where

  • Total posterior variance of a given parameter, \({\widehat {\rm var}}^+(\theta |{\rm data}) = (N-1)W/N + B/N\)
  • The within-chain variance estimate \(W= (1/M)\sum_{m=1}^M s_m^2\), where \(s^2_m=\{1/(N-1)\}\sum_{n=1}^N (\theta^{(n,m)}-\overline{\theta}^{(\cdot, m)})^2\)
  • Between-chain variance estimate \(B=\{N/(M-1)\}\sum_{m=1}^M(\overline\theta^{(\cdot,m)}-\overline\theta^{(\cdot,\cdot)})^2\), where \(\overline\theta^{(\cdot,m)}=(1/N)\sum_{n=1}^N\theta^{(n,m)}\) and \(\overline\theta^{(\cdot,\cdot)}=(1/M)\sum_{m=1}^M\overline\theta^{(\cdot,m)}\)
  • \(N\) is the total number of iterations (iter) per chain, \(M\) is the total number of chains, \(S=MN\) is the total number of draws from all chains, \(\theta^{(n,m)}\) is the \(n\)th draw of the \(m\)th chain, \(\overline\theta^{(\cdot,m)}\) is the average of the draws from the \(m\)th chain, and \(\overline\theta^{(\cdot,\cdot)}\) is the grand mean across all \(M\) chains.

The STAN backend code uses split \(\hat R\) that takes \(M\) chains, splits them in half, then performs the \(\hat R\) calculation on the resulting \(2M\) chains. So if there is only one chain, \(\hat R\) can still be calculated (\(M=2\)).

If this value is larger than 1.1, it is an indication that the model has not converged adequately. This can be accessed for each parameter estimate within the model summary. For example, \(\hat R\) for subject specific random intercept, \(\hat R_{\hat\sigma_\tau}\) is accessed with summary(model1)$random$Subject$Rhat = 1.0004121.

Bulk_ESS, or bulk effective sample size, is a diagnostic for the sampling efficiency in the bulk of the posterior distribution. It measures the information content of a sample chain: it can be thought of as the size of an independent sample from the posterior that would estimate the posterior as efficiently as the correlated MCMC draws do. A higher number is better; ESS > 400 is recommended (Vehtari et al., 2021). \(ESS = NM /\widehat\pi\), where \(N\) is the length of each chain, \(M\) is the total number of chains, and \(\widehat\pi\) is an estimate of the correlation between lagged samples in the chain.

Tail_ESS represents the minimum sample size needed to create 5% and 95% quantiles of the posterior.

\(\widehat R\) close to 1 and \(ESS\) greater than 400 are required to ensure an MCMC sample is useful in practice. More information on the \(\widehat R\), bulk-ESS, and tail-ESS calculations can be found in Vehtari et al. (2021).

Population-Level Effects

pop_effects <- summary(model1)$fixed

This section shows the estimates (posterior means) of the population-level intercept and slope.

Family Specific Parameters

fam_effects <- summary(model1)$spec_pars

This shows our Bayes estimate \(\widehat\sigma_e\) = 31.24, while the estimate from the classical method using the lme4 package is 30.99.


Posterior Summary

While the summary() function has a tidy appearance, the posterior_summary() function produces much more information about the estimates.

posterior_summary(model1)
##                             Estimate  Est.Error        Q2.5        Q97.5
## b_Intercept               227.132047  8.5110609  210.011897  243.3147620
## b_Days                     10.459994  0.8110058    8.868201   12.0808333
## sd_Subject__Intercept      42.552923  6.2008675   31.538635   55.8567590
## sigma                      31.243158  1.7305825   28.110510   34.8360205
## r_Subject[308,Intercept]   64.395375 12.4022506   40.149882   88.7474326
## r_Subject[309,Intercept]  -55.634287 11.8970336  -78.431389  -31.7925528
## r_Subject[310,Intercept]  -40.858222 11.8695490  -63.893391  -17.4527266
## r_Subject[330,Intercept]   27.464542 12.1917459    3.893000   51.5879079
## r_Subject[331,Intercept]   33.489801 12.1656300   10.053813   57.5066848
## r_Subject[332,Intercept]   31.422443 12.1080476    7.842747   55.6753750
## r_Subject[333,Intercept]   39.769985 12.2542521   15.863203   63.7315155
## r_Subject[334,Intercept]   20.021023 12.1596605   -3.523049   44.3472413
## r_Subject[335,Intercept]  -22.717244 11.9533152  -46.177643    0.7032051
## r_Subject[337,Intercept]   95.966007 12.4709017   71.690211  120.4976203
## r_Subject[349,Intercept]    1.615151 12.1256699  -21.799628   25.4673204
## r_Subject[350,Intercept]   37.414615 12.1055378   13.702174   61.0783374
## r_Subject[351,Intercept]   15.151710 12.0579344   -8.176647   39.2082310
## r_Subject[352,Intercept]   59.731059 12.3570132   35.987047   84.5744143
## r_Subject[369,Intercept]   30.183657 12.2144481    6.526597   54.6015443
## r_Subject[370,Intercept]   16.620757 12.0964969   -6.754476   40.6666686
## r_Subject[371,Intercept]   19.777051 12.0504798   -3.887480   43.2967901
## r_Subject[372,Intercept]   41.417989 12.2161905   17.894376   65.6737382
## lprior                    -16.240267  1.7290961  -20.273374  -13.6635974
## lp__                     -909.585952  4.2226882 -918.860830 -902.4809172

The first 4 rows are repeated from the summary, along with the 2.5% and 97.5% quantiles of the posterior distribution. The next section of the output displays \(\widehat c_{i}\), the intercept estimate for each subject. For example, subject 308 has a subject-specific intercept of 64.40, so this subject's mean reaction time at day 0 is estimated as (64.40 + 227.13) = 291.53 ms.
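
This arithmetic can be done directly on the posterior_summary() matrix, which is indexed by parameter names:

# Expected day-0 reaction time for subject 308:
# population intercept plus subject-specific offset
ps <- posterior_summary(model1)
ps["b_Intercept", "Estimate"] + ps["r_Subject[308,Intercept]", "Estimate"]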

lprior and lp__ are the log prior density and the (unnormalized) log posterior density of the draws; they are used for automatic prior/likelihood sensitivity analysis via an additional package, priorsense, and are not covered in the brms documentation.


Plots

To obtain the trace and density plots for all model parameters, one may use plot(model):

plot(model1)

The left side of the plot shows the posterior density of each parameter. The right side shows the trace plots of the MCMC samples. To obtain the plot of an individual parameter, the pars argument can be added to the plot call; for example, plot(model1, pars = "b_Intercept") returns only the density and trace plots of the intercept.


Plotting Model Estimates

As an alternative to indexing the summary object with $, the coef() function can be used to pull the model coefficients. We can access the intercept estimates for each subject, \(\widehat c_i\), to plot them along with the dataset. For more model plotting options in the brms package, see the mcmc_plot.brmsfit section of the package documentation.

# population-level slope estimate (in model 1 the slope does not vary by subject)
pop_slope <- coef(model1)[["Subject"]][,,'Days'][1,1]
m1_coefs <- coef(model1)[["Subject"]][,,'Intercept'][,1]

estimate_df <- data.frame(subject = factor(names(m1_coefs)),
                          slope = pop_slope,
                          intercept = m1_coefs)


# Plotting first 4 subjects
# Function to emulate first 4 default colors from ggplot package
gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

colors = gg_color_hue(4)

sleepstudy %>% 
  filter(Subject %in% c(308,309,310,330)) %>% # first 4 subjects
  ggplot(aes(x = Days, y = Reaction, color = Subject)) +
  geom_point() +
  # subject 308
  geom_abline(slope = estimate_df[1,]$slope, 
              intercept = estimate_df[1,]$intercept,
              color = colors[1]) +
  # subject 309
  geom_abline(slope = estimate_df[2,]$slope,
              intercept = estimate_df[2,]$intercept,
              color = colors[2]) +
  # subject 310
  geom_abline(slope = estimate_df[3,]$slope,
              intercept = estimate_df[3,]$intercept,
              color = colors[3]) +
  # subject 330
  geom_abline(slope = estimate_df[4,]$slope,
              intercept = estimate_df[4,]$intercept,
              color = colors[4])

The posterior distribution of each parameter can be assessed by using the as_draws() function on the brms model object. By default, this function will return all variables from all chains, excluding estimates from warmup/burn-in draws. However, one may pull a single variable or list of variables by adding the variables=c(), pull values for a specific chain using the $ operator, and include inc_warmup = TRUE to the function call to include warmup draws.

Ex: as_draws(model1, variables=c('b_Intercept', 'sigma'), inc_warmup=TRUE)$`3` will return the b_Intercept and sigma estimates for all 10,000 draws (including warmup) of chain #3.


Model 2 - Correlated random effects

Our second model has a random intercept term and a random slope, with correlated random effects: \(Y_{i, j} = \beta_0 + \beta_1 j + c_{i, 0} + c_{i,1}j + e_{i, j}\), where \[\left(\begin{array}{c} c_{i,0}\\ c_{i,1} \end{array}\right) \sim N_2(0,\Sigma),\, \, \, \Sigma=\left[\begin{array}{cc} \sigma^2_{c_{0}}& \sigma_{(c_{0},c_{1})}\\ \sigma_{(c_{0},c_{1})} & \sigma^2_{c_{1}} \end{array}\right].\] We will use the classical estimates to construct the prior distributions of the parameters for the brm model.

base_model2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(base_model2)

b0 <- summary(base_model2)$coefficients[1,1]
b1 <- summary(base_model2)$coefficients[2,1]
se <- summary(base_model2)$sigma
te <- attr(summary(base_model2)$varcor$Subject, "stddev")[1] # subject specific intercept 
be <- attr(summary(base_model2)$varcor$Subject, "stddev")[2] # random slope sd 

cor_int_slope <- attr(summary(base_model2)$varcor$Subject, "correlation")[1,2]

priors <- c(prior(normal(251, 10), class = Intercept), # intercept prior
            prior(normal(10, 10), class = b), # slope prior
            prior(normal(26, 10), class = sigma), # residual (population-level) sd
            prior(normal(24, 10), class = sd, coef=Intercept, group=Subject), # tau0, group intercept sd
            prior(normal(6, 10), class = sd, coef=Days, group=Subject) # group slope sd
)
# No prior is specified for the intercept-slope correlation,
# so brms falls back on its default LKJ prior for that parameter.

model2c <- brm(Reaction ~ Days + (Days | Subject),
               data = sleepstudy,
               prior = priors,
               chains = 4,
               iter = 10000,
               thin = 1, 
               cores = 4,
               family = gaussian()
)

One can access model estimates, estimated error, 2.5% and 97.5% posterior quantiles by using the posterior_summary() function on the model object.

posterior_summary(model2c)
##                                  Estimate  Est.Error         Q2.5        Q97.5
## b_Intercept                   239.5607648  7.3443303  224.4015172  253.5761191
## b_Days                          7.3013486  1.7891146    3.5467233   10.6299991
## sd_Subject__Intercept          27.7249643  5.7195661   17.5631890   40.1064485
## sd_Subject__Days                7.3465065  1.8026412    4.4983531   11.5561245
## cor_Subject__Intercept__Days    0.2239903  0.2826236   -0.3334059    0.7478453
## sigma                          25.9201189  1.5185897   23.1139760   29.0778671
## r_Subject[308,Intercept]       13.8246079 14.6984188  -15.3319572   42.6769041
## r_Subject[309,Intercept]      -28.9625994 13.0372529  -54.9546425   -3.7053278
## r_Subject[310,Intercept]      -27.3010330 13.2705279  -53.9655347   -2.2403654
## r_Subject[330,Intercept]       33.7822932 14.2861197    7.1890861   62.8904409
## r_Subject[331,Intercept]       32.6582905 14.0934923    6.1953969   61.6438174
## r_Subject[332,Intercept]       19.9133756 13.5612735   -6.3270864   47.0218636
## r_Subject[333,Intercept]       27.4569733 13.7802523    1.2437690   55.6554287
## r_Subject[334,Intercept]        4.0417649 13.4261385  -22.5525817   30.5926318
## r_Subject[335,Intercept]        9.9921342 13.9258859  -16.0510267   38.2613812
## r_Subject[337,Intercept]       45.7583842 14.7719141   17.5688990   75.5346437
## r_Subject[349,Intercept]      -13.5352381 13.5892124  -40.8189626   12.3748351
## r_Subject[350,Intercept]       -1.2411703 14.4331703  -29.7988938   26.8066536
## r_Subject[351,Intercept]       15.3989132 13.4690157  -10.5613340   42.7127111
## r_Subject[352,Intercept]       31.6689847 14.1766660    4.0158802   59.5152923
## r_Subject[369,Intercept]       14.2388563 13.3631077  -12.0502410   40.8895418
## r_Subject[370,Intercept]      -13.6668044 14.2633426  -42.5033399   13.5192226
## r_Subject[371,Intercept]       11.8208495 13.1331117  -13.5529737   37.9264364
## r_Subject[372,Intercept]       22.9568081 13.6634306   -3.4058068   49.8612104
## r_Subject[308,Days]            12.3766806  3.0839454    6.5987371   18.6609842
## r_Subject[309,Days]            -5.6813142  2.6681447  -10.9673590   -0.4352616
## r_Subject[310,Days]            -2.5050524  2.7178646   -7.7546553    2.9213108
## r_Subject[330,Days]            -1.4960351  2.9018718   -7.2169346    4.1306808
## r_Subject[331,Days]             0.2228482  2.8744427   -5.4906488    5.8237612
## r_Subject[332,Days]             2.9530966  2.8089700   -2.5425036    8.5208325
## r_Subject[333,Days]             3.0529600  2.8472955   -2.4640447    8.6330808
## r_Subject[334,Days]             4.2274717  2.7811316   -1.1019243    9.8666637
## r_Subject[335,Days]            -7.5733581  2.8421501  -13.2736232   -2.1700489
## r_Subject[337,Days]            11.9832522  3.0429006    6.1329498   18.0954944
## r_Subject[349,Days]             4.2130526  2.8305795   -1.0927309    9.9016530
## r_Subject[350,Days]             9.7066950  2.9843206    4.0663806   15.7526486
## r_Subject[351,Days]             0.1725279  2.7740133   -5.2739787    5.6606491
## r_Subject[352,Days]             6.8249027  2.9178673    1.1732697   12.6712458
## r_Subject[369,Days]             4.0682147  2.8222456   -1.2971577    9.8577461
## r_Subject[370,Days]             7.8396814  2.9590715    2.2575276   13.8623517
## r_Subject[371,Days]             2.1731680  2.7297670   -3.1163002    7.6335443
## r_Subject[372,Days]             4.5430180  2.8630715   -0.9923862   10.2051266
## lprior                        -19.4446417  1.7857232  -23.7314963  -16.9626306
## lp__                         -902.4457826  6.6286392 -916.7557401 -890.7132262

The first two rows of the above output correspond to b_Intercept and b_Days, while the next two rows correspond to the standard deviation of the subject-level intercept, \(\sigma_{c_0}\), and that of the slope, \(\sigma_{c_1}\).

In the next row, cor_Subject__Intercept__Days is the estimated correlation between the two subject-specific random effects, which is 0.22 with 95% credible interval (-0.333, 0.748). Since the interval contains zero, there is no clear evidence of correlation. sigma in the output represents \(\widehat\sigma_e\).

The next 36 rows show the estimates for random intercept and random slope for each subject, along with posterior standard deviation (Est.Error), and the 2.5th and the 97.5th percentiles of the posterior distribution. One can access the random effects of the model using the $ notation: summary(model2c)$random

Data and model estimates can be accessed and plotted by first pulling the subject-specific random slopes and intercepts using coef() and adding them to a plot with geom_abline().

# subject-specific slope estimates (variable name kept from the model-1 code)
pop_slope <- coef(model2c)[["Subject"]][,,'Days'][,1]
m2c_coefs <- coef(model2c)[["Subject"]][,,'Intercept'][,1]

estimate_df <- data.frame(subject = factor(names(m2c_coefs)),
                          slope = pop_slope,
                          intercept = m2c_coefs)


# Plotting first 4 subjects
# Function to emulate first 4 default colors from ggplot package
gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

colors = gg_color_hue(4)

sleepstudy %>% 
  filter(Subject %in% c(308,309,310,330)) %>% # first 4 subjects
  ggplot(aes(x = Days, y = Reaction, color = Subject)) +
  geom_point() +
  # subject 308
  geom_abline(slope = estimate_df[1,]$slope, 
              intercept = estimate_df[1,]$intercept,
              color = colors[1]) +
  # subject 309
  geom_abline(slope = estimate_df[2,]$slope,
              intercept = estimate_df[2,]$intercept,
              color = colors[2]) +
  # subject 310
  geom_abline(slope = estimate_df[3,]$slope,
              intercept = estimate_df[3,]$intercept,
              color = colors[3]) +
  # subject 330
  geom_abline(slope = estimate_df[4,]$slope,
              intercept = estimate_df[4,]$intercept,
              color = colors[4]) +
  ggtitle("Sleep Study Model 2 - Correlated")

Model 3 - Uncorrelated random effects

If you suspect the random effects are uncorrelated, it is possible to account for this in the model call. We will build the model with the random intercept and random slope for Subject uncorrelated; this assumes the subject-specific random effects are independent, and is denoted by (Days || Subject) in the lmer and brm calls. As before, we will build a base model using lmer to construct the prior information.

base_model3 <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

b0 <- summary(base_model3)$coefficients[1,1]  # intercept estimate
b1 <- summary(base_model3)$coefficients[2,1]  # slope (Days) estimate
se <- summary(base_model3)$sigma              # residual sd
te <- attr(summary(base_model3)$varcor$Subject, "stddev")     # subject-specific intercept sd
be <- attr(summary(base_model3)$varcor$`Subject.1`, "stddev") # random slope sd
# Note: the priors object defined for the correlated model above
# is reused in the brm call below.


model2u <- brm(Reaction ~ Days + (Days || Subject),
               data = sleepstudy,
               prior = priors,
               chains = 4,
               iter = 10000,
               thin = 1, 
               cores = 4,
               family = gaussian()
)

# posterior_summary(model2u)
summary(model2u)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Reaction ~ Days + (Days || Subject) 
##    Data: sleepstudy (Number of observations: 180) 
##   Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup draws = 20000
## 
## Group-Level Effects: 
## ~Subject (Number of levels: 18) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)    27.78      5.58    18.07    39.79 1.00    11637    13697
## sd(Days)          7.38      1.78     4.63    11.50 1.00     6846    11148
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   240.58      7.59   224.69   254.55 1.00     9121    13042
## Days          7.44      1.88     3.37    10.81 1.00     5834     9347
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    25.79      1.51    23.04    28.97 1.00    25263    16288
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The output of the summary function is similar to that of the correlated model, except that no correlation between the random intercept and Days is shown, as we do not allow for correlation in this model. You can use the posterior_summary() function to see estimates and credible intervals for each parameter.


Model Comparison

The function add_criterion() allows model fit criteria to be added to a model object. No criteria are added during the model build by default; instead, a separate call is needed after the model is built.

Criteria choices include:

loo: Leave one out cross-validation splits the data into training and testing sets, with the testing set consisting of only one observation. During each iteration, the model is trained and a prediction is made on the left out observation. (Bürkner, 2017)

kfold: K-fold cross-validation refits the model K times, each time leaving out one K-th of the original data. If K is set to the total number of observations in the data, then K-fold is identical to loo. (Bürkner, 2017)

waic: Widely Applicable Information Criterion is a generalization of the Akaike Information Criterion (AIC). It differs in that AIC is calculated using a point estimate, usually the maximum likelihood estimate, whereas WAIC is averaged over the posterior distribution. (Vehtari et al, 2013)

loo_subsample: An approximation of loo using subsampling for efficiency - especially useful for large datasets. (Magnusson, et al. 2019)

bayes_r2: Bayesian \(r^2\) addresses the problem that the usual \(r^2\) can have a numerator larger than its denominator. The classical \(R^2 = \frac{V_{n=1}^N \hat y_n } {V_{n=1}^N y_n}\) is the fitted variance over the total variance, where \(V_{n=1}^N\) denotes the sample variance over the \(N\) data points. With strong prior information and weak observed data, it is possible for the fitted variance to be larger than the total variance. The Bayesian version, \(R_s^2=\frac {V_{n=1}^N y_n^{{\rm pred}, s}}{V_{n=1}^N y_n^{{\rm pred}, s} + {\rm var}_{\rm res}^s}\), is the explained variance over the explained variance plus the residual variance. For a linear model, \(y_n^{{\rm pred}, s}\) is simply the linear predictor (\(X_n\beta\)); for a generalized linear model, it is the transformed linear predictor, with the transformation dependent on which family is used. (Vehtari et al., 2018)

marglik: This criterion uses log marginal likelihood for model comparison. This is also referred to as the model evidence and uses the Bayes factor to determine which model is best. This is computationally expensive due to the additional posterior draws needed to compute the likelihood. To use this criterion, you must set save_pars = save_pars(all=TRUE) in the original brm function call to save all parameters. The brms package implements bridge sampling to calculate the marginal likelihood. (Bürkner, 2017)
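
For example, to be able to use marglik for model 1, the fit would need to be run with all parameters saved; a sketch (shown with default priors for brevity, and note that this refits the model):

# Refit model 1 with all parameters saved so marglik can be computed later
model1_ml <- brm(Reaction ~ Days + (1 | Subject),
                 data = sleepstudy,
                 save_pars = save_pars(all = TRUE))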

Here, we will create a function that accepts a model object as an argument and will return the model with several criteria added.

# Function to add comparison criteria to a model
update_model <- function(model){
  model_crit <- add_criterion(model, c("waic", "loo", "bayes_R2"))
  model_crit$comparison <- data.frame(
    model_name = deparse(substitute(model)),
    waic = model_crit$criteria$waic$waic,
    loo = model_crit$criteria$loo$loo,
    bayes_R2 = mean(model_crit$criteria$bayes_R2)
  )
  return(model_crit)
}

# Add critera to each model
model1 <- update_model(model1)
model2c <- update_model(model2c)
model2u <- update_model(model2u)

# Build data frame with model criteria 
df <- rbind(model1$comparison, 
            model2c$comparison,
            model2u$comparison)

df
##   model_name     waic      loo  bayes_R2
## 1     model1 1769.306 1769.921 0.6995344
## 2    model2c 1719.266 1722.836 0.7934719
## 3    model2u 1718.698 1721.943 0.7956496
library(knitr)

For both waic and loo we are looking for the model with the lowest value; in each case waic is slightly lower than loo, and model 2 with uncorrelated random effects performs best under both criteria. For Bayes \(R^2\) we are looking for the highest value, which represents the most variance explained by the model; this criterion also chooses the uncorrelated model 2 as the best performing model. Ultimately, one should check and compare several such models and model choice criteria to make an informed decision on which model to proceed with for any given problem.
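
brms also provides loo_compare() for a side-by-side ranking of models that already carry a criterion; a quick sketch using the criteria added above:

# Rank the models by loo; differences are shown relative to the best model
loo_compare(model1, model2c, model2u, criterion = "loo")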


Generalized linear mixed effects model

We will utilize the brms package to approach a non-Gaussian outcome with a Bayesian generalized linear mixed model.

Introduction to the dataset

The lme4 package offers another dataset, "Contagious Bovine Pleuropneumonia" (CBPP), that is accessible using data(cbpp). This dataset describes the serological incidence of CBPP in zebu cattle during a survey implemented in 15 commercial herds in Ethiopia. The data contain 4 columns:

1. herd - a factor identifying the herd;
2. incidence - the number of new serological cases for a given herd and time period;
3. size - the size of the herd at the beginning of a given time period;
4. period - a factor with levels 1 to 4.

The goal is to model the probability of being disease-free (healthy) in terms of herd and period.

head(cbpp)
##   herd incidence size period
## 1    1         2   14      1
## 2    1         3   12      2
## 3    1         4    9      3
## 4    1         0    5      4
## 5    2         3   22      1
## 6    2         1   18      2

First we plot the data points, colored by herd. It is a little difficult to see which points belong to which herd from the points alone, so we connect each herd's observations with a line to make the plot clearer:

cbpp %>% 
  ggplot(aes(x = period, y = (size-incidence)/size, group=herd)) + 
  geom_point(aes(col=herd)) +
  geom_line(aes(col=herd)) +
  ylab("Cattle Without Disease (%)") + 
  xlab("Period")


Model

We will fit a generalized linear mixed model with family set to "binomial". This model will have a random herd-specific intercept and a random period-specific intercept. As before, we will first fit the model with the lme4 package (via glmer) to obtain priors.

\[\mbox{ln}\left\{\frac{Pr(\mbox{Healthy}=1)}{Pr(\mbox{Healthy}=0)}\right\}=\beta_0 + \beta_h + \beta_p,\] where \(\beta_0\) is the population intercept, \(\beta_h\) is the random herd-level intercept for herd \(h\), \(\beta_p\) is the random period-level intercept for period \(p\). The herd level random effects are assumed to come from an iid normal distribution with mean zero and some shared, herd-level variance, \(\beta_h \sim N(0,\sigma^2_h)\) , \(h \in {1,...,15}\). Period-level random intercepts are assumed to be iid normal with mean zero and shared period-level variance, \(\beta_p \sim N(0,\sigma^2_p)\), \(p\in {1,2,3,4}\).

Priors

There is a useful function in the brms package that gives information on all parameters that may have priors assigned, including the default priors that the package will use if unspecified.

Calling the get_prior() function with the formula, family, and data specified shows this information:

# Get priors for our model
get_prior(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd), 
      family = binomial(link="logit"),
      data = cbpp)
##                 prior     class      coef  group resp dpar nlpar lb ub
##  student_t(3, 0, 2.5) Intercept                                       
##  student_t(3, 0, 2.5)        sd                                   0   
##  student_t(3, 0, 2.5)        sd             herd                  0   
##  student_t(3, 0, 2.5)        sd Intercept   herd                  0   
##  student_t(3, 0, 2.5)        sd           period                  0   
##  student_t(3, 0, 2.5)        sd Intercept period                  0   
##        source
##       default
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)

Fitting a Model

As before, we use the estimates from the classical analysis using the lme4 package to figure out the priors for a brm model.

nlin_base_model <- glmer(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd), 
      family = binomial(link="logit"),
      data = cbpp)

s <- summary(nlin_base_model)

coef(nlin_base_model)

The glmer fit above supplies the centers of the priors for the Bayesian model: its fixed intercept estimate (about -2.3 on the logit scale) and the estimated standard deviations of the herd- and period-level random intercepts (about 0.71 and 0.54, respectively) appear in the prior() calls below.

There is a slight change in notation in the brm call here since we are using a generalized linear mixed effects model with the binomial family. The left side of the model reads size-incidence | trials(size), which notes the number of successes on the left (herd size - incidence) and the size of each herd inside of trials() on the right.

priors <- c(prior(normal(-2.3, 10), class = Intercept), # population intercept
            prior(normal(0, 1), class = sd), # generic group-level sd
            prior(normal(0, 0.71), class = sd, group = herd, coef = Intercept), # herd-level intercept sd
            prior(normal(0, 0.54), class = sd, group = period, coef = Intercept)) # period-level intercept sd

nonlin_bayes <- brm(size-incidence | trials(size) ~ (1 | period) + (1 | herd), 
      family = binomial(link="logit"),
      data = cbpp,
      prior = priors,
      chains = 4,
      iter = 10000,
      thin = 1,
      cores = 4
    )
## Compiling Stan program...
## Start sampling

Model summary:

##  Family: binomial 
##   Links: mu = logit 
## Formula: size - incidence | trials(size) ~ (1 | period) + (1 | herd) 
##    Data: cbpp (Number of observations: 56) 
##   Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup draws = 20000
## 
## Group-Level Effects: 
## ~herd (Number of levels: 15) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.72      0.19     0.39     1.14 1.00     7363    11147
## 
## ~period (Number of levels: 4) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.63      0.23     0.28     1.18 1.00     9694    12761
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     2.28      0.42     1.49     3.14 1.00     6653     8564
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior summary:

##                            Estimate Est.Error          Q2.5        Q97.5
## b_Intercept              2.27957223 0.4178845    1.49258837    3.1424604
## sd_herd__Intercept       0.71512718 0.1929422    0.38703292    1.1382284
## sd_period__Intercept     0.63297275 0.2329689    0.27945501    1.1762494
## r_herd[1,Intercept]     -0.58242152 0.4063733   -1.39344655    0.2037430
## r_herd[2,Intercept]      0.32725075 0.4101696   -0.44584835    1.1676858
## r_herd[3,Intercept]     -0.37510695 0.3619084   -1.08191298    0.3396342
## r_herd[4,Intercept]      0.00480476 0.4573402   -0.87600084    0.9286435
## r_herd[5,Intercept]      0.22884232 0.3988658   -0.53622945    1.0476199
## r_herd[6,Intercept]      0.47217614 0.4295647   -0.31412619    1.3731280
## r_herd[7,Intercept]     -0.89245793 0.3925574   -1.68916376   -0.1442033
## r_herd[8,Intercept]     -0.65430451 0.4027218   -1.46575679    0.1086774
## r_herd[9,Intercept]      0.30614757 0.5099247   -0.63495739    1.3786286
## r_herd[10,Intercept]     0.61043971 0.4266187   -0.16704633    1.5059330
## r_herd[11,Intercept]     0.12613279 0.3676035   -0.59338019    0.8628040
## r_herd[12,Intercept]     0.10868315 0.4794709   -0.79375938    1.1027455
## r_herd[13,Intercept]     0.77849663 0.4569669   -0.03487779    1.7503533
## r_herd[14,Intercept]    -1.01532988 0.4407256   -1.92053672   -0.1877496
## r_herd[15,Intercept]     0.60810221 0.4587012   -0.22315351    1.5684108
## r_period[1,Intercept]   -0.80003273 0.3935593   -1.62821679   -0.0839223
## r_period[2,Intercept]    0.10064658 0.3988218   -0.73069515    0.8853567
## r_period[3,Intercept]    0.20919100 0.4016436   -0.60672948    1.0119745
## r_period[4,Intercept]    0.50426822 0.4350174   -0.28905147    1.4317063
## lprior                  -4.14438682 0.6673631   -5.83582125   -3.2934867
## lp__                  -117.55461792 4.2924870 -127.08076189 -110.3135065

Prediction

There are two ways to make predictions from a brm model. The first way is to use the generic predict() function that is built in. This function returns an estimate using the fitted parameters from the model including the estimated error and quantiles.

If the interest is in predicting the number of healthy cattle from herd #2 in period 4 with a total size of 15, we can use the following code:

predict(nonlin_bayes, newdata = data.frame(herd = 2, 
                                           period =4, 
                                           size =15))
##      Estimate Est.Error Q2.5 Q97.5
## [1,]  14.2849 0.8902087   12    15

The second way is to use the posterior_predict() function that is built in to the brms package. This will return an S x N array of values where S is the total number of posterior draws from our model (total number of chains x number of post burn-in draws = 4 x 5,000 in our example) and N is the number of observations being predicted.

The following output represents the predicted number of healthy (non-diseased) cattle.

preds <- posterior_predict(nonlin_bayes, newdata = data.frame(herd = 2, 
                                           period =4, 
                                           size =15))

head(preds)
##      [,1]
## [1,]   15
## [2,]   13
## [3,]   13
## [4,]   13
## [5,]   13
## [6,]   15

One may calculate the predictive probability as follows:
\[{\rm Pr}(\mbox{Healthy}=1)=\frac{\exp(\beta_{0, s}+\beta_{h,s}+\beta_{p, s})} {1+\exp(\beta_{0, s}+\beta_{h, s}+\beta_{p, s})},\] where \((\beta_{0, s}+\beta_{h, s}+\beta_{p, s})\) denotes the samples drawn from the posterior distribution of the parameters. A simple way to show a histogram of the predictive probabilities is to use the hist() function:

(preds / 15) %>% head()
##           [,1]
## [1,] 1.0000000
## [2,] 0.8666667
## [3,] 0.8666667
## [4,] 0.8666667
## [5,] 0.8666667
## [6,] 1.0000000
hist(preds/15, main="Predictive density", xlab="probability of healthy for herd 2 and period 4", prob=T)

In the Bayesian context, we obtain the predictive distribution of a future observation, which sheds light on the uncertainty of the prediction. To view model estimates for each parameter, one may use the as_draws() function as we did earlier. The code below provides \(\widehat\beta_0\) from all posterior samples after the burn-in period of chain 1:

as_draws(nonlin_bayes)$`1`$b_Intercept

Using this function, we can create a data frame of draws of the population intercept \(\widehat\beta_0\), the herd-specific random intercept \(\widehat\beta_h\), and the period-specific random intercept \(\widehat\beta_p\) for our earlier example of cattle in herd number 2 and period 4:

chain1 <- as_draws(nonlin_bayes)$`1`
chain2 <- as_draws(nonlin_bayes)$`2`
chain3 <- as_draws(nonlin_bayes)$`3`
chain4 <- as_draws(nonlin_bayes)$`4`

beta_hat0 <- c(chain1$b_Intercept, 
            chain2$b_Intercept, 
            chain3$b_Intercept, 
            chain4$b_Intercept)
            
beta_hath2 <- c(chain1$`r_herd[2,Intercept]`, 
             chain2$`r_herd[2,Intercept]`,
             chain3$`r_herd[2,Intercept]`,
             chain4$`r_herd[2,Intercept]`)

beta_hatp4 <- c(chain1$`r_period[4,Intercept]`, 
             chain2$`r_period[4,Intercept]`,
             chain3$`r_period[4,Intercept]`,
             chain4$`r_period[4,Intercept]`)

beta_hat_df<- data.frame('beta_hat_0' = beta_hat0,
           'beta_hat_h2' = beta_hath2,
           'beta_hat_p4' = beta_hatp4)

kable(head(beta_hat_df))
beta_hat_0   beta_hat_h2   beta_hat_p4
  1.926996     0.5357044     0.6697856
  2.260085     0.1031165     0.4283443
  2.001483     0.3337946     0.5822312
  1.683224     0.4592128     1.4044626
  1.936809     0.2113180     0.4486134
  2.222582     0.2694498     1.1179977
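
These draws can be pushed through the inverse-logit expression given earlier to obtain the posterior predictive probability of health for herd 2 in period 4; a short sketch using plogis(), R's inverse logit:

# Posterior draws of Pr(Healthy) for herd 2, period 4
p_healthy <- plogis(beta_hat_df$beta_hat_0 +
                    beta_hat_df$beta_hat_h2 +
                    beta_hat_df$beta_hat_p4)
hist(p_healthy, main = "Pr(Healthy) for herd 2, period 4", prob = TRUE)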

References

https://cran.microsoft.com/snapshot/2021-10-10/web/packages/lme4/vignettes/lmer.pdf
https://www.lcampanelli.org/mixed-effects-modeling-lme4/
https://www.rensvandeschoot.com/tutorials/lme4/
https://vitalflux.com/fixed-vs-random-vs-mixed-effects-models-examples/
https://www.rdocumentation.org/packages/lme4/versions/1.1-29/topics/lmer
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5970551/

Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M. B., & Balkin, T. J. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12, 1-12.

Bürkner, (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Magnusson et al. (2019). Bayesian leave-one-out cross-validation for large data. (http://proceedings.mlr.press/v97/magnusson19a/magnusson19a.pdf)

Vehtari et al (2013). Understanding predictive information criteria for Bayesian models. http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf

Vehtari et al. (2018). R-squared for Bayesian regression models (http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf)

Vehtari et al. (2019). Bayesian R2 and LOO-R2 (https://avehtari.github.io/bayes_R2/bayes_R2.html)

Vehtari et al. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC (with discussion). Bayesian Data Analysis. (https://projecteuclid.org/journals/bayesian-analysis/advance-publication/Rank-Normalization-Folding-and-Localization--An-Improved-R%CB%86-for/10.1214/20-BA1221.full)