Mixed Effects Modeling


Preliminaries

  • Install these packages in R: {curl}, {ggplot2}, {lme4}, {AICcmodavg}, {MuMIn}

Objectives

In this module, we extend our discussion of regression modeling even further to include mixed effects models.

Mixed Effects Models

A final extension of linear regression modeling that we will talk about is so-called “multilevel” or “mixed effects” modeling. This is a very complex topic, and we will only scratch the surface! There are many varieties of mixed models.

  • Linear mixed models (LMM), if we’re dealing with normally distributed variables and error structures.
  • Generalized linear mixed models (GLMM), if we’re dealing with various other variable types and error structure (e.g., binary, proportion, or count data).
  • Nonlinear mixed models (NLMM), if we’re dealing with situations where our response variable is best modeled by a nonliner combination of predictors.

NOTE: We have not talked in this course about general or generalized NONLINEAR modeling, but it’s worth knowing that such approaches are also possible. NONLINEAR modeling is where our regression equation is a nonlinear function of the model parameters; for example, I’ve used NLMMs to model somatic growth in vervet monkeys.

In a (general or generalized) linear mixed model, we have a reponse variable, \(Y\), and observations that fall into different factor categories each with some set of levels (e.g., “sex” with levels “male” and “female”), and we are interested in the effects of these factors and factor levels on the response variable. Generically, if \(\mu\) = a population mean response and \(\mu_A\) = mean response for observations belonging to factor level A, then the effect of A is given by \(\mu\) - \(\mu_A\). We have already dealt with factors and factor levels in our linear regression models when we looked at categorical predictors (e.g., sex, rank category) in our discussions of ANOVA and ANCOVA.

We can conceptualize factor effects as being either fixed or random. Fixed factors are those that reflect all levels of interest in our study, while random effects are those that represent only a sample of the levels of interest. For example, if we include sex as a factor in a model with the factor levels “male” and “female”, this (typically) will cover the gamut of levels of interest our study, thus we would consider sex a fixed factor. When we were doing ANOVA and ANCOVA analyses previously, we were looking at the effects of such fixed factors.

However, if our observational data were to consist of repeated observations of the same sampling unit, e.g., measurements taken on the same set of individuals on different dates, individual ID would be considered a random factor because it is unlikely that we will have collected data from all possible “levels of interest”, i.e., from all possible individual subjects. We have not yet dealt with such random factors as an additional source of variance in our modeling.

Mixed models are those that include BOTH fixed and random effects. Including random effects in addition to fixed effects in our models has several ramifications:

  • Using random effects broadens the scope of inference. That is, we can use statistical methods to infer something about the population from which the levels of the random factor have been drawn.

  • Using random effects naturally incorporates dependence in the model and helps us account for pseudoreplication in our dataset. Observations that share the same level of the random effects are explicitly modeled as being correlated. This makes mixed effect modeling very useful for dealing with time series data, spatially correlated data, or situations where we have repeated observations/measures from the same subjects or sampling unit.

  • Using random factors often gives more accurate parameter estimates.

  • Incorporating random factors, however, does require the use of more sophisticated estimation and fitting methods.

We will explore mixed effects modeling using an example based on this excellent tutorial.

EXAMPLE:


Suppose we have measured the amount of grooming received by female chimpanzees when they are either in their periovulatory period (i.e., the window of 2-3 days around the likely time of ovulation) or during other portions of their reproductive cycle. We collected data on the duration of grooming bouts received and scored the female’s reproductive condition at the time as a categorical factor with two levels: “POP” versus “NONPOP”. On top of that, we also recorded data on female parity at the time of the grooming bout, i.e., whether the female had given birth previously (was “parous”, or “P”) or had not yet had an offspring (was “nulliparous”, or “N”).

If we’re interested in how reproductive condition and parity influence how much grooming a female receives, our regression model would look like this:

grooming duration ~ condition + parity + \(\epsilon\)

Also imagine that our study design was such that we took multiple observations per subject. That is, our data set includes records of multiple grooming bouts received by each subject. This situation violates the assumption of independence of observations that we make for standard linear regression: multiple responses/measures from the same subject cannot be regarded as independent from each other.

Using a mixed effects model, we can deal with this situation by adding subject ID as a random effect in our model. Doing so allows us to address the nonindependence issue by estimating a different set of parameters for each level of the factor subject. We can either estimate a different intercept for each subject (which would correspond to each female having a different “baseline” level of grooming received) or estimate a different intercept and slope (where individual subjects are presumeed to differ both in the baseline level of grooming received and the strength of the relationship between grooming duration, on the one hand, and reproductive condition and parity, on the other). Our mixed effects model estimates these individual level parameters in addition to the main effects of each variable.

This is why a mixed effects model is called a mixed model. The models that we have considered so far have been “fixed effects only” models and included only one or more “fixed” predictor variables and a general error term. We essentially divided the world into things that we somehow understand or that are systematic (the fixed effects, or the explanatory variables) and things that we could not control for or do not understand (\(\epsilon\)). These fixed effects models did not examine possible structure within the error term. In a mixed model, by contrast, we add one or more random effects to our fixed effects that may explain a portion of the variance in our error term.


CHALLENGE 1


Let’s explore these idea using some actual data. First, load in the dataset “chimpgrooming.csv” and do some exploratory data analysis:

library(curl)
## Using libcurl 7.64.1 with LibreSSL/2.8.3
f <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/AN588_Fall21/chimpgrooming.csv")
d <- read.csv(f, header = TRUE, sep = ",")
head(d)
##   subject parity season reprocondition duration
## 1    Nina      P     WS         NONPOP   213.56
## 2    Nina      P     WS            POP   205.81
## 3    Nina      P     WS         NONPOP   293.71
## 4    Nina      P     WS            POP   268.76
## 5    Nina      P     WS         NONPOP   205.02
## 6    Nina      P     WS            POP   287.19
summary(d)
##    subject             parity             season          reprocondition    
##  Length:84          Length:84          Length:84          Length:84         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     duration     
##  Min.   : 86.34  
##  1st Qu.:132.36  
##  Median :205.41  
##  Mean   :197.81  
##  3rd Qu.:252.27  
##  Max.   :316.23
# first, some exploratory visualization let's plot grooming received
# duration in relation to subject ID
par(mfrow = c(1, 1))
boxplot(data = d, duration ~ subject, col = c("lightpink1"))

# we see lots of individual variation let's plot grooming received
# duration in relation to reproductive condition
boxplot(data = d, duration ~ reprocondition, col = c("burlywood2", "lightpink1"))

# let's plot grooming received duration in relation to reproductive
# condition and parity
boxplot(data = d, duration ~ reprocondition * parity, col = c("burlywood2",
    "lightpink1"))

boxplot(data = d, duration ~ reprocondition * subject, col = c("burlywood2",
    "lightpink1"))

What patterns do you see?

Random Intercept Models

We will now perform an initial mixed effects analysis where we look at how reproductive condition and parity (as fixed effect) effect grooming duration, where we include individual subject ID as a random effect.

Here is a first mixed effects model that we will fit, using one extension of formula notation that is commonly used in R.

grooming duration ~ condition + parity + (1|subject) + \(\epsilon\)

Here, the 1 refers to the fact that we want to estimate an intercept and the pipe “|” operator following the “1” signifies that we want to estimate a different intercept for each subject. Note that this generic formula still contains a general error term, \(\epsilon\), to highlight that there will still be unexplained “error” variance after accounting for both fixed and random effects in the model.

We can think of this formula as saying that we expect our dataset to include multiple observations of the response variable per subject, and these responses will depend, in part, on each subject’s baseline level. This effectively accounts the nonindependence that stems from having multiple responses by the same subject.

The {lme4} package in R is commonly used for mixed effects modeling, and the function lmer() is the mixed model equivalent of the function lm(). In the formula syntax for mixed effects models using the {lme4} package, fixed effect are included without parentheses while random effects are included in parentheses (the error, \(\epsilon\), is understood and is not included explicitly).

NOTE: We could also use the package {nlme} for mixed effects modeling (which requires a slightly different formula syntax that that used here). Both packages also allow us to do GLMMs and nonlinear mixed effects modeling, which we will not be discussing. It is important to note that {lme4} uses, by default, a slightly different parameter estimation algorithm than {nlme}. Unless otherwise specified, {lme4} uses “restricted maximum likelihood” (REML) rather than ordinary maximum likelihood estimation, which is what is used in {nlme}. In practice, these give very similar results. We will see below that when we want to compare different models using {lme4}, we will need to tell {lme4} to use ordinary maximum likelihood.

The code block below implements this first mixed effects model.

library(lme4)
## Loading required package: Matrix
lme <- lmer(data = d, duration ~ reprocondition + parity + (1 | subject))
summary(lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: duration ~ reprocondition + parity + (1 | subject)
##    Data: d
## 
## REML criterion at convergence: 796.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2611 -0.5349 -0.1890  0.3918  3.1994 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 611.9    24.74   
##  Residual             847.3    29.11   
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        132.841     15.305   8.680
## reproconditionPOP   20.293      6.352   3.195
## parityP            109.650     21.173   5.179
## 
## Correlation of Fixed Effects:
##             (Intr) rprPOP
## rprcndtnPOP -0.208       
## parityP     -0.692  0.000

Let’s focus on the output for the random effects first. Have a look at the column standard deviation. The entry for subject shows us how much variability in grooming duration (apart from that explained by the fixed effects) is due to subject ID. The entry for Residual summarizes the remaining variability in grooming duration that is not due to subject or to our fixed effects. This is our \(\epsilon\), the “random” deviations from the predicted values that are not due to either subject or our fixed effects.

The fixed effects output mirrors the coefficient tables that we have seen previously in our linear models that have focused only on fixed effects. The coefficient “reproconditionPOP” is the \(\beta\) coefficient (slope) for the categorical effect of reproductive condition. The positive for the coefficient means that grooming duration is GREATER by 20.293 units for POP than for NONPOP females. Then, there’s a standard error associated with this slope, and a t value, which is simply the estimate divided by the standard error.

The coefficient “parityP” is the \(\beta\) coefficient for the categorical effect of parity. The grooming duration associated with being parous versus nulliparous is GREATER by 109.65 units.

The INTERCEPT in this case is the grooming duration associated with nulliparous, NONPOP females. Like the lm() function, the lmer() took whatever factor level came first in the alphabet to be the reference level for each fixed effect variable.

Let’s also look at the coefficients coming out of the model.

coefficients(lme)
## $subject
##       (Intercept) reproconditionPOP parityP
## Luna    142.76558          20.29286  109.65
## Maya    137.37902          20.29286  109.65
## Nina    118.37755          20.29286  109.65
## Nipa    160.80050          20.29286  109.65
## Sofia    99.80117          20.29286  109.65
## Vita    137.92047          20.29286  109.65
## 
## attr(,"class")
## [1] "coef.mer"

We can see the separate intercepts, or “baseline” level of grooming received, associated with each female when they are (presumably) nulliparous and in a NONPOP reproductive condition. Note that not all females are, necessarily, ever seen in a particular parity or reproductive condition.

Statistical Significance in Mixed Effects Models

In mixed effects models, it is not as straightfoward to determine p values associated with either overall models or individual coefficients as it is for standard linear models. However, using likelihood ratio tests, which we previously used for comparing generalized linear models, is one common approach. Likelihood is the probability of seeing the data we have actually collected given a particular model. The logic of the likelihood ratio test is to compare the likelihood of two models with each other, i.e., a model that includes the factor that we are interested in versus a reduced, nested model with that factor excluded.

So… if we are interested in the effect of reproductive condition on grooming duration we could compare:

grooming duration ~ condition + parity + (1|subject) + \(\epsilon\)

grooming duration ~ parity + (1|subject) + \(\epsilon\)

In R, we would do this as follows:

full <- lmer(data = d, duration ~ reprocondition + parity + (1 | subject), REML = FALSE)
summary(full)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: duration ~ reprocondition + parity + (1 | subject)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    825.7    837.9   -407.9    815.7       79 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2996 -0.5283 -0.1783  0.4032  3.2285 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 388.6    19.71   
##  Residual             836.4    28.92   
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        132.841     12.625  10.522
## reproconditionPOP   20.293      6.311   3.215
## parityP            109.650     17.288   6.343
## 
## Correlation of Fixed Effects:
##             (Intr) rprPOP
## rprcndtnPOP -0.250       
## parityP     -0.685  0.000
reduced <- lmer(data = d, duration ~ parity + (1 | subject), REML = FALSE)
summary(reduced)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: duration ~ parity + (1 | subject)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    833.4    843.2   -412.7    825.4       80 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4996 -0.6492 -0.1198  0.6594  2.7072 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 380.6    19.51   
##  Residual             947.3    30.78   
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   142.99      12.22  11.697
## parityP       109.65      17.29   6.343
## 
## Correlation of Fixed Effects:
##         (Intr)
## parityP -0.707

NOTE: Here, we added the argument REML=FALSE to the lmer() function. This is necessary to do when we want to compare models using the likelihood ratio test. Basically, REML uses a different algorithm to determine likelihood values than ordinary likelihood, and, if we want to use these likelihoods to execute an LRT, we need to use ordinary likelihood. See this site for a more complete explanation of this issue.

We perform the likelihood ratio test using the anova() function.

anova(reduced, full, test = "Chisq")
## Data: d
## Models:
## reduced: duration ~ parity + (1 | subject)
## full: duration ~ reprocondition + parity + (1 | subject)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
## reduced    4 833.43 843.15 -412.72   825.43                        
## full       5 825.72 837.88 -407.86   815.72 9.7089  1   0.001834 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results tell us that the model containing reproductive condition fits the data better than a null model lacking this variable. What about parity?

full <- lmer(data = d, duration ~ reprocondition + parity + (1 | subject), REML = FALSE)
reduced <- lmer(data = d, duration ~ reprocondition + (1 | subject), REML = FALSE)
anova(reduced, full, test = "Chisq")
## Data: d
## Models:
## reduced: duration ~ reprocondition + (1 | subject)
## full: duration ~ reprocondition + parity + (1 | subject)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## reduced    4 835.97 845.70 -413.99   827.97                         
## full       5 825.72 837.88 -407.86   815.72 12.251  1   0.000465 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on this result, including parity also significantly improves the fit of our model.


CHALLENGE 2


Construct a model that includes an interaction of reproductive condition and parity and compare it to a model without the interaction term. Is the interaction of these two fixed effects significant?

full <- lmer(data = d, duration ~ reprocondition * parity + (1 | subject), REML = FALSE)
reduced <- lmer(data = d, duration ~ reprocondition + parity + (1 | subject),
    REML = FALSE)
anova(reduced, full, test = "Chisq")
## Data: d
## Models:
## reduced: duration ~ reprocondition + parity + (1 | subject)
## full: duration ~ reprocondition * parity + (1 | subject)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## reduced    5 825.72 837.88 -407.86   815.72                     
## full       6 826.41 840.99 -407.20   814.41 1.3124  1      0.252

Random Slope Models

In the exercise above, we included only estimation of a separate intercept for each female and presumed that the same relationship between grooming duration and reproductive condition + parity obtained for all females. But we can also allow that relationship to vary across subjects. We would indicate this model in formula notation as follows:

lme <- lmer(data = d, duration ~ reprocondition + parity + (1 + reprocondition |
    subject) + (1 + parity | subject), REML = FALSE)
## boundary (singular) fit: see ?isSingular
summary(lme)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: duration ~ reprocondition + parity + (1 + reprocondition | subject) +  
##     (1 + parity | subject)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    833.6    857.9   -406.8    813.6       74 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4514 -0.6093 -0.1559  0.3851  3.2152 
## 
## Random effects:
##  Groups    Name              Variance Std.Dev. Corr 
##  subject   (Intercept)        84.020   9.166        
##            reproconditionPOP   1.469   1.212   -1.00
##  subject.1 (Intercept)       628.419  25.068        
##            parityP           627.945  25.059   -1.00
##  Residual                    835.801  28.910        
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        132.530     16.293   8.134
## reproconditionPOP   20.293      6.328   3.207
## parityP            110.272     17.258   6.390
## 
## Correlation of Fixed Effects:
##             (Intr) rprPOP
## rprcndtnPOP -0.211       
## parityP     -0.902  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

Here, we have changed the random effects, which now look a little more complicated. The notation “(1 + reprocondition|subject)” tells the model to estimate differing baseline levels of grooming duration (the intercept, represented by 1) as well as differing responses to the main factor in question, which is reproductive condition in this case. We do the same for parity.

Looking at the coefficients of the new model, we see the effects. Each female now has a different intercept and a different coefficient for the slopes of grooming duration as a function of both reproductive condition and parity.

coefficients(lme)
## $subject
##       (Intercept) reproconditionPOP   parityP
## Luna     144.8594          19.47768 110.25259
## Maya     137.6452          19.95465 110.26271
## Nina     113.0874          21.57830 110.30006
## Nipa     139.1961          19.85212  84.78083
## Sofia    125.3318          20.76876 140.06436
## Vita     135.0590          20.12564 105.97018
## 
## attr(,"class")
## [1] "coef.mer"

To then get p values associated with each of the fixed factors, we could use LRTs…

# reproductive condition
full <- lmer(data = d, duration ~ reprocondition + parity + (1 + reprocondition |
    subject) + (1 + parity | subject), REML = FALSE)
## boundary (singular) fit: see ?isSingular
reduced <- lmer(data = d, duration ~ parity + (1 + reprocondition | subject) +
    (1 + parity | subject), REML = FALSE)
## boundary (singular) fit: see ?isSingular
anova(reduced, full, test = "Chisq")
## Data: d
## Models:
## reduced: duration ~ parity + (1 + reprocondition | subject) + (1 + parity | subject)
## full: duration ~ reprocondition + parity + (1 + reprocondition | subject) + (1 + parity | subject)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
## reduced    9 840.23 862.11 -411.11   822.23                        
## full      10 833.63 857.94 -406.82   813.63 8.5963  1   0.003368 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# parity
full <- lmer(data = d, duration ~ reprocondition + parity + (1 + reprocondition |
    subject) + (1 + parity | subject), REML = FALSE)
## boundary (singular) fit: see ?isSingular
null <- lmer(data = d, duration ~ reprocondition + (1 + reprocondition | subject) +
    (1 + parity | subject), REML = FALSE)
## boundary (singular) fit: see ?isSingular
anova(reduced, full, test = "Chisq")
## Data: d
## Models:
## reduced: duration ~ parity + (1 + reprocondition | subject) + (1 + parity | subject)
## full: duration ~ reprocondition + parity + (1 + reprocondition | subject) + (1 + parity | subject)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
## reduced    9 840.23 862.11 -411.11   822.23                        
## full      10 833.63 857.94 -406.82   813.63 8.5963  1   0.003368 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that in both cases, we get a significant LRT, but we also get warnings that our null models “failed to converge”. Lack of convergence is sometimes due to having a LOT of parameters we are trying to estimate relative to the number of observations we have, though in this case the full model has more parameters than the reduced model. Dealing with lack of convergence in fitting maximum likelihood models is beyond what we can cover here, so I encourage you to explore that on your own :)

Determining Model Fit

For a long time, the appropriateness of our mixed-models was assessed as we did above: by determining the significance of each fixed effect using a LRT. Once information theoretic approaches became popular, we switched over to assessing model fit using AIC, but remember that AIC can only tell us relative fit, not whether it is a good fit, overall.

In 2013, Nakagawa & Schielzeth changed that by publishing a simple and effective method for getting an \(R^2\) value for generalized linear mixed models (recall, too, that linear mixed models are a specific type of GLMM, so this method can be used with LMM, as well). They point out that AIC has some critical limitations for expressing how well a model fits a dataset:
  • While AIC provides an estimate of the relative fit of various models, it does not say anything about the absolute fit
  • AIC cannot address the amount of variance explained by a model
  • AIC is not comparable across datasets, and so fit is not generalizeable
In their paper, Nakagawa & Schielzeth propose two different measures of ‘variance explained’ for mixed models:
  • Marginal (\(R^{2}_{GLMM(m)}\)): is variance explained on the latent (or link) scale rather than the original scale; we can interpret this as the variance explained by only the fixed effects.
  • Conditional (\(R^{2}_{GLMM(c)}\)): is variance explained by fixed and random factors; put another way, this is the variance explained by the entire model.

There is an easy way to assess \(R^{2}_{GLMM}\) in R using the r.squaredGLMM() function in the package {MuMIn}.


CHALLENGE 3


Compare the full, reduced, and null mixed models from our random slope exercise using an information theoretic approach. Is your best model the best fit (e.g., explain the most variance) for the dataset? How much more variance is explained by the random effects than the fixed effects alone?

library(AICcmodavg)
print(aictab(list(full, reduced, null), c("full", "reduced", "null")), LL = FALSE)
## 
## Model selection based on AICc:
## 
##          K   AICc Delta_AICc AICcWt Cum.Wt
## full    10 836.65       0.00   0.91   0.91
## null     9 842.51       5.87   0.05   0.96
## reduced  9 842.66       6.02   0.04   1.00
library(MuMIn)
r.squaredGLMM(full)
##            R2m       R2c
## [1,] 0.7221769 0.8102366
r.squaredGLMM(reduced)
##            R2m    R2c
## [1,] 0.6778132 0.7898
r.squaredGLMM(null)
##             R2m       R2c
## [1,] 0.01443171 0.8841458

Generalized Linear Mixed Modeling (GLMM)

Just as we extended our standard linear modeling approach to include non normally distributed response variables/error structures, so too can we extend our mixed effects modeling to such situations. This is referred to as generalized linear mixed modeling, or GLMM. There are several R packages we can use to do this (e.g., {MCMCglmm}, {lme4} using the glmer() call). The methods for generating maximum likelihood parameter estimates under GLMMs are more complicated, but conceptually, the process is a simple extension of what we have talked about already.