Understanding Generalized Linear Mixed Models (GLMMs)
Generalized Linear Mixed Models (GLMMs) are powerful statistical tools that extend generalized linear models (GLMs) by incorporating both fixed and random effects. This makes them particularly useful for analyzing data with complex, hierarchical, or nested structures. In this blog post, we will explore the fundamentals of GLMMs, highlight their applications, and provide a detailed example of how to use GLMMs in medical research using R.
Key Concepts in GLMMs
- Fixed Effects: These effects represent the population-level impact and are consistent across all groups or clusters. For example, in a study examining the effect of a new drug on blood pressure, the drug’s effect would be a fixed effect.
- Random Effects: These capture group-level variations, accounting for the unobserved heterogeneity among different groups. For instance, in a multi-center clinical trial, differences between centers can be modeled as random effects.
- Hierarchical Structure: GLMMs are ideal for data organized at multiple levels, such as patients within hospitals within regions. A random effect can be specified for each grouping level of the hierarchy.
- Link Functions: GLMMs use link functions to relate the mean of the response variable to the linear predictor. Common link functions include the logit link for binary data and the log link for count data.
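The concepts above can be written compactly as one equation. This is the standard textbook form rather than notation taken from this post; the symbols (y for the response, X and Z for the fixed- and random-effects design matrices, β for the fixed effects, b for the random effects, G for their covariance matrix, and g for the link function) are assumptions introduced here for illustration.

```latex
g\bigl(\mathbb{E}[\,y \mid b\,]\bigr) = X\beta + Zb,
\qquad b \sim \mathcal{N}(0,\, G)
```

With a logit link, g(μ) = log(μ / (1 − μ)); with a log link, g(μ) = log(μ). When g is the identity and the response is normally distributed, the GLMM reduces to a linear mixed model.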
What the Model is Doing
In our example, we are studying the effect of a new drug on blood pressure over time, with measurements taken from patients across different hospitals. The model does the following:
- Fixed Effect of Time: The fixed effect of time in our model indicates the overall population trend of how blood pressure changes over time due to the drug. This effect is consistent across all patients and hospitals.
- Random Intercepts for Hospitals and Patients: The random intercepts allow each hospital and each patient within those hospitals to have their own baseline blood pressure level. This accounts for the unobserved heterogeneity among hospitals and patients, providing a more accurate and nuanced analysis. Specifically, the model is creating:
- Hospital-Specific Shifts: Each hospital has its own baseline shift from the overall population intercept. This captures site-specific variations in baseline blood pressure levels.
- Patient-Specific Shifts: Each patient also has a baseline shift from their hospital’s intercept. This captures individual patient variations within each hospital.
By including these random effects, the model keeps the form of a linear model in which time is the only predictor and every patient shares the same slope, while the intercept is allowed to vary across hospitals and patients.
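Written as an equation, the description above looks roughly like this. The subscripts and symbols are assumptions chosen for illustration: h indexes hospitals, p patients within a hospital, and t time points; u and v are the hospital and patient intercept shifts.

```latex
\mathrm{bp}_{hpt} = \beta_0 + \beta_1\,\mathrm{time}_t + u_h + v_{hp} + \varepsilon_{hpt},
\qquad
u_h \sim \mathcal{N}(0, \sigma_u^2),\quad
v_{hp} \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{hpt} \sim \mathcal{N}(0, \sigma^2)
```

In the simulated dataset built in Step 1, the true values are β₀ = 120, β₁ = 0.5, σ_u = 2, σ_v = 1, and σ = 5.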
Applications of GLMMs
GLMMs are widely used in various fields, including:
- Medicine: For analyzing longitudinal data where repeated measurements are taken from the same patients.
- Education: To study student performance with data clustered by classrooms and schools.
- Ecology: To model species counts with data collected from different locations and times.
Detailed Example in Medical Research Using R
Let’s walk through an example of applying GLMMs in medical research. Suppose we have a dataset where we are studying the effect of a new drug on blood pressure over time, with measurements taken from patients across different hospitals.
Step 1: Generating a Sample Dataset
First, we’ll generate a sample dataset in R that meets the requirements to fit a GLMM.
set.seed(123)
# Study design: 10 hospitals, 10 patients per hospital, 5 visits per patient
n_patients <- 100
n_hospitals <- 10
n_visits <- 5
n_obs <- n_patients * n_visits
# Hospital ID for each observation (50 consecutive rows per hospital)
hospital_id <- factor(rep(1:n_hospitals, each = n_obs / n_hospitals))
# Patient ID for each observation; IDs are unique across hospitals (patients 1-10 are in hospital 1, and so on)
patient_id <- factor(rep(1:n_patients, each = n_visits))
# Simulate random intercepts for hospitals (SD = 2) and patients (SD = 1)
hospital_effect <- rnorm(n_hospitals, 0, 2)
patient_effect <- rnorm(n_patients, 0, 1)
# Fixed effect: blood pressure changes by 0.5 units per time point
time <- rep(1:n_visits, times = n_patients)
drug_effect <- 0.5 * time
# Combine baseline, random intercepts, time trend, and residual noise (SD = 5)
bp <- 120 + hospital_effect[as.integer(hospital_id)] + patient_effect[as.integer(patient_id)] + drug_effect + rnorm(n_obs, 0, 5)
# Create dataframe
data <- data.frame(hospital_id, patient_id, time, bp)

Step 2: Fitting the GLMM
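Next, we’ll fit the model using the lme4 package in R. Because blood pressure is a continuous outcome that we treat as normally distributed, we use lmer(), which fits a linear mixed model: the special case of a GLMM with a Gaussian response and an identity link. For binary or count outcomes, lme4’s glmer() with a family argument would be used instead. We’ll include random intercepts for hospitals and patients.
library(lme4)
# Fit the linear mixed model (a GLMM with Gaussian response and identity link)
model <- lmer(bp ~ time + (1 | hospital_id) + (1 | patient_id), data = data)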
summary(model)
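The blood pressure outcome in this tutorial is continuous, so lmer() is enough. For a non-Gaussian outcome, the same random-intercept structure can be passed to glmer() with a family argument. The following is only a minimal sketch under stated assumptions: the data frame sim, the binary variable hypertensive, and all simulation parameters are hypothetical and are not part of the dataset built in Step 1.

```r
library(lme4)

set.seed(42)
# Hypothetical design: 10 hospitals, 10 patients each, 5 visits per patient
n_hospitals <- 10
n_patients  <- 100
n_visits    <- 5
sim <- data.frame(
  hospital_id = factor(rep(1:n_hospitals, each = n_patients / n_hospitals * n_visits)),
  patient_id  = factor(rep(1:n_patients, each = n_visits)),
  time        = rep(1:n_visits, times = n_patients)
)

# Random intercepts on the log-odds scale (SDs chosen arbitrarily for illustration)
u_hospital <- rnorm(n_hospitals, 0, 0.5)
u_patient  <- rnorm(n_patients, 0, 0.5)
eta <- -1 + 0.3 * sim$time +
  u_hospital[as.integer(sim$hospital_id)] +
  u_patient[as.integer(sim$patient_id)]

# Hypothetical binary outcome: 1 = hypertensive at that visit
sim$hypertensive <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Logistic GLMM: logit link, random intercepts for hospital and patient
fit_bin <- glmer(hypertensive ~ time + (1 | hospital_id) + (1 | patient_id),
                 data = sim, family = binomial)
fixef(fit_bin)  # coefficients are on the log-odds scale
```

With so few observations per patient, glmer() may report a singular fit or a convergence warning on data like this; that reflects the small simulated sample rather than a problem with the syntax.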

Step 3: Interpreting the Results
The summary of the model provides estimates for the fixed effects (the intercept and the slope for time) and the variance components for the random effects (hospitals and patients), along with the residual variance. The fixed effect of time estimates the average change in blood pressure per time point (0.5 in our simulation), while the variance components show how much baseline blood pressure varies among hospitals and among patients.
# Extracting fixed effects (intercept and time slope)
fixed_effects <- fixef(model)
print(fixed_effects)
# Extracting random effects (predicted intercept shift for each hospital and each patient)
random_effects <- ranef(model)
print(random_effects)
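To put numbers on "variability among hospitals and patients", the variance components can be converted into shares of the total variance. The sketch below is self-contained, so it rebuilds a dataset with the same structure as Step 1; the object names d, fit, vc, and shares are introduced here for illustration. In the simulation the true variances are 4 (hospital), 1 (patient), and 25 (residual), so the true shares are roughly 13%, 3%, and 83%. Estimates from a single sample of this size can differ noticeably from those values.

```r
library(lme4)

set.seed(123)
# Same design as Step 1: 10 hospitals x 10 patients x 5 visits
d <- data.frame(
  hospital_id = factor(rep(1:10, each = 50)),
  patient_id  = factor(rep(1:100, each = 5)),
  time        = rep(1:5, times = 100)
)
hospital_effect <- rnorm(10, 0, 2)
patient_effect  <- rnorm(100, 0, 1)
d$bp <- 120 + hospital_effect[as.integer(d$hospital_id)] +
  patient_effect[as.integer(d$patient_id)] +
  0.5 * d$time + rnorm(500, 0, 5)

fit <- lmer(bp ~ time + (1 | hospital_id) + (1 | patient_id), data = d)

# One row per variance component: grp names the grouping factor, vcov is the variance
vc <- as.data.frame(VarCorr(fit))

# Share of total variance attributed to each component
shares <- setNames(vc$vcov / sum(vc$vcov), vc$grp)
print(round(shares, 3))
```

The hospital share is the intraclass correlation for two measurements taken on different patients in the same hospital.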
Step 4: Visualizing the Results
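Finally, we can visualize the results to better understand the model’s predictions. By default, predict(model) uses both the fixed and the random effects, so each patient gets an individual predicted line.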
library(ggplot2)
# Plotting the predicted vs observed blood pressure
data$predicted_bp <- predict(model)
ggplot(data, aes(x = time, y = bp, group = patient_id)) +
  geom_line(aes(color = hospital_id), alpha = 0.5) +
  geom_line(aes(y = predicted_bp), color = "orange", linetype = "dashed") +
  labs(title = "Observed vs Predicted Blood Pressure",
       x = "Time",
       y = "Blood Pressure")

Figure: Observed (solid lines, colored by hospital) vs. predicted (orange dashed lines) blood pressure over time.

Commentary on the Model and Graphic
As the plot shows, the model stays true to its simple linear form: time is the only variable that affects the slope, while hospital site and patient ID each shift the intercept. This paints a more robust picture of the data. A standard linear regression model would violate its assumption of independent observations here, because measurements from the same patient and the same hospital are correlated; the mixed model accounts for that correlation. While this by no means (xD stats joke…) captures the bulk of the variability in the dataset, it belongs to a class of models that handles the general trends in repeated-measures (longitudinal) data correctly.
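One informal way to see the independence problem is to fit an ordinary regression to the same kind of data and look at what it leaves behind. The sketch below is self-contained and rebuilds a dataset with the Step 1 structure; the names d, fit_lm, fit_lmm, and hospital_resid are introduced here for illustration, and the AIC comparison is meant as a rough check rather than a formal test.

```r
library(lme4)

set.seed(123)
# Same design as Step 1: 10 hospitals x 10 patients x 5 visits
d <- data.frame(
  hospital_id = factor(rep(1:10, each = 50)),
  patient_id  = factor(rep(1:100, each = 5)),
  time        = rep(1:5, times = 100)
)
hospital_effect <- rnorm(10, 0, 2)
patient_effect  <- rnorm(100, 0, 1)
d$bp <- 120 + hospital_effect[as.integer(d$hospital_id)] +
  patient_effect[as.integer(d$patient_id)] +
  0.5 * d$time + rnorm(500, 0, 5)

# Ordinary regression: treats all 500 observations as independent
fit_lm <- lm(bp ~ time, data = d)

# Mixed model, fitted by maximum likelihood so the two AIC values are comparable
fit_lmm <- lmer(bp ~ time + (1 | hospital_id) + (1 | patient_id),
                data = d, REML = FALSE)

# Mean lm residual per hospital: values far from zero suggest
# hospital-level structure that the ordinary regression ignores
hospital_resid <- tapply(resid(fit_lm), d$hospital_id, mean)
print(round(hospital_resid, 2))

# Lower AIC indicates the better trade-off between fit and complexity
print(c(lm = AIC(fit_lm), lmm = AIC(fit_lmm)))
```

If the hospital means of the residuals sit well away from zero, observations within a hospital share something the ordinary model has not captured, which is exactly what the random intercepts are for.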
Conclusion
GLMMs are versatile tools for analyzing data with complex structures, making them invaluable in many fields, including medical research. By incorporating both fixed and random effects, GLMMs provide a nuanced understanding of the data, capturing both population-level impacts and group-level variations.
In this tutorial, we demonstrated how to generate a sample dataset, fit a mixed model in R (the Gaussian special case of a GLMM), interpret the results, and visualize the model’s predictions. This approach can be extended to various types of hierarchical data, providing robust insights and more accurate inferences.
For further reading on GLMMs, check out these resources:
- Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
- Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
- McCulloch, C. E., Searle, S. R., & Neuhaus, J. M. (2008). Generalized, Linear, and Mixed Models. Wiley Series in Probability and Statistics.
Are you working on a complex dataset and need professional statistical consulting? Contact On Demand Stats at OnDemandStats.com to collaborate with experts who can provide customized solutions to meet your research and analytical needs.
