Multi Level Modelling in R: Core Syntax
Statistician turned Data Scientist with a Psychology background. I create clear, practical content that makes statistics easy to understand.
This post is part of a series of posts regarding Multi Level Modelling
Now that we know how an MLM model works - how do we code it out in practice? With the sample dataset that we previously used to illustrate the need for MLM - let’s learn the syntax to code up an MLM model in R.
First, let’s load in the libraries we need.
library(lme4) #stands for linear mixed effects
If you see the “there is no package called lme4” error, it just means you do not have the package yet - so install it first using this command:
install.packages(‘lme4’)
Side note: yes, I know the term “mixed effects” can be confusing. Sometimes it means a “within and between subject effect” , and now it means “fixed and random” effects. The two terminologies can overlap, but the meaning and focus aren’t exactly the same. MLM uses “fixed vs random” terminology - so let’s stick with that!
After the package is ready, let’s load in our data.
df = read.csv(r"(your_file_path)")
Taking a look to ensure all looks good:
head(df)
If you see the above, you’re good to go!
Just some basic dataset manipulation first - let’s get rid of all the capital letters in the column names by doing a simple assignment operation.
colnames(df) <- tolower(colnames(df))
Call the df again,
head(df)
Looking much better now!
Visualising The Dataset
Remember - descriptively, the data looks like that:
Visually, we can already see that this is quite a classic (too classic in fact) case of same slope, different intercept. So we know we want to add a random intercept component, but not a random slope for hours studied. This means that our desired equation should look like that:
Side note: I hope I was clear enough in the last post, but having a fixed effect component is NOT optional! Only with the random effect component you need to make a choice - removing the fixed effect means removing the entire predictor!
Coding an MLM Model in R
The good news is, the syntax is rather similar to a normal linear regression method - just that instead of an lm() function, we use the lmer() function. In general, the code looks like that:
model = lmer(DV ~ IV + (Random Terms | Hierarchy Variable), data = df)
In our case,
model = lmer(grade ~ hours_studied + (1 | school), data = df)
The 1 stands for the random intercept term. If you wanted to add both a random slope and a random intercept, that part would become (1 + hours_studied | school).
To see the model details, use the summary function as per normal linear regression.
summary(model)
You should get an output that looks like this:
Congrats! You’ve fitted your first mixed effects model!
Output Interpretation
Truth is, if you truly understand the mechanics of MLM from the previous post - reading the output should be relatively easy. Because everything boils back down to the equation - if you are clear about the unknown coefficients you need to find, then interpreting the output merely becomes a “matching” exercise.
The R output is rather intuitive - the fixed effects and random effects are presented separately:
You should realise that since this is a random intercept only model, we should NOT expect to see a random slope term. Indeed, we don’t. Instead, under random effect there is only 1 term under the “name” column - (Intercept).
One thing about random effects you should note: because they are by definition random, meaning that you aren’t interested in the specific groups THEMSELVES (but rather these groups are merely a subset of all the POSSIBLE groups out there) - what is presented under the summary tab is NOT A COEFFICIENT for the random effect (which varies by group) - but rather just a VARIANCE. This is the variance of the DEVIATIONS FROM THE FIXED EFFECT - because each group has a different error term (from the fixed effect), it is literally quantifying the spread of this error term.
Of course, it is possible to obtain the specific random effects for each of these groups. Just call ranef(model) - and all the random effects will be printed out for you.
Do note that these values are literally the DEVIATION FROM THE FIXED EFFECT (error) - NOT the coefficients for the group by itself! So to get a full equation for each group - you still have to manually combine the fixed effects & random effects to generate a group specific regression equation.
The fixed effects are even simpler - they are… well.. the fixed effects! Since they are true for all groups - it does not vary across groups - and thus there’s only 1 of each term for the entire model. And for these, you can interpret these as the coefficients in the regression equation directly - no tricks over here.
So, to summarize the “matching” game, all you need to do is to fill in the unknowns in the MLM equation with the output from R.
And there you have it! You’re done!
Extensions
But wait. What about significance? What should I do about the “model is nearly unidentifiable” warning? How does the syntax change as I add more predictors?
Interested to learn more? I’ll be creating a course (paid) on how to actually think in mixed models — not just run lmer(), but understand when to use random intercepts vs slopes, how to diagnose convergence issues, and how to interpret results without getting misled by default output.
It’ll be focused on the practical stuff people usually only learn after running into errors in real datasets: significance testing in mixed models, what those “nearly unidentifiable” warnings actually mean, and how to build models step-by-step without overfitting or overcomplicating things too early. Stay tuned!
And for now - let’s go through some more theory first - adding a Level 2 Predictor in MLM 😉
