Multi Level Modelling Basics - Equations, Random Intercepts & Random Slopes
Statistician turned Data Scientist with a Psychology background. I create clear, practical content that makes statistics easy to understand.
Note: this post is part of a series of posts that make up my comprehensive guide to Multi Level Modelling.
At its core, the mechanics of Multi Level Modelling are simple. You need (1) a hierarchy, and (2) an IV which affects the DV. As mentioned previously, the hierarchy in MLM is considered a NUISANCE variable - it is not the main IV which affects the DV. Rather, you conduct MLM because you suspect the DV is DIFFERENT for different groups in the hierarchy, whether in terms of the intercept, or in terms of the impact of the IV on the DV (the slope)*.
*Note: this sounds suspiciously like an interaction effect, but it’s not, because (1) the groups are now treated as random effects, and (2) MLM gives you far more flexible control over whether you want to add a random intercept or a random slope separately.
Equation of MLM
What does the equation of an MLM model look like? Let’s take a look:
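For the running example (a single level 1 predictor, hours studied), the level 1 & level 2 form is commonly written as sketched below. The symbols follow the usual textbook convention and are chosen to match the rest of this post: Y_ij is the test score of student i in classroom j, X_ij is that student’s hours studied, B0j and B1j (written β) are classroom j’s own intercept and slope, the γ terms are the fixed effects, u_0j and u_1j are classroom j’s random deviations, and r_ij is the student-level residual.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} + u_{1j}
\end{aligned}
```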
Before you freak out and close this page, let me assure you that the logic is not as complex as all these random Greek letters suggest. Really, once you understand the notation, you will realise that it seems confusing only because the Greek letters are unfamiliar, not because the concept is hard.
A multi level model is usually presented in one of 2 ways: as the level 1 & level 2 equations, or as the combined equation. These correspond to the blue and orange boxes in the picture over here.
These formats are actually equivalent - you will soon see that you can easily derive the combined equation from the Level 1 & Level 2 equations. So let’s focus on those first.
Obviously, the “levels” represent the hierarchy inside your dataset. The level 1 equation represents the lower level (students, for instance), whereas level 2 represents the higher level that your data points are nested in (classrooms). To make this more concrete, let’s reuse our earlier example: examining the effect of study hours on exam performance, where each datapoint comes from a student, and students are nested in classrooms. Put into context, the level 1 equation describes test scores at the “student level”, whereas the level 2 equations describe how each classroom’s intercept and slope are formed at the “classroom level”.
Level 1 Equation - Just a Normal Regression Equation
Let’s first focus on the level 1 equation. It looks identical to a normal regression equation, doesn’t it? You have the DV on the left hand side, and the predictor in the form of a linear regression on the right. If you ignore the weird subscripts, this is exactly the same as the typical regression equation you would get if you IGNORED the hierarchy.
The only difference lies in the subscripts. Following our 2 way ANOVA notation, there’s an i (representing the observation number within a group) and a j (representing the group number) attached to the terms. So whenever you use the level 1 equation, you apply it to a specific observation within a specific group.
See the magic? Your predicted test score is now tied to the SPECIFIC GROUP that the observation is nested within: you use that group’s OWN INTERCEPT AND SLOPE to compute the predicted score. Neat, right?
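Here is a minimal Python sketch of the level 1 equation in use. It assumes made-up coefficients for three classrooms. The names `group_coefs` and `predict_level1` are hypothetical and do not come from any library. The residual r_ij is left out because we are computing a prediction, not simulating an observed score.

```python
# Hypothetical group-specific coefficients (beta_0j, beta_1j) for three classrooms.
# The numbers are made up for illustration; a fitted MLM would estimate them.
group_coefs = {
    1: (50.0, 2.0),  # classroom 1: intercept 50, +2 points per study hour
    2: (45.0, 3.0),  # classroom 2: lower intercept, steeper slope
    3: (55.0, 1.5),  # classroom 3: higher intercept, flatter slope
}

def predict_level1(hours: float, classroom: int) -> float:
    """Level 1 equation without the residual: Y_hat_ij = beta_0j + beta_1j * X_ij."""
    b0_j, b1_j = group_coefs[classroom]  # look up the classroom's OWN intercept and slope
    return b0_j + b1_j * hours

# The same 4 study hours give a different prediction in each classroom.
for j in group_coefs:
    print(j, predict_level1(4.0, j))
```

With these numbers, 4 study hours predicts 58, 57 and 61 points in classrooms 1, 2 and 3 respectively.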
Side note: B01 represents B0 (the intercept) of GROUP 1. The double-digit subscripts confused me a lot when I was learning MLM, so learning the notation properly will save you a lot of debugging time!
So the natural question becomes: what are the intercept and slope for group 2 (or any other group)? That’s where the level 2 equation comes in.
Level 2 Equation - Describing how the Intercept & Slope Change for Different Groups
You can see that the level 2 equations directly answer the question raised by the level 1 equation: they provide the values to substitute into the level 1 equation in order to make a prediction. Breaking this down into the 2 components:
In other words, the level 2 equations simply tell you how to obtain the group-specific intercept & slope: take the grand mean (the fixed effect) and add a random, group-specific deviation to it. Fixed effects, meaning the general relationship from which each group deviates (via its own group-specific error term), are denoted with gamma.
In some sense, this looks like a “nested” regression. The level 2 equations currently have the form “mean + error”, which is literally equivalent to an empty regression model (one without predictors).
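To make the “fixed effect + random deviation” idea concrete, here is a small simulation sketch. All numbers and names (`GAMMA_00`, `GAMMA_10`, `level2_coefficients`) are hypothetical. The sketch assumes the two deviations are drawn independently from normal distributions with mean 0. A real MLM instead estimates their variances, and usually their correlation, from the data.

```python
import random

GAMMA_00 = 50.0  # hypothetical fixed (grand) intercept
GAMMA_10 = 2.0   # hypothetical fixed (grand) slope for hours studied

def level2_coefficients(n_groups, sd_u0=5.0, sd_u1=0.5, seed=1):
    """Return {j: (beta_0j, beta_1j)}, each built as fixed effect + random deviation.

    Assumption: u_0j ~ N(0, sd_u0) and u_1j ~ N(0, sd_u1), drawn independently.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    coefs = {}
    for j in range(1, n_groups + 1):
        u0_j = rng.gauss(0.0, sd_u0)  # classroom j's intercept deviation u_0j
        u1_j = rng.gauss(0.0, sd_u1)  # classroom j's slope deviation u_1j
        coefs[j] = (GAMMA_00 + u0_j, GAMMA_10 + u1_j)  # beta_0j, beta_1j
    return coefs

coefs = level2_coefficients(5)
for j, (b0, b1) in coefs.items():
    print(f"classroom {j}: intercept {b0:.2f}, slope {b1:.2f}")
```

Setting both standard deviations to 0 collapses every classroom back onto the fixed effects. That is the “no group differences” case.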
A good way to understand the Gamma Notation is this. The 1st subscript represents which Level 1 coefficient this Gamma is trying to describe. If it is 0, it is the fixed effect for the intercept (B0). If it is 1, it is the fixed effect for the 1st predictor (B1). If 2, it is the fixed effect for the 2nd predictor (B2).
The 2nd subscript of gamma represents which level 2 predictor is being used to explain the level 1 term. If it is 0, this gamma is the intercept of the level 2 equation. If it is 1, this gamma is the coefficient of the 1st predictor in the level 2 equation. Level 2 predictors are covered later, so don’t worry if this doesn’t make sense yet.
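To make the two subscripts concrete, here is how they read when one purely illustrative level 2 predictor is added. W_j is a hypothetical classroom-level predictor (class size, say) that is not part of our running model. The first subscript points at the level 1 coefficient being described, and the second at the term within its level 2 equation.

```latex
\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j}
  && \gamma_{00}: \text{describes } \beta_0,\ \text{level 2 intercept};\quad
     \gamma_{01}: \text{describes } \beta_0,\ \text{1st level 2 predictor} \\
\beta_{1j} &= \gamma_{10} + u_{1j}
  && \gamma_{10}: \text{describes } \beta_1,\ \text{level 2 intercept}
\end{aligned}
```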
Combined Equation
If this still feels slightly confusing, the combined equation will clear things up. By substituting the level 2 equations into the level 1 equation, you get this foreign-looking equation that makes people feel MLM is an alien language (even though it’s not):
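Spelled out step by step, the substitution looks like this. The notation is β for the group-specific coefficients, γ for the fixed effects, u_0j and u_1j for the classroom deviations, and r_ij for the student-level residual. Each step only plugs in the level 2 equations and then regroups terms.

```latex
\begin{aligned}
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
       &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j}) X_{ij} + r_{ij} \\
       &= \underbrace{\gamma_{00} + \gamma_{10} X_{ij}}_{\text{fixed part}}
        + \underbrace{u_{0j} + u_{1j} X_{ij}}_{\text{random part}} + r_{ij}
\end{aligned}
```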
Or, for the more visual learners amongst you:
Honestly, the only reason people get confused is that Beta gets replaced with Gamma, which makes the entire equation feel foreign. But in truth, it is not.
The entire mechanics of MLM can be seen most clearly from the combined equation. In truth, it is still very similar to a normal regression equation:
But instead of B0/B1 being identical for ALL groups, each group adds its own group-specific error term that SHIFTS B0/B1 away from the FIXED B0/B1 (denoted by gamma). This is how the group-specific lines are derived: as adjustments FROM the fixed effect line.
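As a quick numerical check that the two formats describe the same lines, the sketch below computes each classroom’s prediction two ways. The first route builds the classroom’s coefficients via the level 2 equations and then applies level 1. The second uses the combined “fixed part + random part” equation. All values are made up for illustration, and the residual r_ij is omitted.

```python
# Hypothetical fixed effects and classroom deviations (made-up values).
gamma_00, gamma_10 = 50.0, 2.0
u = {1: (3.0, -0.5), 2: (-4.0, 1.0)}  # j -> (u_0j, u_1j)

def two_step(x, j):
    """Level 2 first (build classroom j's own coefficients), then level 1."""
    u0_j, u1_j = u[j]
    beta_0j = gamma_00 + u0_j
    beta_1j = gamma_10 + u1_j
    return beta_0j + beta_1j * x

def combined(x, j):
    """Combined form: fixed part + random part (residual r_ij omitted)."""
    u0_j, u1_j = u[j]
    fixed_part = gamma_00 + gamma_10 * x  # the single fixed-effect line
    random_part = u0_j + u1_j * x         # classroom j's adjustment to that line
    return fixed_part + random_part

for j in u:
    for x in (0.0, 2.0, 5.0):
        print(j, x, two_step(x, j), combined(x, j))
```

Both routes print identical numbers. For example, classroom 1’s line has intercept 53 and slope 1.5, a shifted version of the fixed line with intercept 50 and slope 2.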
Of course, the fixed effect line isn’t typically plotted in an MLM plot. Rather, many group-specific lines (derived as different deviations FROM the fixed effect line) are plotted. This gives you a graph looking something like this:
Voila! Your conceptual understanding of MLM is now complete!
Adding Random Slope and Random Intercept Separately
What I’ve covered above is the general form of MLM. It is of course entirely possible that I only want to add a random slope and not a random intercept, or a random intercept but not a random slope. You can do this easily by adjusting the level 2 equations. Let’s use the random intercept only model (no random slope) to illustrate, since it is the more common of the two.
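In the same notation as before, dropping u_1j from the slope’s level 2 equation gives the random intercept only model:

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} \\
\text{Combined:}\quad & Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + r_{ij}
\end{aligned}
```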
Looking at the equation, you can see that the intercept is determined by the fixed intercept plus the random intercept error, whereas the slope is determined only by a fixed coefficient shared across all groups (it does not vary by group). Hence, every group’s line has the same slope, so the lines are parallel, and the MLM graph would look like this:
Done again! You can repeat the same process for a random slope only model. The core idea is the same: just let the slope vary, instead of the intercept.
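Here is a small sketch contrasting the two restricted models. The deviations are made up, and the function names are hypothetical. In the random intercept only model, every classroom shares γ10 as its slope, so the lines are parallel. In the random slope only model, every classroom shares γ00 as its intercept, so the lines fan out from the same point at X = 0.

```python
gamma_00, gamma_10 = 50.0, 2.0
u0 = {1: 3.0, 2: -4.0, 3: 1.0}   # hypothetical intercept deviations u_0j
u1 = {1: -0.5, 2: 1.0, 3: 0.25}  # hypothetical slope deviations u_1j

def random_intercept_only(j):
    """beta_0j = gamma_00 + u_0j ; beta_1j = gamma_10 (no u_1j term)."""
    return gamma_00 + u0[j], gamma_10

def random_slope_only(j):
    """beta_0j = gamma_00 (no u_0j term) ; beta_1j = gamma_10 + u_1j."""
    return gamma_00, gamma_10 + u1[j]

ri = [random_intercept_only(j) for j in u0]  # list of (intercept, slope) per classroom
rs = [random_slope_only(j) for j in u1]

ri_slopes = {slope for _, slope in ri}
rs_intercepts = {intercept for intercept, _ in rs}
print(ri_slopes)      # one shared slope -> parallel lines
print(rs_intercepts)  # one shared intercept -> lines fan out from X = 0
```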
MLM with >1 predictor
One of the biggest problems I had when learning MLM was that I implicitly assumed each predictor should have exactly 1 term in the regression equation. That’s what we’re taught initially, after all! In a general regression equation, each term corresponds to one predictor (other than the intercept):
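For example, an ordinary regression with two predictors has one term per predictor, plus the intercept and a residual written here as e_i:

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i
```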
In MLM, however, this is not true. From the general equation we covered earlier, we can already see that each predictor (including the intercept) can have one OR two terms in the combined equation:
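Grouping the single-predictor combined equation by predictor makes this visible. The notation uses γ for fixed effects, u for classroom deviations and r_ij for the residual.

```latex
Y_{ij} = \underbrace{\gamma_{00} + u_{0j}}_{\text{intercept: 2 terms}}
       + \underbrace{\gamma_{10} X_{ij} + u_{1j} X_{ij}}_{\text{hours studied: 2 terms}}
       + r_{ij}
```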
It still feels manageable with 1 predictor, but the equation gets longer and longer as you add more predictors. With just 2 predictors, I can already get up to 7 terms in my MLM equation:
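Written out with X_1ij and X_2ij as the two predictors, a random intercept, random slopes for both predictors, and the residual r_ij counted as a term, the 7 terms are:

```latex
Y_{ij} = \gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{20} X_{2ij}
       + u_{0j} + u_{1j} X_{1ij} + u_{2j} X_{2ij} + r_{ij}
```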
Of course, 7 is the upper bound, which assumes I add a random intercept and allow the slope of every single predictor to vary. I could instead posit that the groups all share a fixed intercept, and that only the effect of IV2 on the DV varies by group (a random slope for IV2 only), whereas IV1 only has a fixed slope. The equation then becomes:
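Under that choice there is no u_0j and no u_1j, so only IV2 keeps a random term and the combined equation shrinks to 5 terms:

```latex
Y_{ij} = \gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{20} X_{2ij} + u_{2j} X_{2ij} + r_{ij}
```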
As such, you will realise that the number of terms in the equation does not really reflect the number of predictors you are considering. Whether to include a random slope for each predictor is a modelling choice. Keep that in mind as you go ahead!
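The term counting above can be summarised in a tiny helper. `count_terms` is a hypothetical name. The helper assumes a two-level model with no level 2 predictors, and it counts the residual r_ij as a term, which matches the counts used in this post.

```python
def count_terms(n_predictors, random_intercept, random_slopes):
    """Count terms in the combined MLM equation (residual r_ij included).

    random_slopes: one boolean per predictor, True if that slope varies by group.
    """
    if len(random_slopes) != n_predictors:
        raise ValueError("need one random-slope flag per predictor")
    n_fixed = 1 + n_predictors                            # gamma_00 plus one gamma per predictor
    n_random = int(random_intercept) + sum(random_slopes)  # u_0j plus each u_kj * X_k term
    n_residual = 1                                         # r_ij
    return n_fixed + n_random + n_residual

print(count_terms(2, True, [True, True]))     # full model with 2 predictors
print(count_terms(2, False, [False, True]))   # fixed intercept, random slope for IV2 only
print(count_terms(2, False, [False, False]))  # no random effects: ordinary regression
```

The same 2 predictors can give 7, 5 or 4 terms depending purely on which random effects you choose to include.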
Conclusion
What a whirlwind! This post breaks down the most fundamental logic of how MLM works - by adding a random slope/intercept to the fixed effect, we can generate varying slopes & intercepts for each specific group. We first considered the classic scenario where we only have 1 predictor (hours studied), and our data points (students) are nested within groups (classrooms).
Thereafter, we generalised the mechanics of MLM to >1 predictor, learning that the number of terms in an MLM model can get pretty large, and that it does not closely reflect the number of predictors you are considering. To tell the number of predictors, you have to READ the equation and see how many x’s there are! You can’t just use the number of terms as a proxy, as we usually do in classical regression!
This is a good stopping point for you to consolidate your learning - this post is arguably the most important in the entire MLM series. Because once you get how the technique works fundamentally - applying it across different situations becomes trivial. So feel free to re-read this post, and make sure you understand what I’m talking about before moving on!
In the next post - we’ll proceed to do something more hands on - actually fitting an MLM model in R. We’ll learn the syntax for how to specify a random slope/intercept, and how we can code it out to get our results. Stay tuned!
