Level 2 Predictors in Multi Level Modelling
Statistician turned Data Scientist with a Psychology background. I create clear, practical content that makes statistics easy to understand.
Note: this post is part of a series of posts that make up my comprehensive guide to Multi Level Modelling.
What are Level 2 Predictors?
As the name implies, these factors vary at Level 2 instead of Level 1 (which is what we are used to). Going back to our initial example of examining the relationship between hours studied and grades, a level 2 predictor would be something like “school funding” or “school land mass”. Essentially, these are things that vary across the hierarchical groups (schools) but not across the individual units of analysis (students) within each group.
As is usual with MLM, however, you have a lot of freedom. For a level 2 predictor, the first thing you need to decide is where to add it: the intercept, the slope term, or both.
This is hard to visualise, so let’s make it concrete by looking at the equation.
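The original equation diagram isn’t reproduced in this version, so here is a sketch of the starting model in the commonly used Raudenbush & Bryk style notation (the exact symbols are my assumption): Y_ij is the exam score of student i in school j, X_ij is hours studied, r_ij is the student-level residual, and u_0j and u_1j are school j’s random deviations from the average intercept (γ00) and the average slope (γ10).

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} + u_{1j}
\end{aligned}
```

A level 2 predictor W_j (such as school funding) can then be added to the β0j line, the β1j line, or both.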
Side note: my original equation already starts with a random intercept & slope. This NEED NOT be the case; I could just as well have started with a random-intercept-only model. But of course, if you don’t have random slopes in the first place, adding the level 2 predictor to the slope term would seem a little weird (you would first be saying that the slope is the same for every group, and then suddenly saying that it differs).
I know the above diagram looks intimidating, so let’s first get the intuition behind all of this. What does adding a Level 2 predictor actually mean? From the equation’s point of view, we were previously saying that we have a random intercept & random slope, with the emphasis on random. This means that groups, in and of themselves, have different baseline exam scores and different relationships between hours studied and exam scores, for reasons that I cannot explain (random!).
But now, with level 2 predictors, I am saying that while the intercept and slope still vary by group, I can explain this variation with my level 2 predictor! (i.e. not random! Or at least, not completely random.)
I may be adding this level 2 predictor because I want to test the hypothesis that it is important (as we usually do with predictors), or simply because I am trying to build the best model for understanding the data as a whole. Both are valid reasons.
The Philosophy of Multi Level Modelling - No Longer Simplistic “Significance” Statements
A slight detour for this section: remember how I started this MLM series by saying that you should think of the hierarchical grouping as a nuisance variable? The fact that you can have Level 2 predictors (which vary at the group level) kind of contradicts that statement, because if the grouping were just a nuisance variable, how could there be meaningful predictors at the group level?
Indeed, what I said earlier was not entirely true. Of course, I could argue that because my main interest is in the Level 1 predictors, I am simply trying to remove as many nuisance variables at the group level as possible. However, sticking to such a rigid mindset is harmful in the long run, and it is time to introduce you to the full philosophy of MLM, in all its messiness and glory. <gasp!>
The notion that the hierarchy is merely a nuisance variable is good for building intuition about how & why we need MLM (because otherwise we could just turn to regression!), but it does not always hold as you move to more complex designs. While MLM can be used simply to account for the hierarchy (in which case the hierarchy is a nuisance variable), this need not always be the case.
More crucially, the goals of modelling change as you use more advanced techniques. You are often no longer as fixated on the “significance of a predictor” as we typically are with t-tests. The focus shifts towards creating a model that “best explains the variance”, which is closer to saying “this is a good way to understand the data” than “this predictor is significant”.
This shift in purpose is subtle, and it is rarely talked about in statistics courses. When you get to this stage, you realise that “examining the significance of predictors” simply isn’t as important as saying that “as a whole, the model fit is good”. With more complex models, we often can’t isolate a single component and say “this predictor is important, this predictor is not”. This is because the significance of any single variable depends on which other variables are in the model (see dominance analysis for a simple illustration of this), and complex models often contain many variables. As a result, evaluating the importance of any single predictor among numerous others quickly becomes very hard!
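To see why a single predictor’s importance depends on what else is in the model, here is a small simulation sketch in Python (standard library only; the variables x1, x2 and y and all the numbers are made up for illustration). Only x2 truly drives y, but x1 is correlated with x2. On its own, x1 picks up a sizeable slope, and that slope largely vanishes once x2 enters the model. This shows only that coefficients shift with the company they keep; it is not dominance analysis itself.

```python
import random

def cross_product(a, b):
    """Sum of products of deviations from the mean (a centred cross-product)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 2000

# Hypothetical data: y depends only on x2, and x1 is a noisy copy of x2
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [v + rng.gauss(0, 0.5) for v in x2]
y = [1.0 * v + rng.gauss(0, 1) for v in x2]

s11, s22, s12 = cross_product(x1, x1), cross_product(x2, x2), cross_product(x1, x2)
s1y, s2y = cross_product(x1, y), cross_product(x2, y)

# Model A: y ~ x1 (simple least-squares slope)
slope_x1_alone = s1y / s11

# Model B: y ~ x1 + x2 (solve the 2x2 normal equations for centred predictors)
det = s11 * s22 - s12 ** 2
slope_x1_with_x2 = (s22 * s1y - s12 * s2y) / det
slope_x2_with_x1 = (s11 * s2y - s12 * s1y) / det

print(f"x1 alone:     {slope_x1_alone:.2f}")
print(f"x1 (with x2): {slope_x1_with_x2:.2f}")
print(f"x2 (with x1): {slope_x2_with_x1:.2f}")
```

This is ordinary regression rather than MLM, but the same logic applies to the fixed effects of a multilevel model: whether a predictor “matters” depends on what else you put next to it.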
When it comes down to it, complex statistical techniques lie a little closer to machine learning: we create the model to understand the data, often because we hope to do something concrete with its conclusions (whether that is creating a new framework or making outright predictions). This is slightly different from the goal we learnt when we first delved into statistics: declaring that “this predictor is significant!”
At the end of the day, there is no singular “goal” in statistics (MLM included). What you learn are just techniques, and it’s up to you to be as creative as you want with them!
Adding Level 2 Predictor to Intercept
Okay. Enough theory talk. Let’s go back to something concrete and show exactly how things change when you add a Level 2 predictor. To keep things simple, let’s add it to the random intercept first.
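Sketched in assumed Raudenbush & Bryk style notation (Y_ij is the exam score of student i in school j, X_ij is hours studied, W_j is the school-level predictor such as school funding, and u_0j, u_1j, r_ij are the random terms), adding the predictor to the intercept changes only the β0j line. Substituting the level 2 lines into the level 1 line Y_ij = β0j + β1j X_ij + r_ij gives the combined form:

```latex
\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j} \\
Y_{ij} &= \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}
\end{aligned}
```

Here γ01 is the new fixed effect: how much a school’s baseline score shifts per unit of W_j. Its first subscript (0) points to the intercept and its second (1) to the first level 2 predictor.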
Honestly, the equation doesn’t look that much different. By adding the level 2 predictor to the intercept, you are saying that each group’s intercept depends (at least partly) on its value of this level 2 predictor. This gives stakeholders some insight into why the groups differ in their baseline levels of the DV in the first place.
Side note: I previously said that the 2nd subscript of gamma represents which level 2 predictor is being used to explain the level 1 term - now you know what I mean!
Putting this into a worked example, let’s go back to the hours studied & exam performance scenario from the first post of the series. Say we add a new predictor, “school funding” (I’ve also tweaked the data a little, since the previous dataset was exaggerated to illustrate Simpson's paradox). You can obtain the new dataset here.
We hypothesise that this predictor can explain the differences in students’ baseline performance across schools (the random intercept). The graph would look like this:
Bummer. You wouldn’t even be able to tell that a level 2 predictor had been added to the random intercept if you didn’t look at the equation! But in the R output, you get this additional row:
Unfortunately, that’s really all there is. The output doesn’t flag this row as a Level 2 predictor, so you still have to use your contextual knowledge about the variable to decide where exactly it belongs in your equation.
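If you want a feel for what “funding explains the random intercept” means without fitting a full MLM, here is a rough two-stage sketch in Python (standard library only). It simulates made-up data in which each school’s intercept depends on funding (the true effect is set to 2.0, and for simplicity the hours slope is the same in every school). It then fits a separate regression per school and regresses those per-school intercepts on funding. This two-stage approach is only an approximation for illustration: a proper MLM fit estimates everything jointly and shrinks each school’s estimate towards the overall mean.

```python
import random

def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def simulate_schools(n_schools=40, n_students=30, seed=1):
    """Hypothetical data where school intercepts depend on funding."""
    rng = random.Random(seed)
    gamma_00, gamma_01, gamma_10 = 50.0, 2.0, 3.0  # made-up fixed effects
    schools = []
    for _ in range(n_schools):
        funding = rng.uniform(0, 10)   # W_j: school funding
        u0 = rng.gauss(0, 2)           # unexplained school deviation (u_0j)
        hours = [rng.uniform(0, 10) for _ in range(n_students)]
        exam = [gamma_00 + gamma_01 * funding + u0 + gamma_10 * h + rng.gauss(0, 5)
                for h in hours]
        schools.append((funding, hours, exam))
    return schools

schools = simulate_schools()

# Stage 1: one regression per school gives that school's intercept
funding = [w for w, _, _ in schools]
intercepts = [ols(hours, exam)[0] for _, hours, exam in schools]

# Stage 2: do better-funded schools have higher intercepts?
_, funding_effect = ols(funding, intercepts)
print(f"estimated funding effect on intercepts: {funding_effect:.2f}")
```

The estimate should land near the simulated value of 2.0, which is the quantity the extra fixed-effect row in an MLM output is trying to capture.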
Adding Level 2 Predictor to Slope
Adding the Level 2 predictor to the random slope is conceptually similar to adding it to the intercept, but slightly more exciting because you are creating a cross-level interaction effect. (Okay, technically you already did that by adding a random slope term WITHOUT any predictors, but still.)
To see why, first consider an empty level 2 model (one with no level 2 predictors). By allowing the slopes to vary by group, you technically created an “interaction”, because group membership interacts with the level 1 predictor to influence the DV (though we don’t normally call this an interaction, because it is just a random error term doing the “interacting”).
But with a level 2 predictor added to the slope term, we are explicitly saying that the effect of the level 1 predictor on the DV depends on the value of the level 2 predictor, i.e. there is an interaction between the two!
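In the same assumed notation (X_ij for hours studied, W_j for the school-level predictor), adding W_j to the slope changes only the β1j line. Substituting into Y_ij = β0j + β1j X_ij + r_ij shows where the cross-level interaction comes from:

```latex
\begin{aligned}
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j} \\
Y_{ij} &= \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}
\end{aligned}
```

The product term γ11 W_j X_ij is the explicit cross-level interaction, while u_1j X_ij is the “unexplained” random-slope interaction that was already there in the empty level 2 model.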
Let’s make this concrete by substituting some numbers from a fitted model.
If this were the R output:
The equation would be written as:
We can see that the higher the school funding, the stronger the effect of hours studied on exam performance (because the coefficient for the interaction between school funding and hours studied is positive). Why might that be? Maybe higher school funding contributes to a more conducive environment, which makes every hour studied more effective? You won’t know for sure, but it does seem plausible. Perhaps? This is how your MLM model can lead to spin-off research questions, whereby you conduct new research to further examine the model structure you discovered!
Side note: this is also why there’s a standing joke in the research community that doing research just calls for more research lol (image credits to https://makeameme.org/meme/this-calls-for-5a9808)
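The funding interpretation in this section can be sketched numerically. The coefficients below are made up for illustration (they are not the post’s actual estimates). The point is that, with a positive funding × hours interaction, each school’s slope for hours studied is γ10 + γ11 × funding, so it gets steeper as funding rises.

```python
# Hypothetical fixed effects, made up for illustration (not the post's estimates)
GAMMA_00 = 40.0   # average exam score at 0 hours studied
GAMMA_10 = 2.0    # slope of hours studied when funding = 0
GAMMA_11 = 0.25   # cross-level interaction: funding x hours studied

def school_slope(funding):
    """Fixed part of a school's slope: gamma_10 + gamma_11 * funding."""
    return GAMMA_10 + GAMMA_11 * funding

def predicted_score(hours, funding):
    """Fixed-effects prediction; ignores the school's random effects u_0j and u_1j."""
    return GAMMA_00 + school_slope(funding) * hours

for funding in (0, 4, 8):
    # prints slopes of 2.00, 3.00 and 4.00 points per hour
    print(f"funding = {funding}: each extra hour adds {school_slope(funding):.2f} points")
```

Because funding only enters through the slope here, a well-funded and a poorly funded school start at the same predicted score at 0 hours, and the gap between them widens with every extra hour studied.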
Adding Level 2 Predictor to Both Intercept & Slope
Ya, it’s the same. Just combine the previous 2 sections and you’ll get this. This post is getting too long LOL.
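For reference, in the same assumed notation (X_ij for hours studied, W_j for the school-level predictor), combining both additions gives:

```latex
\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j} \\
Y_{ij} &= \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}
\end{aligned}
```

γ01 shifts each school’s baseline and γ11 changes each school’s slope, so W_j shows up as both a main effect and a cross-level interaction.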
Conclusion
In this post, we’ve learnt how to add Level 2 predictors, which really spiced up our MLM models! While useful, they soon create a problem: the temptation to add every single damn thing on earth. This is not a good thing, because predictors that aren’t very important add complexity without adding much explanatory power. They drag down overall model metrics that penalise complexity and render the whole model less useful.
To avoid this, practitioners usually build their models in a particular sequence, comparing models along the way to ensure that the principle of parsimony still applies. In the next post, we will look into this process to keep our models clean & efficient.
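To give a rough feel for how parsimony enters model comparison, here is a tiny sketch using Akaike’s Information Criterion, AIC = 2k − 2 ln L. Here k is the number of estimated parameters, L is the maximised likelihood, and lower AIC is better. The log-likelihoods and parameter counts are hypothetical, chosen only to show that a model stuffed with weak predictors can fit slightly better and still lose once complexity is penalised.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical models (numbers invented for illustration)
models = {
    "model_a": {"log_likelihood": -1000.0, "n_params": 7},   # core predictors
    "model_b": {"log_likelihood": -999.0, "n_params": 10},   # model_a + 3 weak predictors
}

scores = {name: aic(m["log_likelihood"], m["n_params"]) for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")

best = min(scores, key=scores.get)
print(f"preferred model: {best}")
```

model_b has the higher log-likelihood, but its three extra parameters cost more than the small gain in fit, so model_a is preferred.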
Stay tuned!

