Identifying Latent Data Structures: Structural Equation Modelling II

Starts on 23.05.2022

AI and Data Science


DA2121-M7-II EN
Tags: Postgraduate training


Hierarchically clustered (multilevel or nested) data are common in most scientific fields, including the medical, biological and social sciences. For example, individuals may be nested within geographical areas, institutions, or companies, the canonical example being students nested within schools. Multilevel data also arise in longitudinal studies where one or several outcomes are measured on several occasions. Another feature of multilevel data is that variables can be measured at any level. For example, we may have collected measures of student outcomes and student characteristics, but we may also have collected variables at the school level.

This course is part of a larger course series in Data Analysis consisting of 19 individual modules. Find more information and enroll for this module via


This course starts with a refresher on multilevel modeling (MLM). We will discuss key concepts of MLM, introduce the linear mixed model, and work through several examples of univariate multilevel regression analysis. All analyses will be done in R, using a variety of packages (nlme, lme4, lavaan). Next, we will discuss the relationship between classic (single-level) regression, multilevel regression, and structural equation modeling (SEM), from both a theoretical and a software point of view. We will show how, and under which conditions, classic (non-multilevel) SEM software can produce results identical to those of dedicated multilevel (or mixed modeling) software.
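As a point of reference for the refresher, the simplest linear mixed model is the two-level random-intercept regression. The notation below follows a common textbook convention and is an assumption here, not taken from the course materials: i indexes individuals (e.g. students), j indexes clusters (e.g. schools), and x is a single individual-level predictor.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

The intraclass correlation (ICC) is the share of the total variance that lies between clusters. In the null model without predictors, it is often used as a first indication of how strongly the data are clustered.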

On the second day, we will introduce the multilevel SEM framework. We will start from a regression perspective and proceed gradually from a simple regression analysis, through a two-level regression analysis, to more complicated (regression) models that exploit the full power of the multilevel SEM framework. Special attention will be given to multilevel mediation models, and to the difference between the latent and manifest covariate approaches to representing observed exogenous covariates at the between level. Next, we will take a latent-variable (CFA) perspective and discuss various examples of multilevel CFA, and eventually multilevel SEM involving latent variables and regressions among latent variables. Here, special attention will be given to the interpretation of the latent variables at both the within and between levels, together with a typology of possible approaches. Along the way, we will discuss many practical issues, including the role of centering, the treatment of missing and/or non-normal data, and how to deal with categorical data. Finally, we will discuss some alternative approaches to handling clustering in the data in an SEM framework, including the design-based (survey) approach and the 'wide format' approach.
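The within/between distinction used throughout the two-level SEM framework can be summarised by the usual decomposition of an observed vector into a cluster-level and an individual-level part. This is a sketch in assumed notation (the superscripts B and W and the symbol Σ are illustrative choices), not the course's own formulation.

```latex
\mathbf{y}_{ij} = \mathbf{y}_j^{B} + \mathbf{y}_{ij}^{W},
\qquad \operatorname{Cov}\!\left(\mathbf{y}_j^{B}, \mathbf{y}_{ij}^{W}\right) = \mathbf{0}

\boldsymbol{\Sigma}_T = \boldsymbol{\Sigma}_B + \boldsymbol{\Sigma}_W
```

Separate models are then specified for the between and within covariance structures. Under the latent covariate approach, the between component of a covariate is treated as an unobserved quantity, whereas the manifest approach substitutes the observed cluster mean for it.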

The main software used in this course is the open-source R package 'lavaan' (see

Course number:
Short- and long-term programmes
Area of interest:
AI and Data Science, Sciences
Academic year:
2021 - 2022
Starting date:
Yves Rosseel
Contact person:
More information
