Identifying Latent Data Structures: Structural Equation Modelling II
AI and Data Science
Hierarchically clustered (multilevel or nested) data are common in most scientific fields, including the medical, biological and social sciences. For example, individuals may be nested within geographical areas, institutions, or companies, the canonical example being students nested within schools. Multilevel data also arise in longitudinal studies where one or several outcomes are measured on several occasions. Another feature of multilevel data is that variables can be measured at any level. For example, we may have collected measures of student outcomes and student characteristics, but we may also have collected variables at the school level.
This course is part of a larger course series in Data Analysis consisting of 19 individual modules. Find more information and enroll for this module via www.ipvw-ices.ugent.be
This course starts with a refresher on multilevel modeling (MLM). We will discuss key concepts of MLM, introduce the linear mixed model, and work through several examples of univariate multilevel regression analysis. All analyses will be done in R, using a variety of packages (nlme, lme4, lavaan). Next, we will discuss the relationship between classic (single-level) regression, multilevel regression, and structural equation modeling (SEM), from both a theoretical and a software point of view. We will show how, and under which conditions, classic (non-multilevel) SEM software can produce the same results as dedicated multilevel (or mixed modeling) software.
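As a rough illustration of the kind of univariate multilevel regression covered in the refresher, the sketch below fits a random-intercept model with lme4. It is not taken from the course materials: it assumes the toy dataset `Demo.twolevel` that ships with lavaan (outcome `y1`, within-level covariate `x1`, between-level covariate `w1`, cluster identifier `cluster`), and the object names `dat`, `fit_null`, `fit_mlm` and `icc` are made up for this example.

```r
library(lme4)

# Assumption: use the toy two-level dataset bundled with lavaan
dat <- lavaan::Demo.twolevel

# Null (intercept-only) model: splits the variance of y1 into a
# within-cluster and a between-cluster component
fit_null <- lmer(y1 ~ 1 + (1 | cluster), data = dat, REML = FALSE)

# Intraclass correlation: share of the total variance located between clusters
vc  <- as.data.frame(VarCorr(fit_null))
icc <- vc$vcov[vc$grp == "cluster"] / sum(vc$vcov)

# Random-intercept model with one within-level and one between-level covariate
fit_mlm <- lmer(y1 ~ x1 + w1 + (1 | cluster), data = dat, REML = FALSE)
summary(fit_mlm)
```

Maximum likelihood (`REML = FALSE`) is used here rather than the lme4 default so that the estimates are easier to compare with SEM software, which typically uses maximum likelihood as well.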
On the second day, we will introduce the multilevel SEM framework. We will start from a regression perspective and gradually proceed from a simple regression analysis to a two-level regression analysis, and on to more complicated (regression) models that exploit the full power of the multilevel SEM framework. Special attention will be given to multilevel mediation models, and to the difference between the latent and manifest covariate approaches to representing observed exogenous covariates at the between level. Next, we will take a latent-variable (CFA) perspective and discuss various examples of multilevel CFA, and eventually multilevel SEM involving latent variables and regressions among latent variables. Here, special attention will be given to the interpretation of the latent variables at both the within and between level, together with a typology of possible approaches. Along the way, we will discuss many practical issues, including the role of centering, the treatment of missing and/or non-normal data, and how to deal with categorical data. Finally, we will discuss some alternative approaches for handling clustering in the data within a SEM framework, including the design-based (survey) approach and the 'wide format' approach.
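To give a flavour of what a two-level SEM with latent variables at both levels can look like in lavaan, here is a minimal sketch. It is not part of the course materials: it again assumes the toy dataset `Demo.twolevel` bundled with lavaan, and the factor names `fw` (within) and `fb` (between) as well as the object names `model` and `fit` are purely illustrative.

```r
library(lavaan)

# Two-level model: a one-factor measurement model for y1-y3 at each level,
# with within-level covariates (x1-x3) predicting the within factor and
# between-level covariates (w1, w2) predicting the between factor
model <- '
  level: 1
    fw =~ y1 + y2 + y3
    fw ~ x1 + x2 + x3
  level: 2
    fb =~ y1 + y2 + y3
    fb ~ w1 + w2
'

# cluster = names the variable that identifies the level-2 units
fit <- sem(model, data = Demo.twolevel, cluster = "cluster")
summary(fit)
```

The `level:` blocks separate the within and between parts of the model; how the factors at each level should be interpreted is one of the topics listed above and is not settled by the syntax itself.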
The main software used in this course is the open-source R package 'lavaan' (see http://lavaan.org).