Modern high-throughput technologies easily generate data on thousands of variables, for example in health care, genomics, chemometrics, environmental monitoring, web logs, movie ratings, …

Conventional statistical methods are no longer suited for effectively analysing such high-dimensional data.
Multivariate statistical methods may be used, but often the dimensionality of the data set (the number of variables) is much larger than the number of (biological) samples. Modern advances in statistical data analysis allow for the appropriate analysis of such data.

Methods for the analysis of high-dimensional data rely heavily on multivariate statistical methods. Therefore, a large part of the course content is devoted to multivariate methods, but with a focus on high-dimensional settings and issues.

Multivariate statistical analysis covers many methods. This course covers a selection of techniques that, in our experience, are frequently used in industry and research institutes.

The course is taught using case studies with applications from different fields (analytical chemistry, ecology, biotechnology, genomics, …).

The course covers the following topics:

  1. Dimension reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and biplots for dimension-reduced data visualisation
  2. Sparse SVD and sparse PCA
  3. Prediction with high-dimensional predictors: principal component regression; ridge, lasso and elastic net penalised regression methods
  4. Classification (prediction of class membership): (penalised) logistic regression and linear discriminant analysis
  5. Evaluation of prediction models: sensitivity, specificity, ROC curves, mean squared error, cross-validation
  6. Clustering
  7. Large-scale hypothesis testing: false discovery rate (FDR), FDR control methods, empirical Bayes (local) FDR control
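
To make the high-dimensional setting concrete, the first topic can be illustrated with a minimal sketch of PCA computed through the SVD of the column-centred data matrix. This is an illustration only, not course material: the function name `pca_svd`, the simulated data and the sizes `n = 10` and `p = 1000` are all assumptions chosen for the example.

```python
import numpy as np


def pca_svd(X, k):
    """PCA of an n x p data matrix via the SVD of the column-centred matrix.

    Hypothetical helper for illustration; returns the first k components.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable (column)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * d[:k]                # principal component scores (n x k)
    loadings = Vt[:k].T                      # loadings (p x k), orthonormal columns
    explained = d**2 / np.sum(d**2)          # proportion of variance per component
    return scores, loadings, explained[:k]


# Simulated data with many more variables than samples (assumed sizes).
rng = np.random.default_rng(0)
n, p = 10, 1000
X = rng.normal(size=(n, p))

scores, loadings, explained = pca_svd(X, k=2)

# After centring, the data matrix has rank at most n - 1, however large p is,
# so the p x p sample covariance matrix cannot be inverted.
rank = np.linalg.matrix_rank(X - X.mean(axis=0))
print("rank of centred data:", rank, "(at most", n - 1, ")")
print("score matrix shape:", scores.shape)
```

With p much larger than n, at most n - 1 components carry any variance. This is one reason why methods that need an invertible covariance matrix break down in this setting, and why dimension reduction and penalisation are central to the topics listed above.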

