Many modern digital applications increasingly rely on machine learning to derive predictive power from high-dimensional data sets. Compared to traditional statistics, the reduced focus on scientific hypotheses and the need to easily leverage detailed signals in the data call for a different set of models, tools, and analytical reflexes.
This course is part of a larger course series in Data Analysis consisting of 19 individual modules. Find more information and enroll in this module via www.ipvw-ices.ugent.be
This course aims to bring participants to the level where they can independently tackle the analytical part of data mining projects. The most common types of projects will be addressed: regression with continuous outcomes, classification with categorical outcomes, and clustering. For each of these, the practical use of a set of standard methods will be shown, such as Random Forests, Gradient Boosting Machines, Support Vector Machines, k-Nearest-Neighbors, K-means, and more.

Furthermore, throughout the course, concepts will be highlighted that are of concern in every statistical learning application, such as the curse of dimensionality, model capacity, overfitting, and regularization, and practical strategies will be offered to deal with them, introducing techniques such as the Lasso and ridge regression, cross-validation, bagging, and boosting. Instruction will also be given on a selection of specific techniques that are often of interest, such as modern visualization of high-dimensional data, model calibration, outlier detection using isolation forests, and the explanation of black-box models, among others.

Finally, the last lecture will introduce deep learning as a powerful tool for data analysis, discussing when and how to use it in practice, and when to shy away from it.
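To give a flavor of two of the techniques named above, the sketch below combines ridge regression with k-fold cross-validation to select the regularization strength. It is a minimal illustration in plain NumPy, not course material: the function names (`ridge_fit`, `cv_mse`), the toy data, and the candidate penalty grid are all our own assumptions.

```python
# Illustrative sketch (not course material): choosing the ridge
# penalty by k-fold cross-validation, using only NumPy.
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    # Mean squared error averaged over k held-out folds.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errs))

# Toy data: 30 noisy samples, 10 features, only 2 informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=30)

# Pick the penalty with the lowest cross-validated error.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=lambda lam: cv_mse(X, y, lam))
print("selected lambda:", best)
```

In practice a library such as scikit-learn would be used instead of hand-rolled code; the point here is only that the cross-validated error, not the training error, drives the choice of regularization.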