High-throughput sequencing technologies allow easy characterisation of the microbiome, but the analysis of the resulting data poses several specific difficulties. The analysis starts with processing the raw reads into an OTU (operational taxonomic unit) table. In this process, quality control, filtering and clustering into OTUs are essential steps. Once the OTU count table is ready, the choice of data analysis method depends on the research objectives, but very often a first visual data exploration is performed.
Ordination methods, many of which originate from ecology, are well suited for this purpose, but newer methods tailored to microbiome data behave better for overdispersed, zero-inflated sequencing data. Formal statistical methods are required for identifying species that are differentially abundant between several conditions; here too, special methods are needed that can deal with overdispersion, zero-inflation, library size variability and potentially the compositional nature of microbiome data.
The data analysis becomes even more elaborate for longitudinal data, when the evolution of the microbiome over time is studied. These analyses may focus either on individual taxa or on the diversity of the microbial community (richness, alpha and beta diversity, ...). We focus on 16S rRNA amplicon sequencing data.
This course is part of a larger course series in Data Analysis consisting of 19 individual modules. Find more information and enroll for this module via www.ipvw-ices.ugent.be
The course starts with a brief overview of the processing of raw read data into an OTU table (including filtering, trimming and clustering into OTUs). We continue with summarizing, exploring and plotting the high-dimensional data with ordination and clustering methods. Next we focus on the estimation of diversity (including evenness, richness and beta diversity) and relative abundances, while paying attention to normalization issues. We will discuss several methods for testing for differential abundance and diversity, including methods for longitudinal data analysis.
During the practical exercises, we will use R and several R packages, which will be provided later on.