Handling data is a recurring task for data analysts. Reading in experimental data, checking its properties, and creating visualisations can become tedious. Making this process more efficient therefore benefits many professionals who work with data. Spreadsheet-based software supports this process poorly because it lacks automation and repeatability. A high-level scripting language such as Python is well suited to these tasks.
This course trains participants to use Python effectively for these tasks. It focuses on the manipulation and cleaning of tabular data, exploratory analysis, and visualisation, using key packages such as Pandas, NumPy, Matplotlib and Seaborn.
This course is part of a larger course series in Data Analysis consisting of 19 individual modules. Find more information and enroll for this module via www.ipvw-ices.ugent.be
The course starts by setting up the programming environment with the required packages using the conda package manager and by introducing the Jupyter notebook environment. It then introduces the data analysis package Pandas and the plotting packages Matplotlib and Seaborn. Advanced usage of Pandas for a range of data cleaning and manipulation tasks is taught, and the acquired skills are immediately put into practice on real-world data sets. Applications include time series handling, categorical data, merging data, and geospatial data, among others.
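As a rough illustration of the kind of task listed above, the following sketch merges two small tables, converts a column to categorical data, and resamples a time series with Pandas. The data, column names, and station labels are invented for this example and are not taken from the course material.

```python
import pandas as pd

# Hypothetical measurements; all names and values are invented for illustration.
measurements = pd.DataFrame(
    {
        "timestamp": [
            "2024-01-01 00:00",
            "2024-01-01 12:00",
            "2024-01-02 00:00",
            "2024-01-02 12:00",
        ],
        "station": ["A", "B", "A", "B"],
        "value": [1.0, 3.0, 2.0, 6.0],
    }
)
# Hypothetical lookup table describing each station.
stations = pd.DataFrame({"station": ["A", "B"], "region": ["north", "south"]})

# Time series handling: parse the text timestamps into datetime values.
measurements["timestamp"] = pd.to_datetime(measurements["timestamp"])

# Merging data: attach the region of each station to every measurement.
merged = measurements.merge(stations, on="station", how="left")

# Categorical data: store the repeated region labels as a categorical column.
merged["region"] = merged["region"].astype("category")

# Resample the time series to one mean value per calendar day.
daily_mean = merged.set_index("timestamp")["value"].resample("D").mean()

# Summarise the measurements per region.
region_mean = merged.groupby("region", observed=True)["value"].mean()

print(daily_mean)
print(region_mean)
```

In a Jupyter notebook, the two `print` calls can be replaced by evaluating `daily_mean` and `region_mean` in separate cells to get formatted output.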
The course closes with a discussion of the scientific Python ecosystem and the visualisation landscape, teaching participants to create interactive charts.
The course does not cover statistics, data mining, machine learning, or predictive modelling. It aims to give participants the means to tackle commonly encountered data handling tasks effectively and so work more efficiently. These skills are useful for both data cleaning and feature engineering.
All sessions are hands-on in Jupyter notebooks.