In many data sources, the relevant information is conveyed as free text: this is the case, for instance, in patient records, scientific publications, and social media. Unlike programming languages, human language is not formal, so computer-based extraction of structured information from natural language text is made difficult by the high variation in expression and by the importance of context for correct interpretation. Natural Language Processing aims to design methods that address these challenges, using human knowledge or data-driven methods. This course aims to bring participants to the level where they can independently perform text classification and extract data from text for further processing and analysis.

The course provides an introduction to Natural Language Processing, including how to handle language units such as words, phrases, and sentences, together with additional information such as part of speech and syntactic structure. It introduces the most common applications of supervised machine learning to text analytics: text classification; sequence labelling for information extraction, with a focus on entity recognition and classification; and the creation and use of word embeddings and neural classifiers. Biomedical text serves as the illustration throughout, supported by a short introduction to the representation and processing of biomedical terminology.

Content structure:

  • Introduction to Natural Language Processing
  • Basic Natural Language Processing tools
  • Machine learning for text classification
  • Sequence labelling for information extraction
  • Biomedical terminology for entity recognition
  • Word embeddings and neural classifiers for entity recognition

2021 - 2022
Pierre Zweigenbaum
