Text Analytics A.Y. 2020/21
Teacher
Andrea Esuli (andrea.esuli@isti.cnr.it)
Office hours: by appointment; send an email.
Schedule
Lectures are given on Microsoft Teams. Join the Text Analytics Team here.
Lecture recordings are available on Microsoft Teams for delayed viewing.
Day | Hour | Room |
---|---|---|
Wednesday | 9-11 | Text Analytics Team |
Thursday | 9-11 | Text Analytics Team |
Objectives
The course covers text analytics systems and applications that respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can solve information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods, and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.
- Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
- Mathematical background: Probability, Statistics and Algebra
- Linguistic essentials: words, lemmas, morphology, PoS, syntax
- Basic text processing: regular expressions, tokenisation
- Data collection: Twitter API, scraping
- Basic modelling: collocations, language models
- Introduction to Machine Learning: theory and practical tips
- Libraries and tools: NLTK, spaCy, Keras, PyTorch
- Classification/Clustering
- Sentiment Analysis/Opinion Mining
- Information Extraction/Relation Extraction/Entity Linking
- Transfer learning
- Quantification
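As a rough illustration of two of the topics listed above (regular-expression tokenisation and language models), here is a minimal sketch in plain Python. It is not taken from the course material: the function names `tokenize` and `bigram_model`, the token pattern, and the toy corpus are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression.

    The pattern is an illustrative assumption: runs of letters or digits,
    optionally followed by an apostrophe suffix such as "'s".
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_model(tokens):
    """Return maximum-likelihood estimates of P(w2 | w1) from bigram counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Every token except the last one starts exactly one bigram.
    contexts = Counter(tokens[:-1])
    return {(w1, w2): count / contexts[w1] for (w1, w2), count in bigrams.items()}


# Toy corpus, invented for this example.
corpus = "The cat sat on the mat. The cat ate."
tokens = tokenize(corpus)
model = bigram_model(tokens)

for (w1, w2), p in sorted(model.items()):
    print(f"P({w2} | {w1}) = {p:.2f}")
```

The estimates are unsmoothed, so any bigram that does not occur in the corpus is simply absent from the dictionary; a practical model would add smoothing.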
Exam
The exam consists of a project, to be agreed with the teacher, and an oral exam. The outcome of the project is some code and a report on the activity (typically 4-10 pages). The oral exam consists of the presentation and discussion of the project.
Lecture Notes
Date | Lecture | Notes |
---|---|---|
2020/09/16 | Introduction to the course | 00_-_introduction_to_the_text_analytics_course.pdf 01_-_natural_language_and_text_analytics.pdf |
2020/09/17 | Introduction to probability | 02_-_introduction_to_probability.pdf |
2020/09/23 | Setup of Python environment | 03_-_introduction_to_python.pdf |
2020/09/24 | Introduction to Python | 03_1_introduction_to_python.zip |
2020/09/30 | Probabilistic Language Models | 04_-_probabilistic_language_models.pdf |
2020/10/01 | Probabilistic Language Models | 04_1_probabilisticlanguagemodel.zip |
2020/10/07 | Text Indexing, Regular expressions | 05_-_text_indexing.pdf 05.1_-_strings_regular_expressions_and_bs4.zip |
2020/10/08 | NLTK, Collocations | 05.2_-_nltk.zip 05.3_-_collocations.zip |
Textbooks
- D. Jurafsky, J.H. Martin, Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
- B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.
- S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.