Text Analytics A.Y. 2020/21
Teacher
Andrea Esuli (andrea.esuli@isti.cnr.it)
Office hours: by appointment; send an email.
Schedule
Objectives
The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can solve information processing needs, to identify the analytic task/process that best models the business problem, to select the most appropriate resources, methods, and tools, and to collect text data and apply such methods to them. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.
Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
Mathematical background: Probability, Statistics and Algebra
Linguistic essentials: words, lemmas, morphology, PoS, syntax
Basic text processing: regular expressions, tokenisation
Data collection: Twitter API, scraping
Basic modelling: collocations, language models
Introduction to Machine Learning: theory and practical tips
Libraries and tools: NLTK, spaCy, Keras, PyTorch
Classification/Clustering
Sentiment Analysis/Opinion Mining
Information Extraction/Relation Extraction/Entity Linking
Transfer learning
Quantification
Exam
The exam consists of a project, to be agreed with the teacher, and an oral exam.
The project deliverables are the code and a report on the activity (typically 4-10 pages).
The oral exam consists of the presentation and discussion of the project.
Lecture Notes
Date | Lecture | Notes |
2020/09/16 | Introduction to the course | 00_-_introduction_to_the_text_analytics_course.pdf 01_-_natural_language_and_text_analytics.pdf |
2020/09/17 | Introduction to probability | 02_-_introduction_to_probability.pdf |
2020/09/23 | Setup of Python environment | 03_-_introduction_to_python.pdf |
2020/09/24 | Introduction to Python | 03_1_introduction_to_python.zip |
2020/09/30 | Probabilistic Language Models | 04_-_probabilistic_language_models.pdf |
2020/10/01 | Probabilistic Language Models | 04_1_probabilisticlanguagemodel.zip |
2020/10/07 | Text Indexing, Regular expressions | 05_-_text_indexing.pdf 05.1_-_strings_regular_expressions_and_bs4.zip |
2020/10/08 | NLTK, Collocations | 05.2_-_nltk.zip 05.3_-_collocations.zip |
2020/10/14 | NLP tools, spaCy, Text indexing, preprocessing | 05.4_-_spacy_text_processing.ipynb.zip |
2020/10/15 | Vector space model, ML for text analytics | 06_-_machine_learning_for_text_analytics.pdf |
2020/10/21 | scikit-learn, pipelines | 06_1_classification_sklearn.zip |
2020/10/22 | Feature engineering | 06_2_classification_feature_engineering.zip |
2020/10/28 | Experimental protocols, optimization | 07_-_experiments.pdf 07_1_optimization_sklearn.zip |
2020/10/29 | Sequence labeling, information extraction | 08_-_information_extraction.pdf |
2020/11/04 | Inception, spaCy | 08_1_spacy_ner_train.zip |
2020/11/05 | Data collection | 09_-_data_collection.pdf 09_1_scraping.zip 09_2_data_from_twitter.zip |
2020/11/11 | Introduction to neural networks | 10_-_a_primer_on_neural_networks.pdf 10.1_-_example_of_backpropagation.pdf |
2020/11/12 | From SVM to NN, deep learning | 10_2_svm_to_nn.zip |
2020/11/18 | Convolutional and Recurrent networks, text generation | 10_3_classification_cnnnet.zip 10_4_classification_lstmnet.zip 10_5_textgeneration.zip |
2020/11/19 | Word embeddings, neural language models | 11_-_neural_language_models.pdf 11_1_wordembeddings.zip |
2020/11/25 | Document embeddings, the Transformer | 11_2_documentembeddings.zip |
2020/11/26 | BERT fine-tuning | 11_3_bert_finetune_binary.zip 11_4_bert_finetune_multiclass.zip 11_5_simpletransformers_finetune_binary.zip 11_6_simpletransformer_generation_and_representation.zip |
2020/12/02 | Parsing | |
2020/12/03 | Parsing | |
2020/12/09 | Introduction to Sentiment Analysis, Sentiment Lexicons | |
2020/12/10 | Sentiment Classification | |
Textbooks
Previous editions