This is an old revision of the document!


Text Analytics (635AA) A.Y. 2021/22

Teacher

Andrea Esuli (andrea.esuli@isti.cnr.it)

Office hours: by appointment, send email.

Schedule

Day Hour Room
Monday 9-11 Fib C - Teams
Friday 11-13 Fib M1 - Teams

Objectives

The course covers text analytics systems and applications that respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can solve information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods, and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

  1. Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
  2. Mathematical background: Probability, Statistics and Algebra
  3. Linguistic essentials: words, lemmas, morphology, PoS, syntax
  4. Basic text processing: regular expressions, tokenisation
  5. Data collection: Twitter API, scraping
  6. Basic modelling: collocations, language models
  7. Introduction to Machine Learning: theory and practical tips
  8. Libraries and tools: NLTK, spaCy, Keras, PyTorch
  9. Classification/Clustering
  10. Sentiment Analysis/Opinion Mining
  11. Information Extraction/Relation Extraction/Entity Linking
  12. Transfer learning
  13. Quantification

Exam

Students MUST contact the teacher at least one month before the date set for the exam session, so as to agree on the contents of the project and get the go-ahead.

The date set for the exam session (Check here) is the deadline for submitting the completed project (report and code).

The exam consists of a project, to be agreed with the teacher, and an oral exam. The outcome of the project is some code and a report on the activity (typically 4-10 pages). The oral exam consists of the presentation and discussion of the project.

The purpose of the project is to give you hands-on experience in applying the concepts and methods seen during the course to practical text analytics problems.

Projects may be based on challenges proposed in either research forums (SemEval, EVALITA) or other platforms (Kaggle). Students are also invited to propose a project on a problem drawn from other sources (e.g., recent papers on arXiv CL or AI) or from their own interests.

Students may work solo or in groups of up to three people.

Lecture Notes

Date Lecture Notes
2021/09/13 Introduction to the course, NLP & Text Analytics 00_-_introduction_to_the_text_analytics_course.pdf, 01_-_natural_language_and_text_analytics.pdf
2021/09/17 Introduction to probability 02_-_introduction_to_probability.pdf
2021/09/20 canceled
2021/09/24 Introduction to python 1/2 03_-_introduction_to_python.pdf 03_1_introduction_to_python.zip
2021/09/27 Introduction to python 2/2
2021/10/01 Probabilistic Language Models 1/2 04_-_probabilistic_language_models.pdf 04_1_probabilisticlanguagemodel.zip
2021/10/04 Probabilistic Language Models 2/2
2021/10/08 Text Indexing: Regular expressions 05_-_text_indexing.pdf 05_1_strings_regular_expressions_and_bs4.zip
2021/10/11 Text Indexing: NLTK, Collocations 05_2_nltk.zip 05_3_collocations.zip
2021/10/15 Text Indexing: Spacy, Feature selection, Pipeline 05_4_spacy_text_processing.zip
2021/10/18 Text Indexing: Notebook. Introduction to Machine Learning 05_5_text_indexing_sklearn.zip 06_-_machine_learning_for_text_analytics.pdf
2021/10/22 Machine Learning for TA: Paradigms and models 06_-_machine_learning_for_text_analytics.pdf
2021/10/25 Machine Learning for TA: the complete pipeline 06_1_classification_sklearn.zip
2021/10/29 Machine Learning for TA: Feature engineering, Topic Modeling 06_2_classification_feature_engineering.zip 06.3_-_topic_modeling.pdf 06.4_-_topic_modeling.ipynb.zip
2021/11/05 Experimental protocols and optimization 07_-_experiments.pdf 07_1_optimization_sklearn.zip
2021/11/08 Information Extraction, Entity Annotation 08_-_information_extraction.pdf 08_1_spacy_ner_train.zip
2021/11/12 Data collection 09_-_data_collection.pdf 09_1_scraping.zip 09_2_data_from_twitter.zip
2021/11/15 Introduction to Neural Networks 10_-_a_primer_on_neural_networks.pdf 10.1_-_example_of_backpropagation.pdf 10.2_-_svm_to_nn.ipynb.zip
2021/11/19 Text classification with Neural Networks 10.3_-_classification_-_cnnnet.ipynb.zip 10.4_-_classification_-_lstmnet.ipynb.zip 10_5_textgeneration.zip
2021/11/22 Neural Language Models, Word2Vec 11_-_neural_language_models.pdf 11_1_wordembeddings.zip
2021/11/26 Doc2Vec, Transformer, BERT 11_2_documentembeddings.zip 11_3_bert_finetune_binary.zip 11_4_bert_finetune_multiclass.zip
2021/11/29 SimpleTransformers 11_5_simpletransformers_finetune_binary.zip 11_6_simpletransformer_generation_and_representation.zip
2021/12/03 Sentiment Analysis 12_-_sentiment_analysis.pdf 12_1_vader.zip

Textbooks

  1. D. Jurafsky, J.H. Martin, Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
  2. B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.
  3. S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.

Previous editions

mds/txa/start.1638523400.txt.gz · Last modified: 03/12/2021 at 09:23 by Andrea Esuli