This is an old version of the document!


Text Analytics (635AA) A.Y. 2022/23

Teacher

Lucia Passaro (lucia.passaro [at] unipi [dot] it)

Office hours: Monday 16-18 via Teams

Schedule

Day | Hour | Room
Monday | 9-11 | Fib M1
Friday | 11-13 | Fib M1

Class team on Teams

Objectives

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The main objectives of the course are:

  1. Learning the essential techniques, algorithms, and models used in natural language processing.
  2. Understanding the architectures of typical text analytics applications and the libraries for building them.
  3. Gaining expertise in the design, implementation, and evaluation of applications that exploit the analysis, interpretation, and transformation of texts.

Background

  • Background: Natural Language Processing, Information Retrieval and Machine Learning
  • Mathematical background: Probability, Statistics and Algebra
  • Linguistic essentials: words, lemmas, morphology, Part of Speech (PoS), syntax
  • Basic text processing: regular expressions, tokenisation
  • Data collection: scraping
  • Basic modelling: collocations, language models
  • Introduction to Machine Learning: theory and practical tips
  • Libraries and tools: NLTK, spaCy, Keras, PyTorch
  • Classification/Clustering
  • Sentiment Analysis/Opinion Mining
  • Information Extraction/Relation Extraction/Entity Linking
  • Transfer learning
  • Quantification
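
As a rough illustration of the "basic text processing" and "basic modelling" items above, the sketch below tokenises a toy string with a regular expression and estimates bigram probabilities by maximum likelihood. It is not taken from the course material: the function names `tokenize` and `bigram_mle`, the regex, and the toy corpus are all assumptions made for this example, and sentence boundaries are deliberately ignored to keep it short.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a simple regex.

    Illustrative only: real tokenisers (e.g. in NLTK or spaCy) handle
    punctuation, clitics and Unicode far more carefully.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_mle(tokens):
    """Estimate P(w2 | w1) as count(w1, w2) / count(w1 as a bigram history)."""
    # Count each token only where it starts a bigram (all but the last one).
    history_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Hypothetical toy corpus; sentence boundaries are ignored in this sketch.
toy_corpus = "The cat sat on the mat. The cat slept."
tokens = tokenize(toy_corpus)
probs = bigram_mle(tokens)

print(tokens)
print(probs[("the", "cat")])  # "the" is followed by "cat" in 2 of its 3 occurrences
```

An unsmoothed estimate like this assigns zero probability to any unseen bigram, which is why smoothing is usually the next step when building such models.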

Lecture Notes

Date | Lecture | Slides | Material / Reference
2022/09/16 | Introduction to the course, NLP & Text Analytics. | 1 - Introduction to the Text Analytics course | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Chp. 1.
2022/09/19 | Review of Probability. Language and Probability. | 2 - Reminds on Probability.pdf |
2022/09/23 | Introduction to Python. | 3 - Introduction to Python.pdf | Introduction to Python - Notebook.
2022/09/30 | Introduction to Python (continued). Project Presentation and Important Dates. | Project and Dates |
2022/10/03 | Probabilistic Language Models. | 5 - Probabilistic Language models | D. Jurafsky, J.H. Martin. Chp. 3. Probabilistic Language Models - Notebook.
2022/10/07 | Text Indexing: Strings, Regular Expressions and BS4. | 6 - Text Indexing-1 | D. Jurafsky, J.H. Martin. Chp. 2. Strings, Regular Expressions and BS4 - Notebook.
2022/10/10 | Text Indexing: Linguistic annotation. NLTK. | 6 - Text Indexing-2 | Linguistic annotation with NLTK - Notebook.
2022/10/14 | Text Indexing: Collocations with Gensim. Stanza. spaCy. Feature selection. | 6 - Text Indexing-3 | L6.3.4 - collocations - stanza - spacy - Notebooks.
2022/10/17 | Text Indexing: Vector space models. | 6 - Text Indexing-4 | D. Jurafsky, J.H. Martin. Chp. 6. L6.5 - Vector space model - toy example - Notebook.
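
For readers unfamiliar with the vector space model mentioned in the last lecture, here is a minimal sketch under stated assumptions: documents are represented as raw term-frequency vectors over a shared vocabulary and compared with cosine similarity. This is not the course notebook; the three toy documents and the helper names `tf_vector` and `cosine` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_vector(tokens, vocabulary):
    """Represent a document as raw term frequencies over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents, already tokenised by whitespace.
docs = {
    "d1": "text analytics with python".split(),
    "d2": "text mining with python".split(),
    "d3": "probability and statistics".split(),
}

# Shared vocabulary: every distinct term, in a fixed (sorted) order.
vocabulary = sorted({term for tokens in docs.values() for term in tokens})
vectors = {name: tf_vector(tokens, vocabulary) for name, tokens in docs.items()}

sim_12 = cosine(vectors["d1"], vectors["d2"])  # three of four terms shared
sim_13 = cosine(vectors["d1"], vectors["d3"])  # no terms shared

print(sim_12, sim_13)
```

Raw counts are the simplest weighting; schemes such as TF-IDF typically replace them so that very frequent terms contribute less to the similarity.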

Exam

Attending students

The exam for attending students consists of a project, to be agreed upon with the teacher, and an oral exam. The project deliverables are the code and a report on the activity (typically 4-10 pages). The oral exam consists of the presentation and discussion of the project. Projects may be based on challenges proposed in research forums (SemEval, EVALITA) or on other platforms (Kaggle). Students are also invited to propose a project based on other sources (e.g., recent papers on arXiv CL or AI) or on their own interests. Students may work in groups of 3-5 people.

Non-attending students

The exam for non-attending students consists of a written exam with open questions and exercises, and an oral discussion of the course topics.

Textbooks

It is recommended to read selected chapters from:

  1. D. Jurafsky, J.H. Martin, Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
  2. S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.

Further readings will be listed among the materials for the individual lessons.

Previous editions

mds/txa/start.1665996179.txt.gz · Last modified: 17/10/2022 at 08:42 (18 months ago) by Lucia Passaro