mds:txa:start

Differences

These are the differences between the selected revision and the current version of the page.

mds:txa:start [07/09/2020 at 10:15 (4 years ago)]
Andrea Esuli [Schedule] New Teams link
mds:txa:start [15/01/2024 at 10:31 (2 months ago)] (current version)
Laura Pollacci
Line 1: Line 1:
-====== Text Analytics A.Y2020/21 ======+<html> 
 +<!-- Google Analytics --> 
 +<script type="text/javascript" charset="utf-8"> 
 +(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ 
 +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), 
 +m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) 
 +})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
  
 +ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true});
 +ga('personalTracker.require', 'linker');
 +ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it', 'luciacpassaro.github.io'] );    
 +ga('personalTracker.require', 'displayfeatures');
 +ga('personalTracker.send', 'pageview', 'courses/txa/');
 +setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000); 
 +</script>
 +<!-- End Google Analytics -->
 +<!-- Global site tag (gtag.js) - Google Analytics -->
 +<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script>
 +<script>
 +  window.dataLayer = window.dataLayer || [];
 +  function gtag(){dataLayer.push(arguments);}
 +  gtag('js', new Date());
  
-==== Teacher ====+  gtag('config', 'G-LPWY0VLB5W'); 
 +</script> 
 +<!-- Capture clicks --> 
 +<script> 
 +jQuery(document).ready(function(){ 
 +  jQuery('a[href$=".pdf"]').click(function() { 
 +    var fname = this.href.split('/').pop(); 
 +    ga('personalTracker.send', 'event',  'TXA', 'PDFs', fname); 
 +  }); 
 +  jQuery('a[href$=".r"]').click(function() { 
 +    var fname = this.href.split('/').pop(); 
 +    ga('personalTracker.send', 'event',  'TXA', 'Rs', fname); 
 +  }); 
 +  jQuery('a[href$=".zip"]').click(function() { 
 +    var fname = this.href.split('/').pop(); 
 +    ga('personalTracker.send', 'event',  'TXA', 'ZIPs', fname); 
 +  }); 
 +}); 
 +</script> 
 +</html> 
 +====== Text Analytics (635AA) A.Y. 2023/24 ======
  
-[[http://www.esuli.it/|Andrea Esuli]] (andrea.esuli@isti.cnr.it) 
  
-Office hours: by appointment, send email.+==== Teacher ====
  
 +[[https://laurapollacci.github.io/txa.html|Laura Pollacci]] (laura.pollacci [at] di [dot] unipi [dot] it)
  
-==== Schedule  ====+Office hours: 
  
-Lectures will be given using Microsoft Teams. 
-[[https://teams.microsoft.com/l/team/19%3ad515c158b0c64bc8b4efa3b21aab6fa7%40thread.tacv2/conversations?groupId=205782a2-623f-4f50-a391-9d4f22d4d604&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Join the Text Analytics Team here.]] 
  
 +==== Schedule ====
  
-^ Day ^ Hour ^ Room ^ +^ Day ^ Hour ^ Room ^  
-| TBA | TBA | [[https://teams.microsoft.com/l/team/19%3ad515c158b0c64bc8b4efa3b21aab6fa7%40thread.tacv2/conversations?groupId=205782a2-623f-4f50-a391-9d4f22d4d604&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Text Analytics Team]] | +| Thursday | 16-18 | Fib C1 |
-| TBA | TBA | [[https://teams.microsoft.com/l/team/19%3ad515c158b0c64bc8b4efa3b21aab6fa7%40thread.tacv2/conversations?groupId=205782a2-623f-4f50-a391-9d4f22d4d604&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Text Analytics Team]] | +| Friday | 11-13 | Fib M1 |
  
 +
 +[[https://teams.microsoft.com/l/channel/19%3aiBnp7L1JmbHPmkQ3NcO3NrxPDZB-RhMvlQzMRdCrWFM1%40thread.tacv2/Generale?groupId=9e5370ba-93b4-41d0-b0b1-b7464ab92f11&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Team of the class]]
  
 ==== Objectives ==== ==== Objectives ====
-The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can solve information processing needs, to identify the analytic task/process that best models the business problem, to select the most appropriate resources, methods and tools, to collect text data and apply such methods to them. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake posts detection, quantification problems, summarization, etc.+The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form.  
 +The main objectives of the course are: 
 +  - Learning essential techniques, algorithms, and models used in natural language processing. 
 +  - Understanding of the architectures of typical text analytics applications and of libraries for building them.  
 +  - Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts.
  
-  - Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning 
-  - Mathematical background: Probability, Statistics and Algebra 
-  - Linguistic essentials: words, lemmas, morphology, PoS, syntax 
-  - Basic text processing: regular expression, tokenisation 
-  - Data collection: twitter API, scraping 
-  - Basic modelling: collocations, language models 
-  - Introduction to Machine Learning: theory and practical tips 
-  - Libraries and tools: NLTK, Spacy, Keras, pytorch 
-  - Classification/Clustering 
-  - Sentiment Analysis/Opinion Mining 
-  - Information Extraction/Relation Extraction/Entity Linking 
-  - Transfer learning 
-  - Quantification 
  
 +==== Background ====
 +
 +  * Background: Natural Language Processing, Information Retrieval and Machine Learning
 +  * Mathematical background: Probability, Statistics and Algebra
 +  * Linguistic essentials: words, lemmas, morphology, Part of Speech (PoS), syntax
 +  * Basic text processing: regular expression, tokenisation
 +  * Data collection: scraping
 +  * Basic modelling: collocations, language models
 +  * Introduction to Machine Learning: theory and practical tips
 +  * Libraries and tools: NLTK, Spacy, Keras, pytorch
 +  * Classification/Clustering
 +  * Sentiment Analysis/Opinion Mining
 +  * Information Extraction/Relation Extraction/Entity Linking
 +  * Transfer learning
 +  * Quantification
 +
 +
 +==== Lecture Notes ====
 +
 +^ Date ^ Lecture ^ Slides ^ Material / Reference ^
 +| 2023/09/21 | Introduction to the course, NLP & Text Analytics. | [[https://drive.google.com/file/d/11BPheGG5YiZcNeFFQirMMSIrObEsbayf/view?usp=drive_link| 1 - Introduction to the Text Analytics course]]|J. Eisenstein. Introduction to Natural Language Processing. MIT Press.[[https://drive.google.com/file/d/1v455MySmNo5qVSRle676L0pjc4wktVNh/view?usp=drive_link| Chp. 1]].|
 +| 2023/09/22 | Review of probability. | [[https://drive.google.com/file/d/1fH8sjhnh9dlPcPMwpAYSbsP0tbCaamSV/view?usp=sharing| 2 - Reminds on probability]]|
 +| 2023/09/28 | Introduction to Python. | [[https://drive.google.com/file/d/1fOn73KfDqlaU-0dgXs4-qkIbm8ZCg8Px/view?usp=sharing| 3 - Introduction to Python]]| [[https://drive.google.com/file/d/16BIcJuP4vB5b5oUmV03R7fX_-wRaFI8Y/view?usp=sharing | L3 - Introduction_to_Python.ipynb]] |
 +| 2023/09/29 | Introduction to Python - part 2. Project and Dates | [[https://drive.google.com/file/d/11E-3DWARykKVZDuB1vuDoXySAPPWYFoq/view?usp=sharing| 4 - Project and Dates]]| 
 +| 2023/10/05 | Probabilistic language models| [[https://drive.google.com/file/d/1Nj6FgcBSK9otmJwjDj2bxWWulCzPlHZb/view?usp=drive_link|5 - Probabilistic language models]]| D. Jurafsky, J.H. Martin. [[ https://drive.google.com/file/d/1K3B0s0-T3NnpfgmR6NGsZdwWqGoa0S5Q/view?usp=drive_link|Ch3]] [[https://drive.google.com/file/d/13r6wn4jlrOncZ0zUc5efmu2RgqDGUz2g/view?usp=drive_link|L5 Probabilistic Language Model.ipynb]] |
 +| 2023/10/06| Text Indexing: Strings, Regular Expressions and BS4. | [[https://drive.google.com/file/d/1Zp6vqh5Wj9YzwtpcgMSxm7NUZ_oN8SW7/view?usp=sharing| 6 - Text indexing 1]] | D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1SH4Em84AEHNzc6OzrhjvW_ggo_0nJiOx/view?usp=sharing|Ch2]]  [[https://drive.google.com/file/d/13miwALDtad7ERoObFnlPjeUYBaAfwZGF/view?usp=sharing|L6.1 - Strings Regular expressions and BS4.ipynb]]|
 +| 2023/10/12| Linguistic annotation. NLTK. | [[https://drive.google.com/file/d/1t2WNuMZ1PAE4i_GgPbd-DCJWx8gWnhQf/view?usp=sharing| 6 - Text Indexing 2]]|[[https://drive.google.com/file/d/14ahCe4h45MHn_yMhUbOwO7o8Ms9jl-sD/view?usp=sharing|L6.2 - Linguistic annotation with NLTK.ipynb]] |
 +|2023/10/13| //Lesson canceled due to UNIPI orientation days.//|
 +|2023/10/19| Feature Selection| [[https://drive.google.com/file/d/1iWDaF7BXykUrRwOrIfc8ERlOewXaOQm7/view?usp=sharing|6 - Text Indexing 3]] | [[https://drive.google.com/file/d/1mD4v_ts0A1CHcTrU9nIYz-Nugvok1jks/view?usp=sharing |L6.3 - Gensim collocations - Stanza - Spacy (Notebooks)]] |
 +|2023/10/20| Vector space models | [[https://drive.google.com/file/d/1JIKfDSAZh3raAfRB_tTGFNqjxgfukKGy/view?usp=sharing|6 - Text Indexing 4]] | D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1Hj3n4qCuZpTIrS_M352QAyH70xC6Fxrg/view?usp=share_link|Chp. 6.]] [[https://drive.google.com/file/d/1RUJYFizlp1ldl2DbmZDDXvw8WhDS6E4k/view?usp=sharing|L6.4 - Vector space model - toy example]]|
 +|2023/10/26| //Lesson canceled//|
 +|2023/10/27| //Lesson canceled//|
 +|2023/11/02| Machine Learning for Text Analytics. | [[ https://drive.google.com/file/d/1zc925Q0yzdmh2nvB0McdQBOeVgJ1aD3R/view?usp=sharing| 10 - Machine Learning for Text Analytics]] - corrected|
 +|2023/11/03| Machine Learning for Text Analytics: Design Experimental Protocols. Student presentations: How to. | [[https://drive.google.com/file/d/1gaaWVORZnp7gJ6ZGloKlyQTSw07SZ8in/view?usp=sharing| 11 - Design Experimental Protocols]]. [[https://drive.google.com/file/d/1b5I7NhRXuzjk93Pea6pyxzzCw31OhD8Z/view?usp=sharing| 11.1 - Student presentations: How to]] | [[https://drive.google.com/file/d/1X0BYS66px-aTYoDVZzx2sTixmaX4agrP/view?usp=sharing | L.11 - Classification with SkLearn]] |
 +|2023/11/09| Student project presentations: proposal, brainstorming, discussion. |
 +|2023/11/10| Student project presentations: proposal, brainstorming, discussion. |
 +|2023/11/16| Topic Modeling | [[https://drive.google.com/file/d/1M7EMWkYfqDWZjf6W22yIVJLK0QbJTh_v/view?usp=sharing|12 - Topic Modeling]] | Zhai and Massung (2016) Text Data Management and Analysis. [[https://drive.google.com/file/d/1Cwzon44c0-7b_4bbHyUO6ArolacQFY_5/view?usp=sharing|Chp 17]]. [[https://drive.google.com/file/d/1-Iyz860uAII3pplAk_VMqi5gK5N_S4pD/view?usp=sharing |L.12 -Topic Modeling - Notebook.]]. [[https://drive.google.com/file/d/1H60PV4Wt5gRs_B6MB4J2YJ-gsiySf6lv/view?usp=sharing|L.12.1 - Topic Modeling pyLDAvis - Notebook]]|
 +|2023/11/17| A primer on Neural Networks |[[https://drive.google.com/file/d/1MS7upbsydqkPMIRfYv9pKHXz2mfGb1ST/view?usp=sharing |13 - A primer on Neural Networks]] |
 +|2023/11/23|Neural Networks | [[https://drive.google.com/file/d/13tQ1m-ogPR3R_PSAWLDomvPmsBal8E55/view?usp=sharing | 14 - Neural Networks]] | [[https://drive.google.com/file/d/1ZP9WN4OTSw2VoO7jWIpJlWBh_oGFwxjN/view?usp=sharing| From SVM to NN, Classification with Keras - Notebooks.]] |
 +|2023/11/24| Neural Language Models | [[https://drive.google.com/file/d/1vezeT7l6Wd9D0otEYXSAjg0ih1XoggmW/view?usp=sharing| 15 - Neural Language Models]]| D. Jurafsky, J.H. Martin. Chps. [[https://drive.google.com/file/d/10SjSlr4bk6jBWTEkA4vsTUomB8y4iJ-C/view?usp=sharing|7]] [[https://drive.google.com/file/d/1MkfAsC-rY6HuWM6ZTS1TB8LoLxN-sPPq/view?usp=sharing|9]] [[https://drive.google.com/file/d/1P3j4qTH6IH_R42huYLL83cvPd1Ci2Ar1/view?usp=sharing|11]] |
 +|2023/11/30| Student project presentations: ongoing experiments. Neural Language Models Practice | [[https://drive.google.com/file/d/1Dc0l2zQfX9poOymZKrhYiHMUiv9TT7m_/view?usp=sharing|16 - Neural Language Models Word2Vec]]| [[https://drive.google.com/file/d/14BIROGvYzNjbmmVzZqeiY-tLkhRAR8tW/view?usp=sharing |Word2vec - Notebook.]]|
 +|2023/12/01| Student project presentations: ongoing experiments. Neural Language Models Practice | [[https://drive.google.com/file/d/1R4Yfr5v8ygsK61dV-h-mZhU_iY0OuZmK/view?usp=sharing|17 - Neural Language Models Doc2Vec]]|[[https://drive.google.com/file/d/1JaGXJE-rF3Yvmtd1Je8NCdDapLiL17Pg/view?usp=sharing|Doc2Vec - Notebook]]|
 +|2023/12/07| Neural Language Models - part 2 |[[https://drive.google.com/file/d/1QxmavpSIjX1x46UkNR1RflY64Sbc3vLs/view?usp=sharing|Neural Language Models - part 2]]|
 +|2023/12/11| BERT. Project Submission |[[https://drive.google.com/file/d/1JX6HCObZYtLUApYJDl1ftDTl5nKn-aHi/view?usp=sharing| 19 - Bert]]. [[https://drive.google.com/file/d/1GOwUTqWnkONM-SI8D0JANGKuqX0pBp35/view?usp=sharing|Project Submission]]| [[ https://drive.google.com/file/d/1JX6HCObZYtLUApYJDl1ftDTl5nKn-aHi/view?usp=sharing|Bert - Notebooks]] |
 +|2023/12/14| Advanced Topics | [[ https://drive.google.com/file/d/14zg2w7-s_cpIJQBwGfXoj_yfjZNZLYQh/view?usp=sharing |20 - Advanced Topics]]| Recommended chapters: D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1ik_BGxKUNAi5GwQZQv4vI9Gqvv4wkWK9/view?usp=sharing|20]];[[https://drive.google.com/file/d/1VJbNelq63EagAxdgleJu2isJVBBb_vkl/view?usp=sharing|24]].| 
  
 ==== Exam ==== ==== Exam ====
  
-Exam will consist in a project to be agreed with the teacher and an oral exam. +** Attending students **
-The outcome of the project will be some code and a report of the activity (4-10 pages is the typical length range). +
-Oral exam will consist in the presentation and discussion of the project.+
  
 +The exam for attending students will consist of the development of a project to be agreed upon with the teacher and an oral exam. The outcome of the project will be some code and a report of the activity (4-10 pages is the typical length range). The oral exam will consist of the presentation and discussion of the project.
 +Projects may be based on challenges proposed in either research forums ([[https://alt.qcri.org/semeval2020/|Semeval]], [[http://www.evalita.it/|Evalita]]) or other platforms ([[https://kaggle.com|Kaggle]]). Students are also invited to propose a project based on other sources (e.g., recent papers on ArXiv [[https://arxiv.org/list/cs.CL/new|CL]] or [[https://arxiv.org/list/cs.AI/new|AI]]), or their own interests. Students may work in groups of 3-5 people.
  
-==== Lecture Notes ==== 
  
-^ Date ^ Lecture ^ Notes ^ +** Non-Attending students **  
-| yy/mm/dd| some topic| link to slides |+ 
 +The exam for non-attending students will consist of a written exam with open questions and exercises, and an oral discussion on the topics of the course. 
 + 
 +Written test [[https://drive.google.com/file/d/1Q-NVz_x-UjllTG-CPAKGV4aKmK4Hz5af/view?usp=share_link|example]]. 
  
  
 ==== Textbooks ==== ==== Textbooks ====
 +It is recommended to read selected chapters from:
 +
  
   - D. Jurafsky, J.H. Martin, [[https://web.stanford.edu/~jurafsky/slp3/|Speech and Language Processing]]. 3rd edition, Prentice-Hall, 2018.   - D. Jurafsky, J.H. Martin, [[https://web.stanford.edu/~jurafsky/slp3/|Speech and Language Processing]]. 3rd edition, Prentice-Hall, 2018.
-  - B. Liu, [[https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.html|Sentiment Analysis and Opinion Mining]]. Morgan & Claypool Publishers, 2012. 
   - S. Bird, E. Klein, E. Loper. [[https://www.nltk.org/book/|Natural Language Processing with Python]].   - S. Bird, E. Klein, E. Loper. [[https://www.nltk.org/book/|Natural Language Processing with Python]].
  
 +Further references will be listed in the material for the individual lessons.
 ==== Previous editions ==== ==== Previous editions ====
  
 +  * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1671529070|2022-2023]]
 +  * [[http://didawiki.cli.di.unipi.it/doku.php/mds/txa/start?rev=1649067582|2021-2022]]
 +  * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1612257498|2020-2021]]
   * [[https://elearning.di.unipi.it/course/view.php?id=162|2019-2020]]   * [[https://elearning.di.unipi.it/course/view.php?id=162|2019-2020]]
   * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1551450538|2018-2019]]   * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1551450538|2018-2019]]
   * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1515682954|2017-2018]]   * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1515682954|2017-2018]]
  
mds/txa/start.1599473738.txt.gz · Last modified: 07/09/2020 at 10:15 (4 years ago) by Andrea Esuli