magistraleinformatica:dmi:start

Differences

These are the differences between the selected revision and the current version of the page.

magistraleinformatica:dmi:start [12/11/2021 at 00:49 (2 years ago)]
Anna Monreale [First Semester]
magistraleinformatica:dmi:start [20/12/2022 at 13:46 (16 months ago)]
Anna Monreale [First Semester]
Line 12: Line 12:
      
 ga('personalTracker.require', 'displayfeatures'); ga('personalTracker.require', 'displayfeatures');
-ga('personalTracker.send', 'pageview', 'ruggieri/teaching/dm/');+ga('personalTracker.send', 'pageview', 'courses/dminf/');
 setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000); 
 </script> </script>
Line 39: Line 39:
   jQuery('a[href$=".pdf"]').click(function() {   jQuery('a[href$=".pdf"]').click(function() {
     var fname = this.href.split('/').pop();     var fname = this.href.split('/').pop();
-    ga('personalTracker.send', 'event',  'DM', 'PDFs', fname);+    ga('personalTracker.send', 'event',  'DMINF', 'PDFs', fname);
   });   });
   jQuery('a[href$=".r"]').click(function() {   jQuery('a[href$=".r"]').click(function() {
     var fname = this.href.split('/').pop();     var fname = this.href.split('/').pop();
-    ga('personalTracker.send', 'event',  'DM', 'Rs', fname);+    ga('personalTracker.send', 'event',  'DMINF', 'Rs', fname);
   });   });
   jQuery('a[href$=".zip"]').click(function() {   jQuery('a[href$=".zip"]').click(function() {
     var fname = this.href.split('/').pop();     var fname = this.href.split('/').pop();
-    ga('personalTracker.send', 'event',  'DM', 'ZIPs', fname);+    ga('personalTracker.send', 'event',  'DMINF', 'ZIPs', fname);
   });   });
   jQuery('a[href$=".mp4"]').click(function() {   jQuery('a[href$=".mp4"]').click(function() {
     var fname = this.href.split('/').pop();     var fname = this.href.split('/').pop();
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname);+    ga('personalTracker.send', 'event',  'DMINF', 'Videos', fname);
   });   });
   jQuery('a[href$=".flv"]').click(function() {   jQuery('a[href$=".flv"]').click(function() {
     var fname = this.href.split('/').pop();     var fname = this.href.split('/').pop();
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname);+    ga('personalTracker.send', 'event',  'DMINF', 'Videos', fname);
   });   });
 }); });
 </script> </script>
 </html> </html>
-====== Data Mining (309AA) - 9 CFU A.Y. 2021/2022 ======+====== Data Mining (309AA) - 9 CFU A.Y. 2022/2023 ======
  
 **Instructor:** **Instructor:**
Line 70: Line 70:
     * KDDLab, SNS, Pisa     * KDDLab, SNS, Pisa
     * [[francesca.naretto@sns.it]]       * [[francesca.naretto@sns.it]]  
 +  * * **Lorenzo Mannocci**
 +    * University of Pisa
 +    * [[lorenzo.mannocci@phd.unipi.it]]  
  
 ====== News ====== ====== News ======
-  * [28.10.2021] ** Lecture of Friday 29.10.2021 will be canceled **  +  * [28.10.2022] ** The lectures on 16 and 17 November will be canceled. **  
-  * [23.09.2021] Please fill this document: [[https://docs.google.com/spreadsheets/d/1YzHs_JSYPWYqnmkM7ccQc1WZSzGP7UsgxdBF-h5LcEA/edit?usp=sharing|Student-Lists and Project groups]]. On Teams you can find instructions for GroupID. +  * [09.09.2022] The lectures will be only in presence and will NOT be live-streamed, but recordings of the lecture or of the previous years will be made available here for non-attending students.
-  * [06.09.2021] The first lecture of this course will take place on Thursday, 16 Sept 2021. + 
-  * [08.09.2021] People that intend to attend the course online should use this link: https://teams.microsoft.com/l/team/19%3aWKvq4kg0XbKZ5pEeiZcarbBXPCYsTvTwMkKZs2PWiHA1%40thread.tacv2/conversations?groupId=aea1385b-6721-4d90-a169-c97f7d066eca&tenantId=c7456b31-a220-47f5-be52-473828670aa1    +
 ====== Learning Goals ====== ====== Learning Goals ======
      * Fundamental concepts of data knowledge and discovery.      * Fundamental concepts of data knowledge and discovery.
Line 93: Line 95:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Wednesday |  14:00 - 16:00  |  Room C  - Online  |  +|  Wednesday |  09:00 - 11:00  |  Room  |  
-|  Thursday  |  14:00 - 16:00  |  Room C  - Online  |  +|  Thursday  |  11:00 - 13:00  |  Room C  |  
-|  Friday    |  09:00 - 11:00  |  Room A1 -  Online  |  +|  Friday    |  09:00 - 11:00  |  Room  |  
  
  
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
-Anna Monreale: Wednesday: 11:00-13:00 online using Teams (Appointment by email) +Anna Monreale: Tuesday: 11:00-13:00 online using Teams or at the Department of Computer Science, room 374/E (please ask for an appointment by email). 
-Francesca Naretto: Monday: 15:00-18:00 online using Teams (Appointment by email) +Francesca Naretto: TBD
  
- +A **[[https://teams.microsoft.com/l/team/19%3aU9V_a8O2AkYl6KAcYiVMyOx_UfVD4SXKE2bwYRdOQ581%40thread.tacv2/conversations?groupId=ecc43c6e-29fe-4819-bc96-9bc6b906491f&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]]** will be used ONLY to post news, Q&A, and other stuff related to the course. The lectures will be only in presence and will **NOT** be live-streamed, but recordings of the lecture or of the previous years will be made available here for non-attending students.  
 ====== Learning Material -- Materiale didattico ====== ====== Learning Material -- Materiale didattico ======
  
Line 125: Line 127:
 ===== Software===== ===== Software=====
  
-  * Python - Anaconda (version 3.7!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included) +  * Python - Anaconda (version 3.7 or later!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included)
   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]
   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]
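
The software requirements listed above can be checked with a short script. The following is a minimal sketch, not part of the course material: it tests whether the running interpreter is at least Python 3.7 and whether scikit-learn and pandas can be imported. The helper names `python_is_recent_enough` and `installed_version` are hypothetical and chosen only for illustration.

```python
import importlib
import sys

# Minimum Python version named in the software list (assumption: major.minor only).
MIN_PYTHON = (3, 7)


def python_is_recent_enough(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or the running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor), e.g. (3, 9) >= (3, 7).
    return tuple(version_info[:2]) >= tuple(minimum)


def installed_version(module_name):
    """Return the module's __version__, 'unknown' if it has none, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    print("Python version OK:", python_is_recent_enough())
    # scikit-learn is imported under the name "sklearn".
    for name in ("sklearn", "pandas"):
        version = installed_version(name)
        print(name, "->", version if version is not None else "not installed")
```

Running the script inside the Anaconda environment should report both libraries as installed; a "not installed" line suggests that a different interpreter is being used.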
  
    
-====== Class Calendar (2021/2022) ======+====== Class Calendar (2022/2023) ======
  
 ===== First Semester  ===== ===== First Semester  =====
  
 ^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^ ^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^
-|  |  15.09  14:15‑16:00 | Lecture deleted  | | | | +|1.  |  15.09  11:00‑13:00 | Overview. Introduction to KDD   |{{ :magistraleinformatica:dmi:1-overview.pdf |}} {{ :magistraleinformatica:dmi:1-intro-dm.pdf |}}|Chap. 1 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EShM9RVJWTNHq_V8gj-abVgB86BMP0QX-iNMLzrrNP_sPg?e=6uigHe|Video 1: Course Overview]];[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/ESccksnIm5lFp9ROw18otfsBBhl3Ybeus-OjqqMcl2xCmQ?e=kF1KNe|Video 2: Introduction DM]] (the recording of the Introduction had audio issues, so I published the corresponding part of the lecture from a.y. 2021/22) |
-|1.|  16.09  14:1516:00 | Overview. Introduction to KDD  | {{ :magistraleinformatica:dmi:2021-1-overview.pdf |}}{{ :magistraleinformatica:dmi:1-intro-dm.pdf |}} | Chap. 1 Kumar Book| [[https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_140839-Meeting%20Recording.mp4?web=1|Video 1]]  [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_151538-Meeting%20Recording.mp4?web=1|Video 2]] | +|2.  |  16.09  09:00-11:00 | Data Understanding | {{ :magistraleinformatica:dmi:2-data_understanding.pdf |}} |Chap.2 Kumar Book and additioanl resource of Kumar Book:[[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Exploring Data]] If you have the first ed. of KUMAR this is the Chap 3 | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EXYTYX_JE4dDsqGwmBTKyBcBCkd7IxaFhUTn-MRc7LyJlA?e=gfCgnI|Video 1: Data Understanding - Part 1]][[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EeMWAfDY__BKvQ2RHadlmZsBgKCXuAV2mxjUiY3GuWK9Zg?e=7wIafq |Video 2: Data Understanding - Part 2]] | 
-|2.|  17.09   09:00-10:45 | Data Understanding | {{ :magistraleinformatica:dmi:2-data_understanding.pdf | Slides DU}} |Chap.2 Kumar Book and additioanl resource of Kumar Book:[[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Exploring Data]] If you have the first ed. of KUMAR this is the Chap 3 | [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210917_071017-Meeting%20Recording.mp4|Video 1]]  [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210917_101809-Meeting%20Recording.mp4|Video 2]] | +|3.  |  21.09  09:00-11:00 | Data Understanding Data Preparation |  {{ :magistraleinformatica:dmi:3-data_preparation.pdf |}} |Chap.2 Kumar Book and additioanl resource of Kumar Book:[[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Exploring Data]] If you have the first ed. of KUMAR this is the Chap 3 | [[ https://unipiit.sharepoint.com/:v:/s/a__td_54794/EaqvlZGIKvdMi8j8r7TIbHkB76b2K8gMsEPVtDouO5waYw?e=IUYEb6 
-|3.|  22.09  14:15-16:00 | Data Understanding Data Preparation        | {{ :magistraleinformatica:dmi:3-data_preparation.pdf |}} | Chap. 2 Kumar Book | [[https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210922_120312-Meeting%20Recording.mp4|Video]] | +|Video: Data Understanding & Data Preparation]] | 
-|4.|  23.09  14:15-16:00 | Data Preparation + Data Similarities.|{{ :magistraleinformatica:dmi:4-data_similarity.pdf |}}       | Data Similarity is in Chap. 2  | +|4.  |  22.09   11:00-13:00 | Data Preparation + Data Similarities.|{{ :magistraleinformatica:dmi:4-data_similarity.pdf |}}       | Data Similarity is in Chap. 2  |[[ https://unipiit.sharepoint.com/:v:/s/a__td_54794/EarSFVCS5MFJnFSi9dMT2y0BJgqS_YIVLX9fenQV9GyrjQ?e=eZshby 
-|5.|  24.09  09:00-10:45 | Introduction to Clustering. Center-based clustering: kmeans| {{ :magistraleinformatica:dmi:5-basic_cluster_analysis-intro.pdf |}}  {{ :magistraleinformatica:dmi:6.1-basic_cluster_analysis-kmeans.pdf |}}     | Clustering is in Chap. 7  | +|Video 1: Data Preparation + Data Similarities - Part 1]]; [[ https://unipiit.sharepoint.com/:v:/s/a__td_54794/EQMiVHPB8hlKuZ7ntw8Km-IBe4HtW_hz5VvYefLvHDfDLQ?e=bKJOn7|Video 2: Data Preparation + Data Similarities - Part 2]]   
-|6.|  29.09  14:15-16:00 | Hierarchical clustering       | {{ :magistraleinformatica:dmi:7.basic_cluster_analysis-hierarchical.pdf |}} | Chap7 Kumar Book  +|5.  |  23.09  09:00-11:00 | Introduction to Clustering. Center-based clustering: kmeans| {{ :magistraleinformatica:dmi:5-basic_cluster_analysis-intro.pdf |}}  {{ :magistraleinformatica:dmi:6.1-basic_cluster_analysis-kmeans.pdf |}} | Clustering is in Chap. 7  |[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EV-fDd75MIxGmazA79kFHCYBI78yYwqy7AFE5h9MN2rRqg?e=YVgdjS|Video 1: Introduction to Clustering + K-means - Part 1]];[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/ETySd1UWIzxCoAKilzaXO_MBW8oXZZCjf5FEhyywGIdJBg?e=Xq2jdo|Video 2: Introduction to Clustering + K-means - Part 2]]] 
-|7.|  30.09  14:15-16:00 | Density based clustering. Clustering validity. Lab. DU | {{ :magistraleinformatica:dmi:8.basic_cluster_analysis-dbscan-validity.pdf |}}  {{ :undefined:dataund.zip | Notebook DU tips}} {{ :magistraleinformatica:dmi:adult_du.zip Another Notebook on DU}}  Chap7 Kumar Book  | +|6.  |  28.09  09:00-11:00 | Python Lab: Data Understanding & Data Preparation |{{ :undefined:dataund.zip Notebook DU tips}} | | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/Eb0GU3ScaudIuh3kmNVw5_EBvgFRME5hnkOyZCetW55vwg?e=fDgxnE|Video 1: Python Lab: DU - Part1]];[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EfTSxKrV45lIhWCySWDflM4BXjD7WSKj6X3Se5Dv7UEb2Q?e=vf2E2h|Video 2: Python Lab: DU - Part2]]
-|8.|  01.10  09:00-10:45 | Python Lab - Clustering|  {{ :magistraleinformatica:dmi:tips_clustering.ipynb_complete.zip | Notebook Clustering Tips}}   |   | +|7.  |  29.09  11:00-13:00 | Hierarchical clustering | {{ :magistraleinformatica:dmi:7.basic_cluster_analysis-hierarchical.pdf |}}| | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EcBupaEx_HNFiEMYa_dQ2m8BNe25abzcmTKZ3JrVOtibCQ?e=SSgN9T| Video: Project Description + Hierarchical Clustering]]| 
-|9.|  06.10  14:15-16:00 | Center-based clustering: Bisecting K-means, Xmeans, EM       | {{ :magistraleinformatica:dmi:6.2-basic_cluster_analysis-kmeans-variants.pdf |}} | Chap. 7 Kumar Book, {{ :magistraleinformatica:dmi:clusteringmixturemodels.pdf |}} {{ :magistraleinformatica:dmi:xmeans.pdf |}}|  +|  |  30.09  09:00-11:00 | Lecture Canceled | | | 
-|10.| 07.10  14:15-16:00 | Classification Problem. Decision Trees  |{{ :magistraleinformatica:dmi:9.chap3_basic_classification-2020.pdf |}} | Chap. 3 Kumar Book| +|8.  |  05.10  09:00-11:00 Density based clustering. Clustering validity. | {{ :magistraleinformatica:dmi:8.basic_cluster_analysis-dbscan-validity.pdf |}} | Chap. 7 Kumar Book  |   | 
-|   | 08.10  09:00-10:45 | Lecture canceled   |     +|9.|   06.10  11:00-13:00 | Center-based clustering: Bisecting K-means, Xmeans, EM       | {{ :magistraleinformatica:dmi:6.2-basic_cluster_analysis-kmeans-variants.pdf |}} | Chap. 7 Kumar Book, {{ :magistraleinformatica:dmi:clusteringmixturemodels.pdf |}} {{ :magistraleinformatica:dmi:xmeans.pdf |}}| [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EQTbbvqF2kJOgEsFQ1WF48cBjWf2wgTCbOjxcQzn9MyVzw?e=KQ7gEZ|Video 1: Center-based clustering - Bisecting K-means, Xmeans, EM ]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EYc-39UkUhlCsL_huteYf7YBJjTruY207hGwBJmBRobACg?e=ixfbJ2|Video 2: Clustering Lab.]] 
-|11.| 13.10  14:15-16:00 | Decision Trees Classifier Evaluation | same slides of the previous lecture | Chap. 3 Kumar Book|  +|10.|   07.10  09:00-11:00 |Python Lab - Clustering|  {{ :magistraleinformatica:dmi:tips_clustering.ipynb_complete.zip | Notebook Clustering Tips}}    |[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EUl3UMFUoixPo9FzX7nC9e4BWS1SFpIGPVdZbwyfdZBgCw?e=BexKor|Video: Clustering Lab. - Part 2]] | 
-|12.| 14.10  14:15-16:00 | Evaluation Methods for Classification Models | same slides of the previous lecture | Chap. 3 Kumar Book|  +|11.|   12.10  09:00-11:00 |Classification Problem. Decision Trees|  {{ :magistraleinformatica:dmi:9.chap3_basic_classification-2022.pdf |}}  | Chap. 3 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EWVzwCQWC7BKmRlG69Regg4BEmeqRwin9GZ0VJIcV_wtsw?e=YDkrrf|Video Lecture - Part 1]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EQUgcfNC2XdPn6QHy3rlgJcBO9UTsgmoMYFJbhP8vM5UIA?e=uWDpVA|Video Lecture - Part 2]]
-|13.| 15.10  09:00-10:45   Statistical tool for model evaluation + Rule based classification| {{ :magistraleinformatica:dmi:10-rule-based-clussifiers.pdf |}} |  Chap. 3 Kumar Book +  Chap. 4 Kumar Book| |  +|12.|   13.10  11:00-13:00 |Decision Trees Classifier Evaluation|  same slides previous lecture | Chap. 3 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EaJcTBiLgh1DiErGVCZAovoBlLOaHuCrabxtNOTqXYRg-A?e=IQJDGE|Video Lecture - Part 1]] [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EbXLwpQCdNNMt4ClRGPAivMBFJa-qoH8N9TKpp9OgD8mlw?e=TjOAsx|Video lecture - Part 2]]
-|14.| 20.10  14:15-16:00 | Rule based classification + Instance-based Classification  | {{ :magistraleinformatica:dmi:10-knn.pdf |}} | Chap4 Kumar Book +|13.|   14.10  09:00-11:00 |Classifier Evaluation same slides previous lecture | Chap. 3 Kumar Book |[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EXT98sRBqL9Dpf-pAzfUnkIBy2zDxh86kI2D8ouBXH1zxQ?e=gw3zzM|Video Lecture]] 
-|15.| 21.10  14:15-16:00 | Exercise on DT learning + Naive Bayesian Classifier | {{ :magistraleinformatica:dmi:11_2021-naive_bayes.pdf |}} {{ :magistraleinformatica:dmi:2021-dt-ex.pdf |}} | Chap. 4 Kumar Book|  +|14.|   19.10  09:00-11:00 |Rule based Classifiers  {{ :magistraleinformatica:dmi:10-rule-based-clussifiers-2022.pdf |}}{{ :magistraleinformatica:dmi:10-knn-2022.pdf |}} | Rule based classifiers: Chap. 5.1, KNN: Chap. 4.2 - Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EYWGSnZlI1BMr8CO6QZDMJEBdKQhL5_GAx0YqMAgR-49Fg?e=JSvQPJ|Video 1: Rule based classifiers]][[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EYWGSnZlI1BMr8CO6QZDMJEBdKQhL5_GAx0YqMAgR-49Fg?e=1qMLMf|Video 2: KNN]] 
-|16.| 22.10  09:00-10:45 SVM & Ensemble Classifiers | {{ :magistraleinformatica:dmi:14_svm_2020.pdf |}} {{ :magistraleinformatica:dmi:13_ensemble_2020.pdf |}} |  Chap4 Kumar Book| |  +|15.|   20.10  11:00-13:00 |DT simulation of the learning algorithm  {{ :magistraleinformatica:dmi:2021-dt-ex.pdf | DT Exercise}}|  | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EaFgtNLiCONAnV6NF7DpZioB8W09o2HQcxJsSzvm65Pb_w?e=SvNSgj|Video 1: DT-EX]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EaOBQeDA_QtLtRYQVHoDFgABilFcAgpYsm01aUdUNjvBGA?e=bFpIpz|Video 2: DT-EX]]
-|17.| 27.10  14:15-16:00 | Neural Networks   {{ :magistraleinformatica:dmi:15_neural_networks_2021.pdf |}}| Chap4 Kumar Book +|16.|   21.10  09:00-11:00 |Naive Bayesian Classifier. SVM. Ensemble Classifiers | {{ :magistraleinformatica:dmi:11_2022-naive_bayes.pdf |}} {{ :magistraleinformatica:dmi:14_svm_2022.pdf |}} {{ :magistraleinformatica:dmi:13_ensemble_2022.pdf |}}| Chap. 4 Kumar Book |[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EX8DmHrg7d5HgaOstPoOjNcBwtfdTk9vbNoCeLQZSAWXYA?e=l3p65j|Video1]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EeVLe8Gr5-lFtAwfPipFTSQB95Wo0HzeLvo2O9aAN3a8_w?e=V2EmcA|Video2]]
-|18.| 28.10  14:15-16:00 | Python Lab on Classification | {{ :magistraleinformatica:dmi:adult_classification_2021.ipynb.zip |}} | |  | +|17.|   26.10  09:00-11:00 |Ensemble Classifiers + NN Classifiers + Project Discussion| same slides of the previous lecture  | Chap. 4 - Kumar Book  | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EZWw_m9HxzlKlIss5vsPeIgBl70TAqJW_I0j5x6rtkTOZg?e=kPTZFu|Video1]]| 
-|19.| 29.11  09:00-10:45 Canceled    | |  +|18.|   27.10  11:00-13:00 | NN Classifiers + Python Lab: Classification| {{ :magistraleinformatica:dmi:15_neural_networks_2021.pdf |}} {{ :magistraleinformatica:dmi:adult_classification_2021.ipynb.zip Classificaton Notebook}} {{ :magistraleinformatica:dmi:adult.data.zip Adult Dataset}}  | | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EVt50MbiGTpKsiO_t7WIeP0B8_ocGvnD7zEeUyRQ_d5wwQ?e=squ80v|Video1]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EcN1nMk11VBKgLbOg-2KWakBStYobApkO45gI9FZg1E_2Q?e=WP2MqH|Video2]]
-|20.| 03.11  14:15-16:00 | Python Lab on Classification + Association Rule Mining  | {{ :magistraleinformatica:dmi:classificationpython2.zip |}} {{ :magistraleinformatica:dmi:17_association_analysis2021.pdf |}} | Chap.5 Association Rules: Kumar Book|  +|19.|   28.10  09:00-11:00 |Python Lab: Classification | {{ :magistraleinformatica:dmi:adult_classification_2021.ipynb.zip Classificaton Notebook}} (same as previous lecture)  | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EVBxay2OtPhLmVgn658oHqEBMyVzNKp2Ju5aMq3ecTrOZw?e=e2uONt|Video]] 
-|21.| 04.11  14:15-16:00 | Association Rule Mining |   | |  Chap.5 Association Rules: Kumar Book | +|20.|   02.11  09:00-11:00 |Python Lab: NN & Imbalanced Classification | {{ :magistraleinformatica:dmi:classificationpython2.zip |}} |  Unfortunately, the video is not available due to technical issues  | 
-|22.| 05.11  09:00-10:45 FP-Growth Sequential Pattern Mining | {{ :magistraleinformatica:dmi:17_2021-fp-growth.pdf |}} |   Chap.6 Kumar Book|  +|21.|   03.11  11:00-13:00 Association Rule Mining{{ :magistraleinformatica:dmi:17_association_analysis2021.pdf |}} Chap. 5 - Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EWKi0iWArRhAgi8T5pKNsiMB5llNPPF5xnvsnTKmw3c97Q?e=6whkRx|Video]] 
-|23.| 10.11  14:15-16:00 | Sequential Pattern Mining |  {{ :magistraleinformatica:dmi:18_sequential_patterns_2021.pdf |}} |Chap.7 Kumar Book |  | +|22.|   04.11  09:00-11:00 | FP-Growth - Sequential Pattern Mining | {{ :magistraleinformatica:dmi:17_2021-fp-growth.pdf |}} {{ :magistraleinformatica:dmi:18_sequential_patterns_2021.pdf |}}|Chap. 5 &  Chap. 6 - Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EajQMV7eT3VOnZzh698x08UBKIMNqNdpOFep_1l-43Iprw?e=hR8G5L|Video1]];[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EfZbzQAqsCBFrHP4MHpx5twBKsoOhZtF6GJ7dKAaYxxWzg?e=ceNFjT|Video2]] 
-|24.| 11.11  14:15-16:00 | Time Series Similarities, Transformations Clustering | {{ :magistraleinformatica:dmi:22_time_series_similarity_2021.pdf |}}  | [[https://cs.gmu.edu/~jessica/BookChapterTSMining.pdf|Overview on DM for time series]]|  | +|23.|   09.11  09:00-11:00 | Sequential Pattern Mining. Intro to Time Series|Slides on SPM (see previous lecture) |  |  [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/ERRAWko36o1KroWob_cbUMoBZ7wgxVU3NbQK_Yz-jZadog?e=sbOrJR|Video1]];[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EQy8x2XPoEdBgq-bZWa6DdoBLGwHiWm-4xuzkMuZaGDaRg?e=8iIh7e|Video2]] | 
-|25.| 12.11  09:00-10:45 Motif & Shapelet Discovery | {{ :magistraleinformatica:dmi:23_time_series_motif-2021.pdf |}} | {{ :magistraleinformatica:dmi:matrixprofile.pdf |}}  | {{ :magistraleinformatica:dmi:shaplet.pdf |}} | +|24.|   10.11  11:00-13:00 Time Series Similarities{{ :magistraleinformatica:dmi:22_time_series_similarity_2022.pdf |}}  | [[https://cs.gmu.edu/~jessica/BookChapterTSMining.pdf| Overview on Time Series]] | [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/ER3cHQ0J4YBAsZqo0xzOz4UBefOxnjGELu9SuuMTfo95Eg?e=00wnp0|Video]] 
 +|25.|   11.11  09:00-11:00 Time Series Transformations Clustering Classification Slides on transformations (previous lecture) {{ :magistraleinformatica:dmi:23_time_series_motif-2022_2.pdf |}}| |[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/Eff6vI08BjNKkDogZSY6p2kBVNrltI4bZM1d-d8TE0bCdw?e=eqbHKc|Video]]| 
 +|26.|   18.11  09:00-11:00 Shapelets & MotifLab: Association Rules|  Slides on shapelets & motif (previous lecture) {{ :magistraleinformatica:dmi:arm-spm.zip |}} | {{ :magistraleinformatica:dmi:matrixprofile.pdf |}}  [[https://www.cs.ucr.edu/~eamonn/MatrixProfile.html|Papers on Matrix Profile]]{{ :magistraleinformatica:dmi:shaplet.pdf |}}|[[https://unipiit.sharepoint.com/:v:/s/a__td_54794/ETW4Mo2LBm9CkSPZljVqplsBxOpJuXJxA5nugiKh-eRZ4Q?e=zjMzVY |Video 1: Shapelets & Motif]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_54794/EaH053-TvQpAkdacrfGUDxYBtVPbFdNYJ8IMyhKyKpK3uA?e=xUYRB6|Video 2: Lab ARM]] 
 +|27.| 23.11  09:00-11:00 | Python: Sequential Pattern Mining & Time Series For SPM see notebooks of previous lecture. {{ :magistraleinformatica:dmi:timeseries-py.zip |}}|  |  [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EdPYuw_W-y9In9AeoHJm1a0BdBeUYhPb6z_3VFZDR9TMCQ?e=bAHAC1|Video]]
 +|28.| 24.11  11:00-13:00 | Python: Time Series. Ethics Privacy| {{ :magistraleinformatica:dmi:19_ethics_privacy2021.pdf |}}  | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EQCiARRNIYtNsvZCucPyX30BvP-KDZBXEMcpYFVeS107eQ?e=Jor8lE|Video 1]]; [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EeM4qF4IbRVHrQEyEJFhfRYB3HnapYKARglqiZdGfBgEGA?e=1ah2IV|Video 2]]| 
 +|29.| 25.11  09:00-11:00 | Privacy  | same slides of the last lecture |  | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ETMsw1ltw1RMhxXTIRLrb18BTNOx5koRTvQllhM295EDgg?e=wEgRnp|Video]] |
 +|30.| 30.11  09:00-11:00 |Explainability | {{ :magistraleinformatica:dmi:20_explainability_2021.pdf |}} |  | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EQ9prxzw6wtOtVooByStG9oBgoK8cqvnSBQmeS2-FxV9jA?e=hCH48N|Video]]| 
 +|31.| 01.12  11:00-13:00 |Anomaly Detection + Python: XAI | {{ :magistraleinformatica:dmi:lezione-xai.zip |XAI Notebook}}  | Note: unfortunately, the video of the lecture on AD does not work. You can only hear my voice; the video is not available. Sorry.  [[ https://unipiit.sharepoint.com/:v:/s/a__td_54794/ES2D7eCwWGxOnNwoPvVTG9oBqc3w3yWZ04uF-wgrIVjp5w?e=Z9YbWU|  Video - AD - Only audio]][[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EeByA4yoIJBBnXbPe1Lf9EEBBwlwEi_FTKNqJr7MtZ6bkg?e=aFTe7u|Video Python XAI]]| 
 +|32.| 02.12  09:00-11:00 |Python: XAI + AD| {{ :magistraleinformatica:dmi:anomalydetection-1.ipynb.zip Anomaly Detection Notebook}}|  | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EUuzBGfYmYVCqUDk-1GkWvsBWL_QiaRRjUUu5Yj4YLGr7Q?e=VDFf7x|Video]]| 
 +|33.| 07.12  09:00-11:00 |Paper Presentation|  |  | | 
 +|34.| 09.12  09:00-11:00 |Paper Presentation|  |  | | 
 +|35.| 14.12  09:00-11:00 |Paper Presentation|  |  | | 
 +|36.| 15.12  11:00-13:00 |Paper Presentation|  |  | | 
 + 
 + 
 + 
 + 
 + 
  
 ====== Exams ====== ====== Exams ======
-**Mid-term Project **+**Project **
  
 A project consists in data analyses based on the use of data mining tools.  A project consists in data analyses based on the use of data mining tools. 
-The project has to be carried out in Python by a team of 2/3 students. The guidelines require addressing specific tasks. Results must be reported in a single paper of at most 20 pages of text, including figures. The students must deliver both the paper (single column) and well-commented Python notebooks. +The project has to be carried out in Python by a team of 3 students. The guidelines require addressing specific tasks. Results must be reported in a single paper of at most 25 pages of text, including figures. The students must deliver both the paper (single column) and well-commented Python notebooks.
  
-  * First part of the project consists in the **assignments** described here: {{ :magistraleinformatica:dmi:data_mining_project_2021_1_.pdf | Project Description}} +  * First part of the project consists in the **assignments** described here: {{ :magistraleinformatica:dmi:projectdescriptiondm2022-new.pdf | Project Description}} 
-     * **Dataset:** {{ :magistraleinformatica:dmi:prj_data.zip Dataset}} +  **Dataset:[[https://unipiit.sharepoint.com/:u:/s/a__td_54794/ERsHd0L8ZbtCvAjWsHdzmfkBb-B2EvkiQLU09e22b0xsTQ?e=VfSaNW|Twitter Data]]**  
-     * **Deadline**: the first part has to be delivered by November <del>5th 2021</del> 15th 2021. Send an email to: anna.monreale@unipi.it and francesca.naretto@sns.it +  - **Deadline**: the first part has to be delivered by November <del>5th 2022</del> 12, 2022. Send an email to: anna.monreale@unipi.it, francesca.naretto@sns.it, lorenzo.mannocci@phd.unipi.it
    
 +  * Second part of the project consists in the assignment described here: {{ :magistraleinformatica:dmi:project_description_dm2022-updated.pdf |Updated Project Description}}
 +     - **Deadline**: Jan 8, 2023
 +
 +  * Third part of the project consists in the assignment described here: {{ :magistraleinformatica:dmi:project_description_dm2022-complete.pdf |Complete Project Description}}
 +   - Note that the document also contains rules for the delivery and the final exam!
 +   - **Deadline**:   Jan 8, 2023
 +
 +
 +**Students who did not deliver the above project by **Jan 8, 2023** need to ask the teachers by email for a new project. The assigned project will require about 2 weeks of work and, after delivery, will be discussed during the oral exam. **
 +
 ** Paper Presentation (OPTIONAL)** ** Paper Presentation (OPTIONAL)**
  
-Students need to present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: Students that decide to do the paper presentation can avoid the oral exam with open questions. They only need to present the project (see next point).+Students need to present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: Students that decide to do the paper presentation can avoid the oral exam with open questions. They only need to present the project (see next point). The paper presentation can be done by the group or by a single person.
  
 **Oral Exam** **Oral Exam**
-  * **Project presentation** (with slides) – 10 minutes: mandatory for all the students+  * **Project presentation** (with slides) – 10-15 minutes: mandatory for all the students
   * ** Open questions ** on the entire program: optional only for students opting for paper presentation.   * ** Open questions ** on the entire program: optional only for students opting for paper presentation.
   
Line 184: Line 214:
  
  
-===== Reading About the "Data Scientist" Job ===== 
  
-** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the "sexiest" around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them. **+====== Previous years ===== 
 +[[DM-INF 2021-2022]]
  
-//Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// 
- 
-  * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}} 
-  * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/09/06/data-scientist-the-hot-new-gig-in-tech/|link]] 
-  * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}} 
-  * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1|link]] 
-  * Il futuro è già scritto in Big Data. Il SOle 24 Ore, Sept 2012 [[http://www.ilsole24ore.com/art/tecnologie/2012-09-21/futuro-scritto-data-155044.shtml?uuid=AbOQCOhG|link]] 
-  * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}} 
-  * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]] 
- 
-  * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]] 
- 
-====== Previous years ===== 
 [[DM-INF 2020-2021]] [[DM-INF 2020-2021]]
  
magistraleinformatica/dmi/start.txt · Last modified: 22/03/2024 at 20:34 (4 weeks ago) by Anna Monreale