
dm:start

Differences

These are the differences between the selected revision and the current version of the page.

dm:start [12/04/2021 at 13:40 (3 years ago)]
Riccardo Guidotti [Second Semester (DM2 - Data Mining: Advanced Topics and Applications)]
dm:start [26/03/2024 at 17:16 (47 hours ago)] (current version)
Riccardo Guidotti [Second Semester (DM2 - Data Mining: Advanced Topics and Applications)]
Line 9: Line 9:
 ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true}); ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true});
 ga('personalTracker.require', 'linker'); ga('personalTracker.require', 'linker');
-ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it'] ); +ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it', 'luciacpassaro.github.io'] );    
-  +
 ga('personalTracker.require', 'displayfeatures'); ga('personalTracker.require', 'displayfeatures');
-ga('personalTracker.send', 'pageview', 'ruggieri/teaching/dm/');+ga('personalTracker.send', 'pageview', 'courses/dm/');
 setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000); 
 </script> </script>
Line 51: Line 50:
 </script> </script>
 </html> </html>
-====== Data Mining A.A. 2020/21 ======+====== Data Mining A.A. 2023/24 ======
  
 ===== DM1 - Data Mining: Foundations (6 CFU) ===== ===== DM1 - Data Mining: Foundations (6 CFU) =====
Line 61: Line 60:
     * [[dino.pedreschi@unipi.it]]       * [[dino.pedreschi@unipi.it]]  
  
-  * **Mirco Nanni** +  * **Riccardo Guidotti** 
-    * KDDLab, ISTI - CNR, Pisa +    * KDDLab, Università di Pisa 
-    * [[http://www-kdd.isti.cnr.it]] +    * [[https://kdd.isti.cnr.it/people/guidotti-riccardo]]    
-    * [[mirco.nanni@isti.cnr.it]]  +    * [[riccardo.guidotti@di.unipi.it]]
  
 Teaching Assistant Teaching Assistant
-  * **Salvatore Citraro**+  * **Andrea Fedele**
     * KDDLab, Università di Pisa     * KDDLab, Università di Pisa
-    * [[http://www-kdd.isti.cnr.it]] +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
-    * [[salvatore.citraro@phd.unipi.it]]  +    * [[andrea.fedele@phd.unipi.it]]  
 ===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) ===== ===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) =====
  
Line 79: Line 78:
     * [[riccardo.guidotti@di.unipi.it]]     * [[riccardo.guidotti@di.unipi.it]]
  
 +Teaching Assistant 
 +  * **Andrea Fedele** 
 +    * KDDLab, Università di Pisa 
 +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
 +    * [[andrea.fedele@phd.unipi.it]]   
 +    * Meeting: https://calendly.com/andreafedele/
 ====== News ====== ====== News ======
-    * ** [08.04.2021] CAT2 is available {{ :dm:cat2_2021.pdf | here}}.** + 
-    * **[06.04.2021] The project must be delivered to [[riccardo.guidotti@unipi.it]] AND [[salvatore.citraro@phd.unipi.it]] with subject "[DM2 Project] Draft 1"** +     * **[19.01.2024]** DM2 Lectures will start on Mon 19/02, only for that lecture the time will be 14-16 instead of 9-11. 
-    * [08.03.2021] CAT1 answers are available {{ :dm:cat1_2021_answ.pdf | here}} +     * [13.10.2023] To schedule meeting with the Teaching Assistant you can use: https://calendly.com/andreafedele/ 
-    * [01.03.2021] CAT1 is available {{ :dm:cat1_2021.pdf | here}} +     * [20.09.2023] Recordings of the lectures can be found on the web pages of the course for the years 2020/2021 and 2021/2022 (see links at the bottom of this page) 
-    * [15.02.2021] Groups should be registered [[https://docs.google.com/spreadsheets/d/1RaAocJ2bCjCOYj4R068Rg6OLNNVmIG8YXu9puGIC1OU/edit?usp=sharing|here]] +     * [20.09.2023] Thursday 21 September there will be no lecture. 
-    * [11.02.2021] The course will be held online on  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]].  +     * [11.09.2023] Lectures will start on Monday 18 September 2023 at 11.00 room C1
-    * [11.02.2020] The first lesson will be held on 15/02/2021. +     * [11.09.2023] Lectures will be in presence only. Registrations of the lectures of past years can be found at the bottom of this web page
 +     * [11.09.2023] Project Groups [[https://docs.google.com/spreadsheets/d/10R5AcqdlXsqTAxSys6zyqArvdytq4HH6Ik8Uy-NHkQ4/edit?usp=sharing|link]] 
 +     * [11.09.2023] MS Teams [[https://teams.microsoft.com/l/team/19%3a7uEgK_aekrBFuOsbREccAa-tfqeSwvfBemfK_lG6HA01%40thread.tacv2/conversations?groupId=84cc4fec-41fc-4208-a9d4-a02675216d22&tenantId=c7456b31-a220-47f5-be52-473828670aa1|link]] 
 ====== Learning Goals ====== ====== Learning Goals ======
   * DM1   * DM1
Line 96: Line 102:
      * Classification      * Classification
      * Pattern Mining and Association Rules      * Pattern Mining and Association Rules
-     * Clustering+     * Sequential Pattern Mining
  
   * DM2   * DM2
      * Outlier Detection      * Outlier Detection
-     * Regression and Forecasting +     * Dimensionality Reduction 
-     * Advanced Classification+     * Regression  
 +     * Advanced Classification and Regression
      * Time Series Analysis      * Time Series Analysis
-     * Sequential Pattern Mining 
-     * Advanced Clustering 
      * Transactional Clustering      * Transactional Clustering
-     * Ethical Issues+     * Explainability
  
 ====== Hours and Rooms ====== ====== Hours and Rooms ======
Line 115: Line 120:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Monday  |  14:00 - 16:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Monday  |  11:00 - 13:00  |  C1   |  
-|  Wednesday  |  16:00 - 18:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Wednesday  |  11:00 - 13:00  |  C1  |  
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
  
-  * Prof. Pedreschi: Monday 16:00 - 18:00, Online +  * Prof. Pedreschi 
-  * Prof. Nanni: appointment by email, Online +      * Monday 16:00 - 18:00 
 +      * Online 
 +  * Prof. Guidotti 
 +      * Tuesday 16:00 - 18:00 or Appointment by email 
 +      * Room 363 Dept. of Computer Science or MS Teams
  
      
Line 130: Line 139:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Monday  |  14:00 - 16:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Monday   |  09:00 - 11:00  |  C   |  
-|  Wednesday  |  16:00 - 18:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  +|  Wednesday  |  11:00 - 13:00  |   |  
  
 **Office Hours - Ricevimento:** **Office Hours - Ricevimento:**
  
-  * Room 268 Dept. of Computer Science +  * Tuesday 15.00-17.00 or Appointment by email 
-  * Tuesday: 15-17, Room: MS Teams +  * Room 363 Dept. of Computer Science or MS Teams
-  * Appointment by email+
  
 ====== Learning Material -- Materiale didattico ====== ====== Learning Material -- Materiale didattico ======
Line 145: Line 153:
   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006
     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]
-    * I capitoli 46sono disponibili sul sito del publisher. -- Chapters 4,and are also available at the publisher's Web site.+    * I capitoli 35sono disponibili sul sito del publisher. -- Chapters 3,and are also available at the publisher's Web site.
   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7
   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.
Line 159: Line 167:
 ===== Software===== ===== Software=====
  
-  * Python - Anaconda (3.7 version!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included)+  * Python - Anaconda (>3.7): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included)
   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]
   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]
 +
 +Other software for Data Mining
   * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]]   * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]]
   * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]]   * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]]
-  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/DDM]]+  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/HelpDDMv1]], [[https://kdd.isti.cnr.it/ddm/#/| DDMv2]] 
    
-====== Class Calendar (2020/2021) ======+====== Class Calendar (2023/2024) ======
  
 ===== First Semester (DM1 - Data Mining: Foundations) ===== ===== First Semester (DM1 - Data Mining: Foundations) =====
  
-^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ 
-|1.|  16.09.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Introduction| {{ :dm:1.dm-overview-corso.pdf | Course Overview}} {{ :dm:2.introduction-short.pdf | Introduction DM}} | Pedreschi | +|01.| 18.09.2023 11-13 |C1Overview, Introduction | {{ :dm:00_dm1_introduction_2023_24.pdf | Intro}} | Pedreschi| 
-|2. 23.09.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Data Understanding | {{ :dm:3.dataunderstanding-2019.pdf |Slides DU}} {{ :dm:2-statistica_descrittiva.pdf |Slides on Descriptive Statistics}} | Pedreschi +  20.09.2023 | 11-13 |  | No Lecture |  |  | 
-|3.|  28.09.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | Data Understanding |  | Pedreschi +|02.| 25.09.2023 | 11-13 |C1LabIntroduction to Python | {{ :dm:dm1_lab01_python_basics.zip Python Basic}} | Guidotti
-|4.|  30.09.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | Data Preparation  | {{ :dm:3.dm_ml_data_preparation.pdf | Slides DP}} | Pedreschi +|03.| 27.09.2023 | 11-13 |C1| Lab. Data Understanding | {{ :dm:dm1_lab02_data_understanding.zip | Data Understanding}} Guidotti
-|5.|  05.10.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Lab: Introduction to Python and Knime | {{ :dm:python_basics.ipynb.zip |Python Introduction}}, {{ :dm:00_start_with_knime.zip Knime simple workflow}}  [[https://web.microsoftstream.com/video/97b6fb6f-8909-417c-bc41-2f1e5a9ab00eLecture 5 part 1]], [[https://web.microsoftstream.com/video/2179e7ad-e1b7-48f6-a7cf-dc903a97f8dc|Lecture 5 part 2]]| Guidotti, Citraro +|04.| 02.10.2023 11-13 |C1| Data Understanding | {{ :dm:01_dm1_data_understanding_2023_24.pdf | Data Understanding}} | Guidotti
-|6.|  07.10.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Lab: Data Understanding Preparation Dataset: {{ :dm:iris.csv.zip Iris}}, {{ :dm:titanic.csv.zip Titanic}}, Knime: {{ :dm:01_data_understanding.zip |}} Python: {{ :dm:titanic_data_understanding2.ipynb.zip |}} [[https://web.microsoftstream.com/video/c328a4e7-40d3-4378-a0c7-d5c24400d59a|Lecture 6 part 1]], [[https://web.microsoftstream.com/video/0a95d217-9cfb-46d6-aab6-e4a7fc65eac6|Lecture 6 part 2]]| Guidotti, Citraro +|05.| 04.10.2023 11-13 |C1Data Understanding & Preparation | {{ :dm:01_dm1_data_understanding_2023_24.pdf Data Understanding}}, {{ :dm:02_dm1_data_preparation_2023_24.pdf Data Preparation}} | Pedreschi
-|7.|  12.10.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | Clustering: Intro & K-means | {{ :dm:basic_cluster_analysis-intro-kmeans_2020.pdf |Slides clustering 1}} | Nanni | +|06.| 09.10.2023 11-13 |C1| Data Preparation Data Similarity | {{ :dm:02_dm1_data_preparation_2023_24.pdf Data Preparation}}, {{ :dm:03_dm1_data_similarity_2023_24.pdf Data Similarity}} | Pedreschi| 
-|8.|  14.10.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | Clustering: Hierarchical methods | {{ :dm:basic_cluster_analysis-hierarchical_2020.pdf |Slides clustering 2}} | Nanni +|07.| 11.10.2023 | 11-13 |C1| Data Similarity & Lab. Data Understanding | {{ :dm:03_dm1_data_similarity_2023_24.pdf Data Similarity}}{{ :dm:dm1_lab02_data_understanding.zip | Data Understanding}} | Pedreschi
-|9.|  19.10.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | Clustering: Density-based methods and exercises | {{ :dm:6.basic_cluster_analysis-dbscan.pdf |Slides clustering 3}}, {{ :dm:ex._clustering_2020.pdf |Clustering exercises}} | Nanni +|08.| 16.10.2023 11-13 |C1Introduction to ClusteringK-Means | {{ :dm:04_dm1_clustering_intro_2023_24.pdf | Intro_Clustering}}{{:dm:05_dm1_kmeans_2023_24.pdf | K-Means }} | Pedreschi
-|10.|  21.10.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | Clustering: Validation methods and exercises | {{ :dm:6.basic_cluster_analysis-validity_2020.pdf |Slides clustering 4}} | Nanni +|09.| 18.10.2023 11-13 |C1| Clustering Validation, Hierarchical Clustering | {{ :dm:04_dm1_clustering_intro_2023_24.pdf | Intro_Clustering}}, {{ :dm:06_dm1_hierarchical_clustering_2023_24.pdf | Hierarchical}} | Pedreschi
-|11.|  26.10.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  | LabClustering | {{ :dm:knime_clustering.zip | Knime }}, {{ :dm:python_clustering-iris.zip |Python Iris}} {{ :dm:titanic_clustering.ipynb.zip | Python Titanic}}   | Citraro +|10.| 23.10.2023 11-13 |C1Density-based Clustering | {{ :dm:07_dm1_density_based_2023_24.pdf | Density-based Clustering}} | Pedreschi
-|12.|  28.10.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Classification: Intro and Decision Trees  | {{ :dm:7.chap3_basic_classification-2019.pdf |Slides classification}} | Nanni +|11.| 25.10.2023 11-13 |C1| LabClustering | {{ :dm:dm1_lab03_clustering.zip | Clustering}}| Guidotti
-| |  02.11.2020  14:00-16:00 |  | No Lecture. Project Week. | | +|12.| 30.10.2023 | 11-13 |C1| Ex. Clustering | {{ :dm:ex1_dm1_clustering_2023_24.pdf | ExClustering}}| Guidotti| 
-| |  04.11.2020  16:00-18:00 |  | No Lecture. Project Week. | | +|   | 01.11.2023 | 11-13 |  | No Lecture |  |  | 
-|13.|  09.11.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  ClassificationDecision Trees/ Nanni +|13.| 06.11.2023 | 11-13 |C1| Intro Classification, kNN[[https://unipiit.sharepoint.com/sites/a__td_61280/Shared%20Documents/General/Recordings/Lecture%2006_11_2023-20231106_110052-Registrazione%20della%20riunione.mp4?web=1|(video)]] | {{ :dm:08_dm1_classification_intro_2023_24.pdf Intro_Classification}}, {{ :dm:09_dm1_knn_2023_24.pdf kNN}}| Guidotti
-|14.|  11.11.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  ClassificationDecision Trees/ Nanni +|14.| 08.11.2023 11-13 |C1Naive Bayes, Exercises | {{ :dm:10_dm1_naive_bayes_2023_24.pdf Naive Bayes}} Guidotti
-|15.|  16.11.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Classification: Decision Trees/| {{ :dm:ex-classification.pdf |Sample exercise}}  Nanni +|15.| 13.11.2023 11-13 |C1Model Evaluation | {{ :dm:11_dm1_classification_eval_2023_24.pdf | Model Evaluation}} | Guidotti
-|16.|  18.11.2020  16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Classification: Decision Trees/5 + Exercises | {{ :dm:classificazione_1.pdf |Exercises 1}}, {{ :dm:classificazione_2.pdf |Excercises 2}} Nanni +|16.| 15.11.2023 11-13 |C1Model Evaluation Exercises & Lab | {{ :dm:dm1_lab04_classification_regression.zip Classification}} | Guidotti| 
-|17.|  23.11.2020  14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  Classification: KNN | {{ :dm:classification_knn.pdf |Slides}}, {{ :dm:ex_knn_dm2_exam.2017.10.30.pdf |Exercise 1 (KNN only)}}, {{ :dm:ex_knn_2020.pdf |Exercise 2}} | Nanni +|   | 20.11.2023 11-13  | No Lecture |  |  
-|18.|  25.11.2020  16:00-18:00 [[https://web.microsoftstream.com/video/15574ad9-650b-413a-818f-d76dea123f80|MS Teams]]  | Lab: Clustering  {{ :dm:knime_classification.zip | knime_classification}} {{ :dm:python_classification.zip python_classification}} {{ :dm:python_classification.rar python_classification2}}  Citraro +|17.| 22.11.2023 11-13 |C1Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2023_24.pdf | Decision Tree}} | Pedreschi| 
-|19.|  02.12.2020  16:00-18:00 [[ https://web.microsoftstream.com/video/8798715f-6eee-4754-b207-ec382ec08f21 |MS Teams]]  Pattern Association Rule Mining - Apriori algorithm for frequent itemset mining | {{ :dm:2-dm2-restructured_assoc-2020.pdf |}} | Pedreschi +|18.| 27.11.2023 | 11-13 |C1| Decision Tree Classifier | {{ :dm:12_dm1_decision_trees_2023_24.pdf | Decision Tree}} | Pedreschi
-|20.|  07.12.2020  14:00-16:00 [[ https://web.microsoftstream.com/video/f043ae04-0d5d-4f18-889e-7b7a84375481 |MS Teams]]  | Pattern & Association Rule Mining - Rule mining and evaluation, Closed and maximal itemsets, Multi-dimensional, Quantitative and Multy-level association rules  | Pedreschi | +|19.| 29.11.2023 11-13 |C1Exercises and Lab. Decision Tree Classifier | {{ :dm:dm1_lab04_classification.zip | Decision Tree}} | Guidotti| 
-|21.|  14.12.2020  14:00-16:00   | Lab Pattern Mining  | {{ :dm:pattern_knime.zip |knime_pattern}} {{ :dm:pattern_python.zip |python_pattern}} https://anaconda.org/conda-forge/pyfim, http://www.borgelt.net/pyfim.html {{ :dm:ex-frequentpatterns-ar.pdf |}} | Citraro |+|20.| 04.12.2023 | 11-13 |C1| Decision Tree Classifier, Exercises and Lab | {{ :dm:12_dm1_decision_trees_2023_24.pdf Decision Tree}} | Pedreschi
 +|21.| 06.12.2023 | 11-13 |C1| Intro Regression Lab. Regression | {{ :dm:12_dm1_linear_regression_2023_24.pdf | Regression}}, {{ :dm:dm1_lab05_regression.zip | Regression}} | Guidotti| 
 +|22.| 11.12.2023 | 11-13 |C1| Intro Pattern Mining and Apriori | {{ :dm:13_dm1_pattern_mining_2023_24.pdf | Pattern Mining}} | Pedreschi| 
 +|23.| 13.12.2023 | 16-18 |C1| Apriori & Lab. Pattern Mining | {{ :dm:13_dm1_pattern_mining_2023_24.pdf | Pattern Mining}}, {{ :dm:dm1_lab06_pattern_mining.zip | Pattern Mining}}  | Pedreschi| 
 +|24.| 18.12.2023 | 11-13 |C| FP-Growth and Exercises | {{ :dm:13_dm1_pattern_mining_2023_24.pdf | Pattern Mining}} | Guidotti|
 ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) ===== ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) =====
  
-^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ Recordings ^ +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ 
-|1.| 15.02.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | IntroductionCRIPS, KNN | {{ :dm:00_dm2_intro_2021.pdf | Intro}}, {{ :dm:01_dm2_crispdm_2021.pdf | CRISP}}, {{ :dm:02_dm2_knn_2021.pdf | KNN}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione-20210215_141005-Registrazione%20della%20riunione.mp4?web=1| 1stPart]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione-20210215_151106-Registrazione%20della%20riunione.mp4?web=1| 2ndPart]] +|01.| 19.02.2024 | 14-16 |COverviewRule-based Models | {{ :dm:14_dm2_intro_2023_24.pdf | Introduction}}, {{ :dm:dm2_project_guidelines_23_24.pdf | Guidelines}}, {{ :dm:15_dm2_rule_based_classifier_2023_24.pdf | Rule-based Models }} | Guidotti| 
-|2.17.02.2021 16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Performance Evaluation {{ :dm:03_dm2_performance_evaluation_2021.pdf Eval}}, {{ :dm:occupancy_data.zip | occupancy_data}}, {{ :dm:01_knn_eval.ipynb.zip | KNN_Eval_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%202-20210217_160202-Registrazione%20della%20riunione.mp4?web=1| Dataset]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%202-20210217_161131-Registrazione%20della%20riunione.mp4?web=1| Lecture]] +  21.02.2024  | | No Lecture   
-|3.22.02.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Imbalanced Learning | {{ :dm:04_dm2_imbalanced_learning_2021.pdf | ImbLearn}}, {{ :dm:02_dimensionality_reduction.ipynb.zip | DimRed_notebook}}, {{ :dm:03_perfeval_imbalance.ipynb.zip |ImbLearn_notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%203-20210222_141416-Registrazione%20della%20riunione.mp4?web=11stPart]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%203-20210222_151806-Registrazione%20della%20riunione.mp4?web=12ndPart]] +  26.02.2024  | | No Lecture |   
-|4.| 23.02.2021 16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Anomaly Detection | {{ :dm:05_dm2_maximum_likelihood_estimation_2021.pdf | MLE}}, {{ :dm:06_dm2_anomaly_detection_2021.pdf | Anomaly Detection}}, {{ :dm:04_outlier_detection.ipynb.zip | Anomaly_notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%204%20-%20DM2-20210224_160801-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%204%20-%20DM2-20210224_170545-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] +|02.| 19.02.2024 11-13 |CSequential Pattern Mining | {{ :dm:16_dm2_sequential_pattern_mining_2023_24.pdf | Sequential Pattern Mining}}, {{ :dm:GSP.zip | GSP}} | Guidotti| 
-|5.| 01.03.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Anomaly Detection | {{ :dm:06_dm2_anomaly_detection_2021.pdf | Anomaly Detection}}, {{ :dm:04_outlier_detection.ipynb.zip | Anomaly_notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%205-20210301_140023-Registrazione%20della%20riunione.mp4?web=11st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%205-20210301_144434-Registrazione%20della%20riunione.mp4?web=12nd Part]] | +|03.| 04.03.2024 9-11 |CSequential Pattern Mining | {{ :dm:16_dm2_sequential_pattern_mining_2023_24.pdf | Sequential Pattern Mining}}, {{ :dm:GSP.zip | GSP}} | Guidotti| 
-|6.03.02.2021 16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Anomaly Detection | {{ :dm:06_dm2_anomaly_detection_2021.pdf | Anomaly Detection}}, {{ :dm:04_outlier_detection.ipynb.zip | Anomaly_notebook}}, Extended Isolation Forest [[https://github.com/sahandha/eif|link]] | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%206-20210303_161521-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%206-20210303_171402-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] +|04.| 06.03.2024 11-13 |CTransactional Clustering | {{ :dm:17_dm2_transactional_clustering_2023_24.pdf | Transactional Clustering}} | Guidotti| 
-|7.| 08.03.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Naive Bayes Classifier | {{ :dm:07_dm2_naive_bayes_2021.pdf | NBC}}, {{ :dm:05_naive_bayes.ipynb.zip | NBC_notebook}}, {{:dm:nbc_ex1_miro.png?linkonlyEx1_Miro}}, {{:dm:nbc_ex2_miro.png?linkonly| Ex2_Miro}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%207-20210308_141327-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%207-20210308_151426-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] +|05.| 11.03.2024 9-11 |CTime Series Similarity | {{ :dm:18_dm2_time_series_similarity_2023_24.pdf | Time Series Similarity}}, {{ :dm:dm2_lab00_spotify.zip | TS_Load}}, {{ :dm:dm2_lab01_dist_transf.zip TS_Similarity}} | Guidotti| 
-| 10.02.2021 16:00-18:00 | Lezione sul tema “Da Pisa al Fermilab di Chicago: Viaggio verso un rivoluzionario computer quantistico” della prof.ssa Anna Grassellino | [[https://www.youtube.com/watch?v=NIJ9ko9fAoE|Link]] | Guidotti | | +|06.| 13.03.2024 11-13 |CTime Series Approximation | {{ :dm:19_dm2_time_series_clustering_approximation_2023_24.pdf | Time Series Clustering}}, {{ :dm:dm2_lab02_approx_clust.zip | TS_Approx_Clustering}} | Guidotti| 
-|8.| 15.03.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Linear and Logistic Regression, Rule-based Classifiers | {{ :dm:08_dm2_linear_logistic_regression_2021.pdf | Regression}}, {{ :dm:09_dm2_rule_based_classifier_2021.pdf | RuleBased}}, {{ :dm:06_linear_logistic_regression.ipynb.zip | Regression_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%208-20210315_141301-Registrazione%20della%20riunione.mp4?web=1| 1stPart]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%208-20210315_152032-Registrazione%20della%20riunione.mp4?web=1| 2ndPart]] +|07.| 18.03.2024 9-11 |CTime Series Clustering & Motifs| {{ :dm:20_dm2_time_series_matrix_profile_2023_24.pdf | Time Series Motifs}}, {{ :dm:dm2_lab03_motifs.zip | TS_Motifs}} | Guidotti| 
-|9.| 17.03.2021 16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Rule-based Classifiers, Support Vector Machines | {{ :dm:09_dm2_rule_based_classifier_2021.pdf | RuleBased}}, {{ :dm:07_rule_based_classifiers.ipynb.zip | RuleBased_Notebook}}, {{ :dm:10_dm2_svm_2021.pdf | SVM}}, {{ :dm:08_support_vector_machines.ipynb.zip | SVM_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%209-20210317_161210-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%209-20210317_171648-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] +|08.| 20.03.2024 11-13 |CTime Series Classification | {{ :dm:21_dm2_time_series_classification_2023_24.pdf | Time Series Classification}}, {{ :dm:dm2_lab04_classification.zip | TS_Classification}} | Guidotti| 
-|10.| 22.03.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] (Nonlinear) Support Vector Machines, Linear Perceptron | {{ :dm:10_dm2_svm_2021.pdf | SVM}}, {{ :dm:08_support_vector_machines.ipynb.zip | SVM_Notebook}}, {{ :dm:11_dm2_perceptron_2021.pdf | Linear Perceptron}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2010-20210322_141555-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2010-20210322_150937-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] +|09.| 25.03.2024 9-11 |CImbalanced Learning | {{ :dm:22_dm2_imbalanced_learning_2023_24.pdf | Imbalanced Learning}}, {{ :dm:dm2_lab05_imbalance.zip |ImbLearn}} | Guidotti|  
-|11.| 24.03.2021 16:00-18:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Neural Networks, Deep Neural Networks | {{ :dm:12_dm2_neural_network_2021.pdf | Neural Network}}, {{ :dm:09_neural_networks.ipynb.zip | NN_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2011-20210324_161454-Registrazione%20della%20riunione.mp4?web=1| 1st Part]],[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2011-20210324_170537-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]]  +|10.| 27.03.2024 11-13 |CDimensionality Reduction | {{ :dm:23_dm2_dimred_2023_24.pdf | Dimensionality Reduction}}, {{ :dm:dm2_lab06_dimred.zip |DimRed}} | Guidotti|
-|- | 25.03.2021 15:00-17:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Neural Networks Forward and Backpropagation Example, Case Study Music | {{ :dm:09_neural_network_implementation.ipynb.zip | NN_Implementation}}, {{ :dm:sanremoanalysis_unipi.pdf | Case Study}}| Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Office%20Hours-20210325_151702-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Office%20Hours-20210325_160016-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] | +
-|12.| 29.03.2021 14:00-16:00 [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Neural Networks (Training Tricks), Ensemble Classifiers | {{ :dm:13_dm2_ensemble_2021.pdf | Ensemble Classifiers}}  | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210329_141249-Registrazione%20della%20riunione.mp4?web=1 | 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210329_151828-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] | +
-|13.| 31.03.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] Ensemble Classifiers | {{ :dm:13_dm2_ensemble_2021.pdf | Ensemble Classifiers}}, {{ :dm:10_ensemble.ipynb.zip | Ensemble_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2013-20210331_160739-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2013-20210331_170034-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] | +
-|14.| 29.03.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Time Series Similarity | {{ :dm:14_dm2_time_series_similarity_2021.pdf | Time Series Similarity}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210412_141848-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210412_152505-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] | +
-|15.| 31.03.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] |  |  | Guidotti |[[TODO| 1st Part]], [[TODO| 2nd Part]] | +
 ====== Exams ====== ====== Exams ======
  
-===== Exam DM1 ======+** How and Where: ** 
 +The exam is oral only and takes place in the teacher's office or in a previously designated classroom. 
 +The exam is held online, on the 420AA Data Mining course channel, only at the request of the 
 +student and in accordance with current legislation.
  
-  * ** RULES FOR EXAMS for DATA SCIENCE & BI and DIGITAL HUMANITIES - DM1(6CFU)**: {{ :dm:rules_dm1_6cfu_.pdf |EXAM RULES Summer Session - DM1(6CFU)}}+** When: ** 
 +The dates relating to the start of the three exams are/will be published on the online platform 
 +https://esami.unipi.it/. Within each session, we will identify dates and slots in order to distribute the 
 +various orals. The dates and slots to take the exam will be published on the course page by the end of 
 +May. Each student must also register on https://esami.unipi.it/. The exam can only be taken after the project has been delivered. The project must be delivered one week before the date on which you want to take the exam. Group oral discussions, following the project groups, are preferred so that the project can be discussed with all members at once. It is not mandatory to take the oral exam together with the other members of the group.  
 +If the oral exam is not passed, it cannot be retaken for 20 days. If the project is not considered sufficient, it must be redone on a new dataset or on a substantially updated version of the current one.
  
-The exam is composed of two parts:+** What: **  
 +The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects. 
 +  - Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write formulas or pseudocode. During the explanations, the student can use pen and paper. 
 +  - Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to solve one or more simple exercises. The text will be shown on the teacher's screen and/or copied to Miro. The student will have to use pen and paper (or, if online, Miro https://miro.com/) to show how the exercise is solved. 
 +  - Discussion of the project with questions from the teacher regarding unclear aspects, 
 +questionable steps or choices.
  
-  An **oral exam **that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises+** Final Mark: ** for 12-credit exam, the final mark will be obtained as the 
 +average mark of DM1 and DM2.
  
-  A **project** consists in exercises that require the use of data mining tools for analysis of data. Exercises includedata understanding, clustering analysis, frequent pattern mining, and classification (see the guidelines for more details). The project has to be performed by min 3, max 4 people. It has to be performed by using Knime, Python or a combination of them. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[datamining.unipi@gmail.com]]. Please, use “[DM1 2020-2021] Project” in the subject.  +===== Exam Booking Periods ===== 
-Tasks of the project+  Exam portal link: [[https://esami.unipi.it/|here]] 
-      ** Data Understanding: ** Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics, assessing data quality, the distribution of the variables and the pairwise correlations. (see Guidelines for details) +  * 1st Appellofrom 09/01/2024 to 31/12/2024 
-      ** Clustering analysis: ** Explore the dataset using various clustering techniquesCarefully describe your's decisions for each algorithm and which are the advantages provided by the different approaches(see Guidelines for details) +  2nd Appello: from 01/02/2024 to 17/02/2024 
-      ** Classification** Explore the dataset using classification treesUse them to predict the target variable(see Guidelines for details) +  3rd Appello 
-      -  ** Association Rules: ** Explore the dataset using frequent pattern mining and association rules extraction. Then use them to predict a variable either for replacing missing values or to predict target variable. (see Guidelines for details)+  4th Appello:  
 +  5th Appello:  
 +  6th Appello 
 +  
 +===== Exam Booking Agenda ===== 
 +  1st Appello - DM1: https://agende.unipi.it/yra-ief-dmo, DM2https://agende.unipi.it/rnm-urj-wsu 
 +  * 2nd Appello DM1: https://agende.unipi.it/yra-ief-dmo, DM2: https://agende.unipi.it/rnm-urj-wsu 
 +  3rd Appello:  
 +  4th Appello 
 +  5th Appello:  
 +  6th Appello: 
  
-  Project 1 +**Do not forget to complete the course evaluation!** 
-      - Dataset: **IBM-HR** +===== Exam DM1 ======
-      - Assigned: 16/09/2020 +
-      - Midterm Deadline: 21/11/2020 (half project required, i.e., data understanding and at least two clustering algorithms) +
-      - Final Deadline: <del>07/01/2021</del> 14/01/2021(complete project required) +
-      - Data: {{ :dm:datasetproject1.zip | here}} +
-      - Description: [[https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset|IBM-HR]] +
-      - (please download the data from {{ :dm:datasetproject1.zip | here}} and not from the link with the description as we are using a different version of the data)+
  
-  * Project 2 +The exam is composed of two parts: 
-      - Dataset: **Bank Loan Status** + 
-      - Assigned15/01/2020 +  An **oral exam**, that includes(1) discussing the project report; (2) discussing topics presented during the classesincluding the theory and practical exercises
-      - Deadline: 4 days before the oral exam +
-      - This dataset must be used for all tasks. For the classification taskyou have to split the dataset into train and test set and the class to predict is the variable "Loan Status". +
-      - This dataset is valid for all the exam sessions until September. +
-      - Download the dataset {{:dm:credit_2020.zip|Bank Loan Status dataset}} (in CSV format, zipped)+
  
- **Guidelines for the project are [[:dm:start:guidelines|here]].**+  **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: data understanding, clustering analysis, pattern mining, and classification (guidelines will be provided for more details). The project has to be performed by min 2, max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please, use “[DM1 2023-2024] Project” in the subject.
    
-===== Exam DM part II (DMA======+  * **Dataset** 
 +    - Assigned: 25/09/2023 
 +    - MidTerm Submission: 15/11/2023 (+0.5(half project required, i.e., Data Understanding & Preparation and Clustering) 
 +    - Final Submission: 31/12/2023 (+0.5) one week before the oral exam (complete project required). 
 +    - Dataset: {{ :dm:std.zip | STD}}
  
-The exam is composed of two parts:+** DM1 Project Guidelines ** 
 +See {{ :dm:dm1_project_guidelines_23_24.pdf | Project Guidelines}}.
  
-  * A **project**, that consists in employing the methods and algorithms presented during the classes for solving exercises on a given dataset. The project has to be realized by max 3 people. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 30 pages (suggested 25) of text including figures + 1 cover page (minimum font 11, minimum interline 1). The project must be delivered at least 7 days before the oral exam. The project must be delivered to [[riccardo.guidotti@unipi.it]] AND [[salvatore.citraro@phd.unipi.it]] with subject "[DM2 Project]" 
  
-  * An **oral exam**, that includes: (1) discussing topics presented during the classes, including the theory of the parts already covered by the written exam; (2) resolving simple exercises using the Miro platform; (3) discussing the project report with a group presentation;   
  
-  * **Dataset**: the data is about Music Analysis and can be downloaded here: [[https://github.com/mdeff/fma| github]] (or here [[https://archive.ics.uci.edu/ml/datasets/FMA%3A+A+Dataset+For+Music+Analysis|uci]]) 
-     * Data can be downloaded here [[https://os.unil.cloud.switch.ch/fma/fma_metadata.zip|fma_metadata.zip]] 
-     * Submission Draft 1: 19/04/2020 23:59 Italian Time (we expect Module 1 and Module 2) 
-     * Submission Draft 2: 08/05/2020 23:59 Italian Time 
-     * Final Submission: one week before the oral exam. 
  
-** Project Guidelines **+  
 +===== Exam DM2 ======
  
-  * **Module 1 - Introduction, Imbalanced Learning and Anomaly Detection** +The exam is composed of two parts:
-      - Explore and prepare the dataset. You are allowed to take inspiration from the associated GitHub repository and figure out your personal research perspective (from choosing a subset of variables to the class to predict…). You are welcome in creating new variables and performing all the pre-processing steps the dataset needs. +
-      - Define one or more (simple) classification tasks and solve it with Decision Tree and KNN. You decide the target variable. +
-      - Identify the top 1% outliersadopt at least three different methods from different families (e.g., density-based, angle-based... ) and compare the results. Deal with the outliers by removing them from the dataset or by treating the anomalous variables as missing values and employing replacement techniques. In this second case, you should check that the outliers are not outliers anymore. Justify your choices in every step. +
-      - Analyze the value distribution of the class to predict with respect to point 2; if it is unbalanced leave it as it is, otherwise turn the dataset into an imbalanced version (e.g., 96% - 4%, for binary classification). Then solve the classification task using the Decision Tree or the KNN by adopting various techniques of imbalanced learning. +
-      - Draw your conclusions about the techniques adopted in this analysis.+
  
-  * **Module 2 - Advanced Classification Methods** +  * An **oral exam**, that includes: (1) discussing the project report; (2discussing topics presented during the classesincluding the theory and practical exercises
-      - Solve the classification task defined in Module (or define new oneswith the other classification methods analyzed during the course: Naive Bayes Classifier, Logistic Regression, Rule-based Classifiers, Support Vector Machines, Neural Networks, Ensemble Methods and evaluate each classifier with the techniques presented in Module 1 (accuracy, precision, recall, F1-score, ROC curve). Perform hyper-parameter tuning phases and justify your choices. +
-      - Besides the numerical evaluation draw your conclusions about the various classifierse.g. for Neural Networks: what are the parameter sets or the convergence criteria which avoid overfitting? For Ensemble classifiers how the number of base models impacts the classification performance? For any classifier which is the minimum amount of data required to guarantee an acceptable level of performance? Is this level the same for any classifier? What is revealing the feature importance of Random Forests? +
-      - Select two continuous attributes, define a regression problem and try to solve it using different techniques reporting various evaluation measures. Plot the two-dimensional dataset. Then generalize to multiple linear regression and observe how the performance varies.+
  
-N.B. When "solving the classification task"remember, (i) to testwhen needed, different criteria for the parameter estimation of the algorithms, and (iito evaluate the classifiers (e.g., Accuracy, F1, Lift Chart) in order to compare the results obtained with an imbalanced technique against those obtained from using the "original" dataset+  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: imbalanced learningdimensionality reductionoutlier detection, advanced classification/regression methods, time series analysis/clustering/classification (guidelines will be provided for more details). The project has to be performed by min 1max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 30 pages of text including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Pleaseuse “[DM2 2023-2024] Project” in the subject. 
 +  
 +  * **Dataset** 
 +    - Assigned: 19/02/2024 
 +    - MidTerm Submission: 30/04/2024 (Modules 1 and 2; for TS classification, non-DL-based models) 
 +    - Final Submission: one week before the oral exam (complete project required, also with DL-based models for TS classification). 
 +    - Dataset: [[https://unipiit-my.sharepoint.com/:u:/g/personal/a_fedele7_studenti_unipi_it/EUSyNv8ahD9FrBZ6fiF3gvABcYVLpbo1biIyOGy8AmcO5g?e=ziQtEc|STD]] 
 + 
 +** DM2 Project Guidelines ** 
 +See {{ :dm:dm2_project_guidelines_23_24.pdf | Project Guidelines}}.
  
  
  
-====== Exam Dates ====== 
  
-===== Exam Sessions ===== 
-^ Session ^ Date            ^ Time        ^ Room   ^ Notes ^ Marks ^ 
-|1.|16.01.2019| 14:00 - 18:00| [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Please, use the system for registration: https://esami.unipi.it/ | | 
  
 ===== Past Exams ===== ===== Past Exams =====
Line 305: Line 323:
   * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}}   * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}}
   * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]]   * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]]
- 
   * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]]   * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]]
  
 ====== Previous years ===== ====== Previous years =====
-   * [[dm.2019-20]] +  * [[dm.2022-23ds]] 
-   * [[dm.2018-19]] +  * [[dm.2021-22ds]] 
-    [[dm.2017-18]]+  * [[dm.2020-21]] 
 +  * [[dm.2019-20]] 
 +  * [[dm.2018-19]] 
 +  * [[dm.2017-18]]
   * [[dm.2016-17]]   * [[dm.2016-17]]
   * [[dm.2015-16]]   * [[dm.2015-16]]
Line 318: Line 338:
   * [[dm.2012-13]]   * [[dm.2012-13]]
   * [[dm.2011-12]]   * [[dm.2011-12]]
-  * [[dm.2010-11]] 
-  * [[dm.2009-10]] 
-  * [[dm.2008-09]] 
-  * [[dm.2007-08]] 
-  * [[dm.2006-07]] 
-  * [[PhDWorkshop2011]] 
-  * [[SNA.Ingegneria2011]] 
-  * [[SNA.IMT.2011]] 
-  * [[MAINS.SANTANNA.2011-12]] 
-  * [[MAINS.SANTANNA.DM4CRM.2012]] 
-  * [[MAINS.SANTANNA.DM4CRM.2016]] 
-  * [[MAINS.SANTANNA.DM4CRM.2017 | Data Mining for Customer Relationship Management 2017]] 
-  * [[MAINS.SANTANNA.DM4CRM.2018]] 
-  * [[MAINS.SANTANNA.DM4CRM.2019]] 
-  * [[SDM2018 | Instructions for camera ready and copyright transfer]] 
-  * [[DM-SAM | Storie dell'Altro Mondo]] 
-  * [[DM-I40 | Master Industry 4.0]] 
  
dm/start.1618234842.txt.gz · Last modified: 12/04/2021 at 13:40 (3 years ago) by Riccardo Guidotti