dm:start

Differences

These are the differences between the selected revision and the current version of the page.

dm:start [05/12/2022 at 10:59 (22 months ago)] – [First Semester (DM1 - Data Mining: Foundations)] Riccardo Guidotti
dm:start [16/09/2024 at 06:41 (4 days ago)] (current version) Riccardo Guidotti
Line 1: Line 1:
-<html> +====== Data Mining A.A. 2024/25 ======
-<!-- Google Analytics --> +
-<script type="text/javascript" charset="utf-8"> +
-(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +
-(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), +
-m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) +
-})(window,document,'script','//www.google-analytics.com/analytics.js','ga'); +
- +
-ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true}); +
-ga('personalTracker.require', 'linker'); +
-ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it', 'luciacpassaro.github.io'] );     +
-ga('personalTracker.require', 'displayfeatures'); +
-ga('personalTracker.send', 'pageview', 'courses/dm/'); +
-setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  +
-</script> +
-<!-- End Google Analytics --> +
-<!-- Global site tag (gtag.js) - Google Analytics --> +
-<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script> +
-<script> +
-  window.dataLayer = window.dataLayer || []; +
-  function gtag(){dataLayer.push(arguments);} +
-  gtag('js', new Date()); +
- +
-  gtag('config', 'G-LPWY0VLB5W'); +
-</script> +
-<!-- Capture clicks --> +
-<script> +
-jQuery(document).ready(function(){ +
-  jQuery('a[href$=".pdf"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'PDFs', fname); +
-  }); +
-  jQuery('a[href$=".r"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Rs', fname); +
-  }); +
-  jQuery('a[href$=".zip"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'ZIPs', fname); +
-  }); +
-  jQuery('a[href$=".mp4"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-  jQuery('a[href$=".flv"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-}); +
-</script> +
-</html> +
-====== Data Mining A.A. 2022/23 ======+
  
 ===== DM1 - Data Mining: Foundations (6 CFU) ===== ===== DM1 - Data Mining: Foundations (6 CFU) =====
Line 66: Line 15:
  
 Teaching Assistant Teaching Assistant
-  * **Francesco Spinnato** +  * **Andrea Fedele** 
-    * KDDLab, Scuola Normale Superiore +    * KDDLab, Università di Pisa 
-    * [[https://kdd.isti.cnr.it/people/spinnato-francesco]] +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
-    * [[francesco.spinnato@sns.it]]  +    * [[andrea.fedele@phd.unipi.it]]  
 ===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) ===== ===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) =====
  
Line 79: Line 28:
  
 Teaching Assistant Teaching Assistant
-  * **Francesco Spinnato** +  * **Andrea Fedele** 
-    * KDDLab, Scuola Normale Superiore +    * KDDLab, Università di Pisa 
-    * [[https://kdd.isti.cnr.it/people/spinnato-francesco]] +    * [[https://www.linkedin.com/in/andrea-fedele/?originalSubdomain=it]] 
-    * [[francesco.spinnato@sns.it]]  +    * [[andrea.fedele@phd.unipi.it]]   
 +    * Meeting: https://calendly.com/andreafedele/
 ====== News ====== ====== News ======
-     * [15.09.2022] Project Groups [[https://docs.google.com/spreadsheets/d/1j5A6JPurO6o3ycjb4qc1lKZ4K2HqpdQhb_eyII_37dc/edit?usp=sharing|link]] +     * **[02.09.2024]** Lectures will start on Monday 30 September 2024 at 11.00 in room C1
-     * [15.09.2022] MS Teams [[https://teams.microsoft.com/l/team/19%3a-E-BCEQRJk-qyKrkyNoos6n4h6neLOfJM4zI5GxY9Us1%40thread.tacv2/conversations?groupId=dfb4c6f2-9430-4eda-8bb4-69bdebd5e01b&tenantId=c7456b31-a220-47f5-be52-473828670aa1|link]]  +     * [02.09.2024] Lectures will be in person only. Recordings of the lectures from past years can be found at the bottom of this web page. 
-     * [15.09.2022] Lectures will be in person only. Recordings of the lectures from past years can be found at the bottom of this web page. +     * [02.09.2024] Project Groups [[TODO|link]] 
-     * **[23.11.2022]** To recover skipped and suspended lectures, two extra lectures are scheduled in unusual slots: Wed 7th Dec 14.00-16.00 Room A1 and Wed 14th Dec 14.00-16.00 Room A1. +     * [11.09.2023] MS Teams [[TODO|link]] 
 ====== Learning Goals ====== ====== Learning Goals ======
   * DM1   * DM1
Line 114: Line 64:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Monday  |  11:00 - 13:00  |  Aula A1   |  +|  Monday  |  11:00 - 13:00  |  C1   |  
-|  Thursday  |  11:00 - 13:00  |  Aula A1  |  +|  Tuesday  |  14:00 - 16:00  |  C1  |  
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
  
   * Prof. Pedreschi   * Prof. Pedreschi
-      * Monday 16:00 - 18:00+      * TBD
       * Online       * Online
   * Prof. Guidotti   * Prof. Guidotti
-      * Wednesday 15-17 or Appointment by email +      * Wednesday 16:00 - 18:00 or Appointment by email
       * Room 363 Dept. of Computer Science or MS Teams       * Room 363 Dept. of Computer Science or MS Teams
  
Line 133: Line 83:
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  ???  |  11:00 - 13:00  |  ???  |  +|  Monday   |  09:00 - 11:00  |  C   |  
-|  ???  |  11:00 - 13:00  |  ???  |  +|  Wednesday  |  11:00 - 13:00  |   |  
  
 **Office Hours - Ricevimento:** **Office Hours - Ricevimento:**
  
-  * Wednesday 15-17 or Appointment by email+  * Tuesday 15.00-17.00 or Appointment by email
   * Room 363 Dept. of Computer Science or MS Teams   * Room 363 Dept. of Computer Science or MS Teams
  
Line 168: Line 118:
   * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]]   * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]]
   * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]]   * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]]
-  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/DDM]]+  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/HelpDDMv1]], [[https://kdd.isti.cnr.it/ddm/#/| DDMv2]] 
    
-====== Class Calendar (2021/2022) ======+====== Class Calendar (2024/2025) ======
  
 ===== First Semester (DM1 - Data Mining: Foundations) ===== ===== First Semester (DM1 - Data Mining: Foundations) =====
  
-^ ^ Day ^ Time ^ Room ^ Topic ^ Learning Material ^ Lecturer ^ +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ 
-|01.| 15.09.2022 | 11-13 |A1| Overview, Intro, KDD and CRISP. | {{ :dm:00_dm1_introduction_2022_23.pdf | Intro}} | Pedreschi/Guidotti | +|   | 16.09.2023 | |  | No Lecture |  |  | 
-|   | 19.09.2022 | 11-13 |  | No Lecture |  |  | +|   | 17.09.2023 | |  | No Lecture |  |  | 
-|02.| 22.09.2022 | 11-13 |A1| Project Guidelines & Intro to Python | {{ :dm:dm1_project_guidelines_22_23.pdf | Project Guidelines}}, {{ :dm:dm1_lab01_python_basics.zip | Intro Python}} | Spinnato | +|   | 23.09.2023 | |  | No Lecture |  |  | 
-|   | 26.09.2022 | 11-13 |  | No Lecture |  |  | +|   | 24.09.2023 | |  | No Lecture |  |  | 
-|03.| 29.09.2022 | 11-13 |A1| Data Understanding | {{ :dm:01_dm1_data_understanding_2022_23.pdf | Data Understanding}}  | Pedreschi | +|01.| 30.09.2024 | 11-13 |C1| Overview, Introduction | {{ :dm:00_dm1_introduction_2023_24.pdf | Intro}} | Pedreschi| 
-|04.| 03.10.2022 | 11-13 |A1| Data Understanding & Data Preparation  | {{ :dm:02_dm1_data_preparation_2022_23.pdf | Data Preparation}}| Pedreschi | +
-|05.| 06.10.2022 | 11-13 |A1| Lab. Data Understanding | {{ :dm:data_understanding.zip | Data Und Python}} | Spinnato/Guidotti | +
-|   | 10.10.2022 | 11-13 |  | No Lecture |  |  | +
-|06.| 13.10.2022 | 11-13 |A1| Data Preparation, Similarity | {{ :dm:03_dm1_data_similarity_2022_23.pdf | Data Similarity}}, {{ :dm:data_understanding.zip | Data Und Python}} | Pedreschi | +
-|07.| 17.10.2022 | 11-13 |A1| Intro Clustering, K-Means | {{ :dm:04_dm1_clustering_intro_2022_23.pdf | Intro Clustering}}, {{ :dm:05_dm1_kmeans_2022_23.pdf | K-Means}} | Pedreschi | +
-|08.| 20.10.2022 | 11-13 |A1| K-Means | {{ :dm:05_dm1_kmeans_2022_23.pdf | K-Means}} | Pedreschi | +
-|09.| 24.10.2022 | 11-13 |A1| Hierarchical & Density-based | {{ :dm:06_dm1_hierarchical_clustering_2022_23.pdf | Hierarchical}}, {{ :dm:07_dm1_density_based_2022_23.pdf | Density}} | Pedreschi | +
-|10.| 27.10.2022 | 11-13 |A1| Lab. Clustering | {{ :dm:clustering.zip | Clustering Python}} | Spinnato/Guidotti | +
-|   | 30.10.2022 | 11-13 |  | No Lecture |  |  | +
-|11.| 03.11.2022 | 11-13 |A1| Exercises Clustering | {{ :dm:ex1_dm1_clustering_2022_23.pdf | Exercises Clustering}} | Guidotti | +
-|12.| 07.11.2022 | 11-13 |A1| Intro Classification | {{ :dm:08_dm1_classification_intro_2022_23.pdf | Intro Classification}}, {{ :dm:09_dm1_knn_2022_23.pdf | kNN}} | Guidotti | +
-|13.| 10.11.2022 | 11-13 |A1| Eval Measures, Exercises kNN | {{ :dm:08_dm1_classification_intro_2022_23.pdf | Intro Classification}}, {{ :dm:09_dm1_knn_2022_23.pdf | kNN}} | Guidotti | +
-|14.| 14.11.2022 | 11-13 |A1| Decision Tree | {{ :dm:10_dm1_decision_trees_2022_23.pdf | Decision Trees}} | Guidotti | +
-|15.| 17.11.2022 | 11-13 |A1| Decision Tree, Exercises DT | {{ :dm:10_dm1_decision_trees_2022_23.pdf | Decision Trees}}, {{ :dm:tree_exercise.xlsx | Ex DT}} | Guidotti | +
-|16.| 22.11.2022 | 11-13 |A1| Decision Tree | {{ :dm:10_dm1_decision_trees_2022_23.pdf | Decision Trees}} | Guidotti | +
-|17.| 24.11.2022 | 11-13 |A1| Naive Bayes Classifier | {{ :dm:11_dm1_naive_bayes_2022_23.pdf | NBC}} | Guidotti | +
-|18.| 28.11.2022 | 11-13 |A1| Lab. Classification | {{ :dm:classifcazion.zip | Classification Python}} | Spinnato/Guidotti | +
-|19.| 01.12.2022 | 11-13 |A1| Intro Regression | {{ :dm:12_dm1_linear_regression_2022_23.pdf | Intro Regression}} | Guidotti | +
-|20.| 05.12.2022 | 11-13 |A1| Pattern Mining | {{ :dm:13_dm1_pattern_mining_2022_23.pdf | Pattern Mining}} | Pedreschi | +
-|21.| 07.12.2022 | 14-16 |A1| Pattern Mining |  | Pedreschi | +
-|   | 08.12.2022 | 11-13 |  | No Lecture |  |  | +
-|22.| 12.12.2022 | 11-13 |A1| TBD |  | Guidotti | +
-|23.| 14.12.2022 | 14-16 |A1| TBD |  | Guidotti | +
-|24.| 15.12.2022 | 11-13 |A1| Lab. Pattern Mining |  | Spinnato/Guidotti |+
 ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) ===== ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) =====
  
-^ ^ Day ^ Room  ^ Topic ^ Learning Material ^ Instructor ^ +^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ 
-| 01.| 14.02.2022 11:00--13:00 | C |  |  | Guidotti | +|01.| 19.02.2024 | 14-16 |C| Overview, Rule-based Models | {{ :dm:14_dm2_intro_2023_24.pdf | Introduction}}, {{ :dm:dm2_project_guidelines_23_24.pdf | Guidelines}}, {{ :dm:15_dm2_rule_based_classifier_2023_24.pdf | Rule-based Models }} | Guidotti| 
 ====== Exams ====== ====== Exams ======
  
Line 224: Line 152:
 ** What: **  ** What: ** 
 The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects. The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects.
-  - Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write on formulas or pseudocode. During the explanations, the student can use pen and paper (if online, the student can use the Miro graphic system https://miro.com/ during the explanations)+  - Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write on formulas or pseudocode. During the explanations, the student can use pen and paper.
   - Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to perform one or more simple exercises. The text will be shown on the teacher's screen and / or copied to Miro. The student will have to use pen and paper (if online, Miro https://miro.com/) to show how the exercise is solved.   - Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to perform one or more simple exercises. The text will be shown on the teacher's screen and / or copied to Miro. The student will have to use pen and paper (if online, Miro https://miro.com/) to show how the exercise is solved.
   - Discussion of the project with questions from the teacher regarding unclear aspects,   - Discussion of the project with questions from the teacher regarding unclear aspects,
Line 232: Line 160:
 average mark of DM1 and DM2. average mark of DM1 and DM2.
  
-**Exam Booking Periods**+===== Exam Booking Periods =====
   * Exam portal link: [[https://esami.unipi.it/|here]]   * Exam portal link: [[https://esami.unipi.it/|here]]
-  * 1st Appello: 11/12/2022 00:00 - 05/01/2023 23:59 +  * 1st Appello: from TBD to TBD 
-  * 2nd Appello: 01/01/2023 00:00 - 26/01/2023 23:59 +  * 2nd Appello: from TBD to TBD 
 +  * 3rd Appello: from TBD to TBD 
 +  * 4th Appello: from TBD to TBD 
 +  * 5th Appello: from TBD to TBD 
 +  * 6th Appello: from TBD to TBD
    
-**Exam Booking Agenda** +===== Exam Booking Agenda ===== 
-  * Agenda Link: [[https://agende.unipi.it/nfj-juo-qms|here]] +When registering for the oral exam, please specify DM1 in the notes if you do not want to take DM2 (which is assumed by default). After booking for DM1, please contact Prof. Pedreschi to agree on the exam date (put Prof. Guidotti and Andrea Fedele in cc). There will be no agenda for DM1. 
-  * 1st Appello: starts 10/01/2023 + 
-  * 2nd Appello: starts 31/01/2023 +  * 1st Appello - DM1 & DM2: from TBD to TBD (deliver project by TBD)  
 +  * 2nd Appello - DM1 & DM2: from TBD to TBD (deliver project by TBD)  
 +  * 3rd Appello - DM1 & DM2: from TBD to TBD (deliver project by TBD)  
 +  * 4th Appello - DM1 & DM2: from TBD to TBD (deliver project by TBD)   
 +  * 5th Appello - DM1 & DM2: from TBD to TBD (deliver project by TBD)  
 +  * 6th Appello - DM1 & DM2: from TBD to TBD (deliver project by TBD)  
 + 
 + 
 +**Do not forget to make the evaluation of the course!!!**
 ===== Exam DM1 ====== ===== Exam DM1 ======
  
Line 247: Line 187:
   * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises.    * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises. 
  
-  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: data understanding, clustering analysis, pattern mining, and classification (guidelines will be provided for more details). The project has to be performed by min 2, max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[datamining.unipi@gmail.com]]. Please, use “[DM1 2022-2023] Project” in the subject.+  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: data understanding, clustering analysis, pattern mining, and classification (guidelines will be provided for more details). The project has to be performed by min 2, max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please, use “[DM1 2023-2024] Project” in the subject.
    
   * **Dataset**   * **Dataset**
-    - Assigned: 15/09/2021 +    - Assigned: 30/09/2024 
-    - MidTerm Submission: **28/11/2022 (extended)** (half project required, i.e., Data Understanding & Preparation and Clustering) +    - MidTerm Submission: 15/11/2024 (+0.5) (half project required, i.e., Data Understanding & Preparation and Clustering) 
-    - Final Submission: **31/12/2022** or one week before the oral exam (complete project required). +    - Final Submission: 31/12/2024 (+0.5) or one week before the oral exam (complete project required). 
-    - Dataset: {{:dm:ravdess_dm1_2223.zip | RAVDESS}} +    - Dataset: TBD
-    - Link original pages: [[https://zenodo.org/record/1188976#.YyLSI-xBz0o| zenodo]], [[https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio| kaggle1]], [[https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-song-audio| kaggle2]]+
  
 ** DM1 Project Guidelines ** ** DM1 Project Guidelines **
-See {{ :dm:dm1_project_guidelines_22_23.pdf | Project Guidelines}}.+See {{ :dm:dm1_project_guidelines_23_24.pdf | Project Guidelines}}.
  
  
Line 265: Line 204:
 ===== Exam DM2 ====== ===== Exam DM2 ======
  
-TBD +The exam is composed of two parts: 
 + 
 +  * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises.  
 + 
 +  * A **project**, that consists in exercises requiring the use of data mining tools for analysis of data. Exercises include: imbalanced learning, dimensionality reduction, outlier detection, advanced classification/regression methods, time series analysis/clustering/classification (guidelines will be provided for more details). The project has to be performed by min 1, max 3 people. It has to be performed by using Python or any other data mining software. The results of the different tasks must be reported in a unique paper. The total length of this paper must be max 30 pages of text including figures. The paper must be emailed to [[andrea.fedele@phd.unipi.it]] and [[riccardo.guidotti@unipi.it]]. Please, use “[DM2 2023-2024] Project” in the subject. 
 +  
 +  * **Dataset** 
 +    - Assigned: 19/02/2024 
 +    - MidTerm Submission: 07/05/2024 (Modules 1 and 2 (for TS classification non DL-based models)) 
 +    - Final Submission: one week before the oral exam (complete project required, also with DL-based models for TS classification). 
 +    - Dataset: TBD 
 + 
 +** DM2 Project Guidelines ** 
 +See {{ :dm:dm2_project_guidelines_23_24.pdf | Project Guidelines}}. 
  
  
  
-====== Exam Dates ====== 
  
-===== Exam Sessions ===== 
-^ Session ^ Date  ^ Room   ^ Notes ^ Marks ^ 
-|1.|10.01.2023| | Please, use the system for registration: https://esami.unipi.it/ | | 
-|2.|31.01.2023| | Please, use the system for registration: https://esami.unipi.it/ | | 
-|3.|??.??.2023| | Please, use the system for registration: https://esami.unipi.it/ | | 
-|4.|??.??.2023| | Please, use the system for registration: https://esami.unipi.it/ | | 
-|5.|??.??.2023| | Please, use the system for registration: https://esami.unipi.it/ | | 
-|6.|??.??.2023| | Please, use the system for registration: https://esami.unipi.it/ | | 
 ===== Past Exams ===== ===== Past Exams =====
   * Past exams texts can be found in old pages of the course. Please do not consider these exercises as a unique way of testing your knowledge. Exercises can be changed and updated every year and will be published together with the slides of the lectures.   * Past exams texts can be found in old pages of the course. Please do not consider these exercises as a unique way of testing your knowledge. Exercises can be changed and updated every year and will be published together with the slides of the lectures.
Line 298: Line 242:
  
 ====== Previous years ===== ====== Previous years =====
 +  * [[dm_ds2023-24]]
 +  * [[dm.2022-23ds]]
   * [[dm.2021-22ds]]   * [[dm.2021-22ds]]
   * [[dm.2020-21]]   * [[dm.2020-21]]
dm/start.1670237966.txt.gz · Last modified: 05/12/2022 at 10:59 (22 months ago) by Riccardo Guidotti
