
Data Mining A.Y. 2023/24

DM1 - Data Mining: Foundations (6 CFU)

Instructors:

Teaching Assistant

DM2 - Data Mining: Advanced Topics and Applications (6 CFU)

Instructors:

Teaching Assistant

News

Learning Goals

Hours and Rooms

DM1

Classes

Day of Week Hour Room
Monday 11:00 - 13:00 C1
Wednesday 11:00 - 13:00 C1

Office Hours:

DM2

Classes

Day of Week Hour Room
Monday 09:00 - 11:00 C
Wednesday 11:00 - 13:00 C

Office Hours:

Learning Material

Textbook

Slides

Software

Other Software for Data Mining

Class Calendar (2023/2024)

First Semester (DM1 - Data Mining: Foundations)

Day Time Room Topic Material Lecturer
01. 18.09.2023 11-13 C1 Overview, Introduction Intro Pedreschi
20.09.2023 11-13 No Lecture
02. 25.09.2023 11-13 C1 Lab. Introduction to Python Python Basic Guidotti
03. 27.09.2023 11-13 C1 Lab. Data Understanding Data Understanding Guidotti
04. 02.10.2023 11-13 C1 Data Understanding Data Understanding Guidotti
05. 04.10.2023 11-13 C1 Data Understanding & Preparation Data Understanding, Data Preparation Pedreschi
06. 09.10.2023 11-13 C1 Data Preparation & Data Similarity Data Preparation, Data Similarity Pedreschi
07. 11.10.2023 11-13 C1 Data Similarity & Lab. Data Understanding Data Similarity, Data Understanding Pedreschi
08. 16.10.2023 11-13 C1 Introduction to Clustering, K-Means Intro_Clustering, K-Means Pedreschi
09. 18.10.2023 11-13 C1 Clustering Validation, Hierarchical Clustering Intro_Clustering, Hierarchical Pedreschi
10. 23.10.2023 11-13 C1 Density-based Clustering Density-based Clustering Pedreschi
11. 25.10.2023 11-13 C1 Lab. Clustering Clustering Guidotti
12. 30.10.2023 11-13 C1 Ex. Clustering ExClustering Guidotti
01.11.2023 11-13 No Lecture
13. 06.11.2023 11-13 C1 Intro Classification, kNN (video) Intro_Classification, kNN Guidotti
14. 08.11.2023 11-13 C1 Naive Bayes, Exercises Naive Bayes Guidotti
15. 13.11.2023 11-13 C1 Model Evaluation Model Evaluation Guidotti
16. 15.11.2023 11-13 C1 Model Evaluation Exercises & Lab Classification Guidotti
20.11.2023 11-13 No Lecture
17. 22.11.2023 11-13 C1 Decision Tree Classifier Decision Tree Pedreschi
18. 27.11.2023 11-13 C1 Decision Tree Classifier Decision Tree Pedreschi
19. 29.11.2023 11-13 C1 Exercises and Lab. Decision Tree Classifier Decision Tree Guidotti
20. 04.12.2023 11-13 C1 Decision Tree Classifier, Exercises and Lab Decision Tree Pedreschi
21. 06.12.2023 11-13 C1 Intro Regression & Lab. Regression Regression, Regression Guidotti
22. 11.12.2023 11-13 C1 Intro Pattern Mining and Apriori Pattern Mining Pedreschi
23. 13.12.2023 16-18 C1 Apriori & Lab. Pattern Mining Pattern Mining, Pattern Mining Pedreschi
24. 18.12.2023 11-13 C FP-Growth and Exercises Pattern Mining Guidotti

Second Semester (DM2 - Data Mining: Advanced Topics and Applications)

Day Time Room Topic Material Lecturer
01. 19.02.2024 14-16 C Overview, Rule-based Models Introduction, Guidelines, Rule-based Models Guidotti
21.02.2024 No Lecture
26.02.2024 No Lecture
02. 28.02.2024 11-13 C Sequential Pattern Mining Sequential Pattern Mining, GSP Guidotti
03. 04.03.2024 9-11 C Sequential Pattern Mining Sequential Pattern Mining, GSP Guidotti
04. 06.03.2024 11-13 C Transactional Clustering Transactional Clustering Guidotti
05. 11.03.2024 9-11 C Time Series Similarity Time Series Similarity, TS_Load, TS_Similarity Guidotti
06. 13.03.2024 11-13 C Time Series Approximation Time Series Clustering, TS_Approx_Clustering Guidotti
07. 18.03.2024 9-11 C Time Series Clustering & Motifs Time Series Motifs, TS_Motifs Guidotti
08. 20.03.2024 11-13 C Time Series Classification Time Series Classification, TS_Classification Guidotti
09. 25.03.2024 9-11 C Imbalanced Learning Imbalanced Learning, ImbLearn Guidotti
10. 27.03.2024 11-13 C Dimensionality Reduction Dimensionality Reduction, DimRed Guidotti
11. 03.04.2024 11-13 C Outlier Detection Outlier Detection Guidotti
12. 08.04.2024 9-11 C Outlier Detection Outlier Detection, OutlierDetection Guidotti
13. 10.04.2024 11-13 C Outlier Detection Outlier Detection, OutlierDetection Guidotti
14. 15.04.2024 14-16 C Gradient Descent, MLE GD, MLE Guidotti
15. 17.04.2024 11-13 C Odds, LogOdds, Logistic Regression Odds, LogReg, LogReg Guidotti
16. 22.04.2024 9-11 C Support Vector Machine SVM, SVM Guidotti
17. 24.04.2024 11-13 C Perceptron, Neural Networks Perceptron Guidotti
18. 29.04.2024 9-11 C Deep Neural Networks Deep Neural Networks, NN Guidotti
19. 06.05.2024 9-11 C CNN, RNN, DL-TS, Ensemble Intro DNN, TSC-DNN, Ensemble Guidotti
20. 08.05.2024 11-13 C Ensemble, Boosting, Adaboost Ensemble, LabEnsemble Guidotti
21. 13.05.2024 9-11 C Ensemble-TS, Gradient Boosting Gradient Boosting Machines, LabEnsemble Guidotti
22. 15.05.2024 11-13 C Extreme Gradient Boosting Gradient Boosting Machines, LabEnsemble Guidotti
23. 20.05.2024 9-11 C1 eXplainable Artificial Intelligence XAI, LabXAI Guidotti
24. 22.05.2024 11-13 C1 eXplainable Artificial Intelligence XAI, LabXAI Guidotti

Exams

How and Where: The exam is oral only and takes place in the teacher's office or in a previously designated classroom. It will be held online, on the 420AA Data Mining course channel, only at the student's request and in accordance with current legislation.

When: The start dates of the three exam sessions are/will be published on the online platform https://esami.unipi.it/. Within each session, we will identify dates and slots so as to distribute the oral exams. The dates and slots for taking the exam will be published on the course page by the end of May. Each student must also register on https://esami.unipi.it/. The oral exam can be taken only after the project has been delivered, and the project must be delivered one week before the date on which you want to take the exam. Oral exams taken together by the members of a project group are preferred, so that the project discussions can be held in parallel; however, it is not mandatory to take the oral exam together with the other members of the group. If the oral exam is not passed, it cannot be retaken for 20 days. If the project is not considered sufficient, it must be carried out again on a new dataset or on a substantially updated version of the current one.

What: The oral exam evaluates the practical understanding of the algorithms. It covers three aspects.

  1. Understanding of the theoretical aspects of the topics addressed during the course. The student may be required to write formulas or pseudocode. During the explanations, the student can use pen and paper.
  2. Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to perform one or more simple exercises. The text will be shown on the teacher's screen and/or copied to Miro. The student will have to use pen and paper (or, if online, Miro: https://miro.com/) to show how the exercise is solved.
  3. Discussion of the project with questions from the teacher regarding unclear aspects, questionable steps or choices.

Final Mark: for the 12-credit exam, the final mark is the average of the DM1 and DM2 marks.

Exam Booking Periods

Exam Booking Agenda

When registering for the oral exam, please write "DM1" in the notes if you do not want to take DM2 (DM2 is assumed by default). After booking for DM1, please contact Prof. Pedreschi to agree on the exam date (put Prof. Guidotti and Andrea Fedele in cc). There will be no agenda for DM1.

Do not forget to fill in the course evaluation!

Exam DM1

The exam is composed of two parts:

DM1 Project Guidelines: see Project Guidelines.

Exam DM2

The exam is composed of two parts:

DM2 Project Guidelines: see Project Guidelines.

Past Exams

Reading About the "Data Scientist" Job

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

Previous years