Data Mining (309AA) - 9 CFU A.Y. 2020/2021

Instructor:

Teaching Assistant:

News

Learning Goals

Hours and Rooms

Classes

| Day of Week | Hour | Room |
| Wednesday | 09:00 - 10:45 | Online |
| Thursday | 09:00 - 10:45 | Online |
| Friday | 11:00 - 12:45 | Online |

Office hours:
Anna Monreale: Wednesday 11:00-13:00, online via Teams (appointment by email)
Francesca Naretto: Monday 15:00-18:00, online via Teams (appointment by email)

Learning Material

Textbook

Slides

Software

Class Calendar (2020/2021)

First Semester

| # | Day | Topic | Learning material | References |
| 1 | 16.09 09:00-10:45 | Overview. Introduction to KDD | 1-overview.pdf 1-intro-dm.pdf | Chap. 1 Kumar Book |
| 2 | 17.09 09:00-10:45 | Data Understanding | Slides DU | Chap. 2 Kumar Book and additional resource of the Kumar Book: Exploring Data (Chap. 3 in the first edition of the Kumar Book) |
| 3 | 18.09 09:00-10:45 | Data Preparation | 3-data_preparation.pdf | Chap. 2 Kumar Book |
| 4 | 23.09 09:00-10:45 | Data Preparation: Transformations & PCA | 3-data_preparation.pdf | Chap. 2 Kumar Book, Appendix B Dimensionality Reduction (only PCA) |
| 5 | 24.09 09:00-10:45 | Data Similarities. Introduction to Clustering. | 4-data_similarity.pdf 5-basic_cluster_analysis-intro.pdf | Data Similarity is in Chap. 2 while Clustering is in Chap. 7 |
| 6 | 25.09 11:00-12:45 | LAB: Data Understanding in Python | Very basic notions on Python, Notebook on Data Understanding, tipsdata.zip |  |
| 7 | 30.09 09:00-10:45 | Center-based clustering: kmeans | 6-basic_cluster_analysis-kmeans-variants.pdf | Chap. 7 Kumar Book |
| 8 | 01.10 09:00-10:45 | Center-based clustering: Bisecting K-means, Xmeans, EM | Same slides as the previous lecture | Chap. 7 Kumar Book, Clustering & Mixture Models, xmeans.pdf |
| 9 | 02.10 11:00-12:45 | Hierarchical clustering | 7.basic_cluster_analysis-hierarchical.pdf ex._hierarchical-clustering.pdf | Chap. 7 Kumar Book |
| 10 | 07.10 09:00-10:45 | Density based clustering | 8.basic_cluster_analysis-dbscan-validity.pdf | Chap. 7 Kumar Book |
| 11 | 08.10 09:00-10:45 | Lab: clustering + Project Assignment | py-clustering.zip |  |
|  | 09.10 11:00-12:45 | Lecture canceled |  |  |
| 12 | 14.10 09:00-10:45 | Classification Problem + Decision trees | 9.chap3_basic_classification-2020.pdf | Chap. 3 Kumar Book |
| 13 | 15.10 09:00-10:45 | Only 30 minutes of discussion on the project, due to connection problems |  | Chap. 3 Kumar Book |
| 14 | 16.10 11:00-12:45 | Decision Tree + Classifier Evaluation |  | Chap. 3 Kumar Book |
| 15 | 21.10 09:00-10:45 | Evaluation Methods for Classification Models | 9.chap3_basic_classification-2020.pdf | Chap. 3 Kumar Book + Chap. 4 Kumar Book |
| 16 | 22.10 09:00-10:45 | Statistical tool for model evaluation + Rule based classification | 10-rule-based-clussifiers.pdf | Chap. 3 Kumar Book + Chap. 4 Kumar Book |
| 17 | 23.10 11:00-12:45 | Rule based classification + Instance-based Classification | 11-knn.pptx | Chap. 4 Kumar Book |
| 18 | 28.10 09:00-10:45 | Naive Bayesian Classifier + Ensemble Classifiers | 12-naive_bayes.pdf 13_ensemble_2020.pdf | Chap. 4 Kumar Book |
| 19 | 29.10 09:00-10:45 | SVM & NN | 14_svm_2020.pdf 15_neural_networks_2020.pdf | Chap. 4 Kumar Book |
| 20 | 30.10 11:00-12:45 | MLNN & Lab on Classification | Python Notebook for classification | Chap. 4 Kumar Book |
| 21 | 04.11 09:00-10:45 | Regression & Association Rule Mining | 16_linear_regression.pdf 17_association_analysis.pdf | Regression: Appendix D in Kumar Book. Association Rules: Chap. 5 Kumar Book |
| 22 | 05.11 09:00-10:45 | Association Rule Mining |  | Association Rules: Chap. 5 Kumar Book |
| 23 | 06.11 11:00-12:45 | Sequential Pattern Mining | 18_sequential_patterns_2020.pdf | Chap. 6 Kumar Book |
| 24 | 11.11 09:00-10:45 | Ethics in AI & Privacy | 19_ethics_privacy.pdf | Report in Trustworthy AI |
| 25 | 12.11 09:00-10:45 | Ethics in AI & Privacy |  | Overview on Privacy, allegato11-cpdp13.pdf, Privacy by design |
| 26 | 13.11 11:00-12:45 | Ethics in AI & Privacy, Explainability | 20_explainability_2020.pdf |  |
| 27 | 18.11 09:00-10:45 | Explainability | 20_explainability_2020.pdf | Material: LORE, LIME, Survey, ABELE |
| 28 | 19.11 09:00-10:45 | Anomaly Detection | 21_anomaly_detection_2020.pdf | Chap. 9 of Kumar Book |
| 29 | 20.11 11:00-12:45 | Anomaly Detection | anomalydetection.ipynb.zip | Chap. 9 of Kumar Book |
| 30 | 25.11 09:00-10:45 | Time Series Similarity | 22_time_series_similarity.pdf | Overview on DM for time series, DTW paper by Sakoe and Chiba, 1978 |
| 31 | 26.11 09:00-10:45 | Time Series Clustering | 22_time_series_similarity.pdf |  |
| 32 | 27.11 11:00-12:45 | Lab on Association Rules and Sequential Pattern Mining | patterns.zip |  |
| 33 | 02.12 09:00-10:45 | Time Series: Motif Discovery | 23_time_series_motif_shapelets.pdf | randomproj.pdf matrixprofile.pdf |
| 34 | 03.12 09:00-10:45 | Time Series: Shapelets Discovery + Ex. DTW + Subsequences + Thesis available | 23_time_series_motif_shapelets.pdf ex-dtw-sequences.pdf Thesis Proposals | shaplet.pdf |
|  | 04.12 11:00-12:45 | Lecture canceled |  |  |
| 35 | 09.12 09:00-10:45 | Paper Presentation |  |  |
| 36 | 10.12 09:00-10:45 | Paper Presentation |  |  |
| 37 | 11.12 11:00-12:45 | Paper Presentation |  |  |

Exams

Mid-term Project

The project consists of data analyses carried out with data mining tools. It must be completed by a team of 2 or 3 students, using Python. The guidelines specify the tasks to address. Results must be reported in a single paper of at most 20 pages, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks.

Project to be delivered during the exam sessions

Students who did not deliver the above project by 4 Jan 2021 must email the teacher to request a new project.

Paper Presentation (OPTIONAL)

Students may present a research paper (made available by the teacher) during the last week of the course. This presentation is optional: students who give it can skip the oral exam with open questions and only need to present the project (see next point).

Oral Exam

Exam Dates

TBD

Exam Sessions

TBD

Reading About the "Data Scientist" Job

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

Previous years

Data Mining (309AA) - 9 CFU A.Y. 2020/2021

DM-2019/20