Data Mining, Academic Year 2013/14

Instructors:

Teaching assistant:

News

Learning goals

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that enable a better understanding and easier use of results in decision-making processes. The course aims to provide an introduction to the basic concepts of the knowledge discovery process, the main data mining techniques, and the related algorithms. Particular emphasis is placed on methodological aspects, presented through some paradigmatic classes of applications such as market basket analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical aspects of using inference techniques on data, of which the analyst must be aware. The course consists of the following parts:

  1. the basic concepts of the knowledge discovery process: data understanding and preparation, data types, measures and similarity of data;
  2. the main data mining techniques (association rules, classification, and clustering), studied in both their formal and implementation aspects;
  3. some case studies in marketing and customer management support, fraud detection, and epidemiological studies;
  4. the privacy and ethical aspects of using inference techniques on data, of which the analyst must be aware.

Reading about the "data scientist" job

Hours and rooms

Classes: DM 1

| Day | Time | Room |
|---|---|---|
| Thursday | 14:00 - 16:00 | B |
| Friday | 14:00 - 16:00 | A1 |

Classes: DM 2

| Day | Time | Room |
|---|---|---|
| Monday | 9:00 - 11:00 | N1 |
| Wednesday | 9:00 - 11:00 | L1 |

Office hours:

Learning material

Textbook

Slides of the classes

Exam papers

Data mining software

Class calendar (2013-2014)

First part of course, first semester (DMF - Data mining: foundations)

| # | Day | Room | Topic | Learning material | Instructor |
|---|---|---|---|---|---|
| 1. | 26.09.2013 14:00-16:00 | B | Intro: data mining & knowledge discovery process | Textbook, Chapt. 1; dm_intro-2011.pdf | Pedreschi |
| 2. | 27.09.2013 14:00-16:00 | A1 | Intro: data mining & knowledge discovery process | Textbook, Chapt. 1; dm_intro-2011.pdf | Pedreschi |
| 3. | 03.10.2013 14:00-16:00 | B | Data: types and basic measures | Textbook, Chapt. 2; chap2_data_new.pdf | Pedreschi |
| 4. | 10.10.2013 14:00-16:00 | B | Data: types and basic measures | Textbook, Chapt. 2; chap2_data_new.pdf | Pedreschi |
| 5. | 11.10.2013 14:00-16:00 | A1 | Exploratory data analysis and data understanding | Textbook, Chapt. 3; chap3_data_exploration.pdf | Pedreschi |
| 6. | 17.10.2013 14:00-16:00 | B | Frequent Pattern Mining | Textbook, Chapt. 6; 2-3tdm-restructured_assoc_2013.pdf | Giannotti |
| 7. | 18.10.2013 14:00-16:00 | A1 | Frequent Pattern Mining | Textbook, Chapt. 6; 2-3tdm-restructured_assoc_2013.pdf | Giannotti |
| 8. | 24.10.2013 14:00-16:00 | B | Association Rule Mining | | Giannotti |
| 9. | 25.10.2013 14:00-16:00 | A1 | Association Rule Mining and Knime | Textbook, Chapt. 6; Example on AR Knime | Monreale |
| 10. | 31.10.2013 14:00-16:00 | B | Classification and predictive methods | Textbook, Chapt. 4; chap4_basic_classification.pdf | Pedreschi |
| 11. | 14.11.2013 14:00-16:00 | B | Classification. Decision trees | Textbook, Chapt. 4; chap4_basic_classification.pdf | Pedreschi |
| 12. | 15.11.2013 14:00-16:00 | A1 | Classification. Decision trees | Textbook, Chapt. 4; chap4_basic_classification.pdf | Pedreschi |
| 13. | 21.11.2013 14:00-16:00 | B | Classification. Rule-based and Bayesian methods | Textbook, Chapt. 4; chap4_basic_classification.pdf | Pedreschi |
| 14. | 22.11.2013 14:00-16:00 | A1 | Classification. Validation and Weka Lab | | Pedreschi |
| 16. | 28.11.2013 14:00-16:00 | B | Classification. Validation and Weka Lab. Clustering: introduction | Textbook, Chapt. 8; dm2014_clustering_intro.pdf | Nanni |
| 15. | 29.11.2013 14:00-16:00 | A1 | Clustering analysis. Centroid-based methods | Textbook, Chapt. 8; dm2014_clustering_kmeans.pdf | Nanni |
| 16. | 05.12.2013 14:00-16:00 | B | Clustering analysis. Hierarchical methods | Textbook, Chapt. 8; dm2014_clustering_hierarchical.pdf | Nanni |
| 17. | 06.12.2013 14:00-16:00 | A1 | Clustering analysis. Density-based methods | Textbook, Chapt. 8; dm2014_clustering_dbscan.pdf | Nanni |
| 18. | 12.12.2013 14:00-16:00 | B | Clustering analysis. Validation and Weka Lab | Textbook, Chapt. 8; dm2014_clustering_validation.pdf | Nanni |
| 19. | 13.12.2013 14:00-16:00 | A1 | Wrap-up. Presentation of Second Semester syllabus | | Nanni |

Second part of course, second semester (DMA - Data mining: advanced topics and case studies)

| # | Day | Room | Topic | Learning material | Instructor |
|---|---|---|---|---|---|
| 1. | 17.02.2014 9:00-11:00 | N1 | Introduction + Advanced Classification Methods / 1 | Textbook, Chapt. 5; chap5_alternative_classification.pdf | Pedreschi |
| 2. | 19.02.2014 9:00-11:00 | L1 | Advanced Classification Methods / 2 | | Pedreschi |
| 3. | 24.02.2014 9:00-11:00 | N1 | Advanced Classification Methods / 3 | | Pedreschi |
| 4. | 26.02.2014 9:00-11:00 | L1 | Case study - CRM1 - Customer Segmentation - CRISP | 1.dm2-intro-airmiles-stulong-crisp.ppt.pdf | Giannotti |
| 5. | 03.03.2014 9:00-11:00 | N1 | Sequential patterns / 1 | 2.dm2_association_analysis_in_short_sequentialpatterns.ppt.pdf | Giannotti |
| 6. | 05.03.2014 9:00-11:00 | L1 | Case Study: CRM on retail selling / 1 - Churn analysis | 2.dm3_churn-analysis.ppt.pdf | Giannotti |
| 7. | 10.03.2014 9:00-11:00 | N1 | Sequential patterns / 2 | 3.dm2_sequentialpatterns.ppt.pdf | Giannotti |
| | 12.03.2014 9:00-11:00 | L1 | Suspended | | |
| 8. | 17.03.2014 9:00-11:00 | N1 | Graph mining | graph_mining_2014_fixed.pdf | Nanni |
| 9. | 19.03.2014 9:00-11:00 | L1 | Case Study: CRM on retail selling - Promotions / 1 | dm2_crm_promotional-sales_2014.pdf; Paper on promotions | Giannotti |
| 10. | 24.03.2014 9:00-11:00 | N1 | Time series / 1 | time_series_from_keogh_tutorial.pdf | Nanni |
| 11. | 26.03.2014 9:00-11:00 | L1 | Case Study: CRM on retail selling - Promotions / 2 | | Giannotti |
| 12. | 07.04.2014 9:00-11:00 | N1 | Time series / 2 | | Nanni |
| 13. | 09.04.2014 9:00-11:00 | L1 | Case Study: Geo-marketing | Geo-churn, crm2014_pennacchioli_bigdata13.pdf | Nanni |
| 14. | 14.04.2014 9:00-11:00 | N1 | Spatial/Spatiotemporal analysis / 1 | 7.dm2_mobilitydatamining_.pptx.pdf; chap06_mobility_data_mining-1.pdf | Giannotti |
| 15. | 16.04.2014 9:00-11:00 | L1 | Spatial/Spatiotemporal analysis / 2 & Projects presentation | dm2_projects_2014.pdf | Giannotti & Nanni |
| 16. | 28.04.2014 9:00-11:00 | N1 | Case study: Mobility / 1 | Mobility case studies 1 | Giannotti |
| 17. | 30.04.2014 9:00-11:00 | L1 | Platform M_Atlas | | Nanni |
| 18. | 05.05.2014 9:00-11:00 | N1 | Students' short seminars | Mining changes in customer behavior in retail marketing; An e-customer behavior model with online analytical mining for internet marketing planning | Nanni |
| 19. | 07.05.2014 9:00-11:00 | L1 | Case study: Mobility / 2 | Mobility case studies 2 | Nanni |
| 20. | 12.05.2014 9:00-11:00 | N1 | Ethical Issues in Data Analytics | Privacy: Regulations and Privacy Aware Data Mining | Giannotti |
| 21. | 14.05.2014 9:00-11:00 | L1 | Ethical Issues / Fraud Detection Case Study | | Giannotti |
| 22. | 19.05.2014 9:00-11:00 | N1 | Projects discussion | | Giannotti/Nanni |

Exam format

Exam, DM part I

The exam consists of a written test and an oral test:

Exam, DM part II

The exam is composed of three parts:

Exercises 2013-2014

Exercises, DM part I

Exercises, DM part II

Exam sessions

Midterm tests / Exercises

Date Time Place Notes Grades
Exercise I and Exercise II

Regular exam sessions

| Session | Date | Time | Room | Notes | Results |
|---|---|---|---|---|---|
| 1. | Thursday 16 January 2014 | 9.30 | TBD A1 | | |
| 2. | Monday 10 February 2014 | 9.30 | TBD C | | |
| 3. | Thursday 20 February 2014 | 14.00 | TBD Pedreschi's office | | |
| 4. | Tuesday 25 February 2014 | 14.00 | TBD Pedreschi's office | | |
| 5. | Monday 9 June 2014 | 9.00 | N1 | If needed, exams will continue on 10/6 and 11/6 in room L1 | Data Mining I: Results of written exam, June 9th, 2014 |
| 6. | Monday 30 June 2014 | 9.00 | N | If needed, exams will continue on 1/7 and 2/7 in rooms P and E | Data Mining II: Results of written exam, June 30th, 2014 |
| 7. | Monday 21 July 2014 | 9.00 | L1 | If needed, exams will continue on 10/6 and 11/6 in room L1 | |
| 8. | Tuesday 9 September 2014 | 15.30 | C1 | | |

| Session | Date | Time | Room | Notes | Results |
|---|---|---|---|---|---|
| 1. | Monday 19 January 2015 | 9.00 | C | | |
| 2. | Monday 16 February 2015 | 9.00 | C | | |

Extra sessions

| Date | Time | Room | Notes | Results |
|---|---|---|---|---|
| 7 November 2014 | 9:00-11:00 | C1 | | |

Previous years' editions