This is an old version of the document!
Data Mining A.A. 2021/22
DM1 - Data Mining: Foundations (6 CFU)
Instructors:
Teaching Assistant
DM2 - Data Mining: Advanced Topics and Applications (6 CFU)
News
Learning Goals
Hours and Rooms
DM1
Classes
Day of Week | Hour | Room |
Monday | 11:00 - 13:00 | Aula C / MS Teams |
Thursday | 11:00 - 13:00 | Aula A1 / MS Teams |
Office hours:
Prof. Pedreschi: Monday 16:00 - 18:00, Online
Prof. Nanni: appointment by email, Online
DM2
Classes
Day of Week | Hour | Room |
Monday | 14:00 - 16:00 | MS Teams |
Wednesday | 16:00 - 18:00 | MS Teams |
Office hours:
Learning Material
Textbook
Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison Wesley, ISBN 0-321-32136-7, 2006
Chapters 4, 6, and 8 are also available on the publisher's website.
Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. Guide to Intelligent Data Analysis. Springer Verlag, 1st Edition, 2010. ISBN 978-1-84882-259-7
Laura Igual et al. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications. 1st ed., 2017.
Slides
Software
Python - Anaconda (version 3.7): Anaconda is an open data science platform powered by Python.
Download page (the following libraries are already included)
Scikit-learn: Python library with tools for data mining and data analysis
Documentation page
Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Documentation page
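A quick way to confirm that a local installation matches the software listed above is sketched below. It uses only the Python standard library. The function name `check_environment` and the constants are hypothetical, introduced for illustration; the 3.7 requirement and the two libraries come from the list above, and `sklearn` is the import name of Scikit-learn.

```python
import sys
from importlib.util import find_spec

# Assumption: the course targets Python 3.7, as stated in the Software list.
REQUIRED_PYTHON = (3, 7)
# Import names of the listed libraries (Scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ("sklearn", "pandas")


def check_environment(required_python=REQUIRED_PYTHON, modules=REQUIRED_MODULES):
    """Return a dict reporting the interpreter version and library availability."""
    report = {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == tuple(required_python),
    }
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running the script inside the Anaconda environment prints one line per check; a `False` next to a library name suggests it is not installed in the active environment.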
Class Calendar (2021/2022)
First Semester (DM1 - Data Mining: Foundations)
Second Semester (DM2 - Data Mining: Advanced Topics and Applications)
| Day | Room | Topic | Learning material | Instructor | Recordings |
1. | ??.02.2022 ??:00-??:00 | Teams link | Introduction, CRISP, KNN | Intro, CRISP, KNN | Guidotti | Recording link |
Exams
Exam DM1
The exam is composed of two parts:
A project, which consists of exercises requiring the use of data mining tools for data analysis. Exercises include: data understanding, clustering analysis, frequent pattern mining, and classification (guidelines with more details will be provided). The project must be carried out by groups of 3 to 4 people, using Knime, Python, or a combination of the two. The results of the different tasks must be reported in a single paper of at most 20 pages, including figures. The paper must be emailed to datamining [dot] unipi [at] gmail [dot] com. Please use “[DM1 2021-2022] Project” as the subject.
Project 1
Assigned: 30/09/2021
MidTerm Deadline: 21/11/2021 (half project required, i.e., Data understanding & Preparation and at least 2 clustering algorithms)
Final Deadline: TBD (complete project required)
Exam DM part II (DMA)
Exam Rules
Rules for the DM2 exam are available here.
Exam Booking Periods
3rd Appello: ??/??/2022 00:00 - ??/??/2022 23:59
4th Appello: ??/??/2022 00:00 - ??/??/2022 23:59
5th Appello: ??/??/2022 00:00 - ??/??/2022 23:59
Exam Booking Agenda
Agenda Link: ???
3rd Appello: starts ??/??/2022
4th Appello: starts ??/??/2022
5th Appello: starts ??/??/2022
Important! If you book a slot in the agenda on a day between ??/??/2022 and ??/??/2022, you MUST be registered for the 3rd appello; if you book a slot on a day between ??/??/2022 and ??/??/2022, you must be registered for the 4th appello; if you book a slot on a day after ??/??/2022, you must be registered for the 5th appello.
The link to the agenda for booking a slot for the exam is displayed at the end of the registration.
During the exam, your camera must remain on and you must be able to share your screen. The exam may require the use of the Miro platform (https://miro.com/app/dashboard/).
The exam is composed of two parts:
A project, which consists of applying the methods and algorithms presented in class to solve exercises on a given dataset. The project must be carried out by groups of at most 3 people. The results of the different tasks must be reported in a single paper of at most 30 pages (25 suggested), including figures, plus 1 cover page (minimum font size 11, minimum line spacing 1). The project must be delivered at least 7 days before the oral exam, by email to riccardo [dot] guidotti [at] unipi [dot] it AND francesco [dot] spinnato [at] sns [dot] it with subject “[DM2 Project]”.
An oral exam, which includes: (1) discussing topics presented in class, including the theory of the parts already covered by the written exam; (2) solving simple exercises using the Miro platform; (3) discussing the project report with a group presentation.
Project Guidelines
N.B. When “solving the classification task”, remember (i) to test, where needed, different criteria for estimating the parameters of the algorithms, and (ii) to evaluate the classifiers (e.g., Accuracy, F1, Lift Chart) so that the results obtained with an imbalanced-learning technique can be compared against those obtained on the “original” dataset.
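To illustrate why the comparison should look at more than one metric, here is a minimal sketch in plain Python. The labels and predictions are invented toy values, not results from any course dataset, and the helper names (`confusion_counts`, `accuracy`, `f1`) are hypothetical. It assumes class 1 is the minority class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1 of the positive class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (invented for illustration): 10 minority and 90 majority samples.
y_true = [1] * 10 + [0] * 90
# Hypothetical predictions of a classifier trained on the "original" dataset.
pred_original = [1] * 2 + [0] * 8 + [0] * 90
# Hypothetical predictions after applying an imbalanced-learning technique.
pred_rebalanced = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

for name, pred in [("original", pred_original), ("rebalanced", pred_rebalanced)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} f1={f1(y_true, pred):.2f}")
```

In this toy case accuracy drops from 0.92 to 0.86 after rebalancing while F1 on the minority class rises from 0.33 to 0.53, so accuracy alone would hide the improvement. In a real project, `accuracy_score` and `f1_score` from `sklearn.metrics` compute the same quantities.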
Exam Dates
Exam Sessions
Past Exams
Reading About the "Data Scientist" Job
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.
Data, data everywhere. The Economist, Feb. 2010
download
Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011
link
Welcome to the yotta world. The Economist, Sept. 2011
download
Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012
link
Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012
link
Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics
download
Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012:
YouTube video
Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com.
download
Previous years