magistraleinformatica:dmi:start
Differences
These are the differences between the selected revision and the current version of the page.
magistraleinformatica:dmi:start [07/01/2021 at 21:03 (4 years ago)] – [Exams] Anna Monreale | magistraleinformatica:dmi:start [23/09/2024 at 14:06 (41 hours ago)] (current version) – Added data representation slides Mattia Setzu | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | < | + | ====== Data Mining |
- | <!-- Google Analytics --> | + | |
- | <script type=" | + | **Instructors:** |
- | </ | + | |
- | <!-- End Google Analytics --> | + | |
- | <!-- Capture clicks --> | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | ====== Data Mining (309AA) - 9 CFU ====== | + | |
- | + | ||
- | **Instructor:** | + | |
* **Anna Monreale** | * **Anna Monreale** | ||
* KDDLab, Università di Pisa | * KDDLab, Università di Pisa | ||
* [[anna.monreale@unipi.it]] | * [[anna.monreale@unipi.it]] | ||
+ | * **Mattia Setzu** | ||
+ | * KDDLab, Università di Pisa | ||
+ | * [[mattia.setzu@unipi.it]] | ||
+ | |||
**Teaching Assistant: | **Teaching Assistant: | ||
- | * **Francesca Naretto** | + | * **Lorenzo Mannocci** |
- | * KDDLab, SNS, Pisa | + | * University of Pisa |
- | * [[francesca.naretto@sns.it]] | + | * [[lorenzo.mannocci@phd.unipi.it]] |
====== News ====== | ====== News ====== | ||
- | | + | |
- | * [09.09.2020] The course | + | * [14.09.2024] ** The lectures |
+ | |||
====== Learning Goals ====== | ====== Learning Goals ====== | ||
* Fundamental concepts of data knowledge and discovery. | * Fundamental concepts of data knowledge and discovery. | ||
Line 61: | Line 23: | ||
* Data preparation | * Data preparation | ||
* Clustering | * Clustering | ||
- | * Classification | + | * Classification |
* Pattern Mining and Association Rules | * Pattern Mining and Association Rules | ||
* Outlier Detection | * Outlier Detection | ||
Line 68: | Line 30: | ||
* Ethical Issues | * Ethical Issues | ||
- | ====== | + | ====== |
**Classes** | **Classes** | ||
^ Day of Week ^ Hour ^ Room ^ | ^ Day of Week ^ Hour ^ Room ^ | ||
- | | | + | | |
- | | Thursday | + | | Thursday |
- | | Friday | + | | Friday |
**Office hours - Ricevimento: | **Office hours - Ricevimento: | ||
- | Anna Monreale: | + | * Anna Monreale: |
- | Francesca Naretto: Monday, 15:00-18:00, online via Teams (appointment by email) | + | * Mattia Setzu: Info on [[https:// |
- | + | A [[https:// | |
- | ====== Learning Material | + | |
- | ===== Textbook -- Libro di Testo ===== | + | ====== Teaching Material ====== |
- | | + | **Books** |
- | | + | ^ Title ^ Authors ^ Edition ^ |
- | * Chapters 4,6 and 8 are also available at the publisher' | + | | [[http:// |
- | * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7 | + | | [[https:// |
- | * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. | + | | [[http:// |
- | * Jake VanderPlas. **[[http:// | + | | [[https:// |
+ | | [[https:// | ||
- | ===== Slides ===== | + | **Online tutorials** |
- | * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook' | + | ^ ^ Authors |
- | + | | [[https://brianmcfee.net/dstbook-site/content/intro.html|Digital Signals Theory]] | Brian McFee | | |
- | + | | [[https://rtavenar.github.io/blog/dtw.html|An introduction to Dynamic Time Warping]] | Romain Tavenard | |
- | + | | [[https://github.com/msetzu/intro_to_ds_and_ml/blob/master/python/notebooks/Python.ipynb|Introduction to Python]] | Mattia Setzu | | |
- | ===== Software===== | + | |
- | + | ||
- | * Python - Anaconda (3.7 version!!!): | + | |
- | * Scikit-learn: | + | |
- | * Pandas: pandas is an open source, BSD-licensed library providing high-performance, | + | |
- | + | ||
- | + | ||
- | ====== Class Calendar (2020/2021) ====== | + | |
- | + | ||
- | ===== First Semester | + | |
- | + | ||
- | ^ ^ Day ^ Topic ^ Learning material ^ References | + | |
- | |1.| 16.09 09:00-10:45 | Overview. Introduction to KDD | {{ : | + | |
- | |2.| 17.09 09:00-10:45 | Data Understanding | {{ : | + | |
- | |3.| 18.09 09:00-10:45 | Data Preparation | + | |
- | |4.| 23.09 09:00-10:45 | Data Preparation: | + | |
- | |5.| 24.09 09:00-10:45 | Data Similarities. Introduction to Clustering.|{{ : | + | |
- | |6.| 25.09 11:00-12:45 | LAB: Data Understanding in Python | {{ : | + | |
- | |7.| 30.09 09:00-10:45 | Center-based clustering: kmeans| {{ : | + | |
- | |8.| 01.10 09:00-10:45 | Center-based clustering: Bisecting K-means, X-means, EM| Same slides as the previous lectures | Chap. 7 Kumar Book, {{ : | + | |
- | |9.| 02.10 11:00-12:45 | Hierarchical clustering| {{ : | + | |
- | |10.| 07.10 09:00-10:45 | Density based clustering|{{ : | + | |
- | |11.| 08.10 09:00-10:45 | Lab: clustering + Project Assignment | {{ : | + | |
- | | | + | |
- | |12.| 14.10 09:00-10:45 | Classification Problem + Decision trees| | + | |
- | |13.| 15.10 09:00-10:45 | Only 30 minutes of Discussion on the project due to connection problems| | + | |
- | |14.| 16.10 11:00-12:45 | Decision Tree + Classifier Evaluation| | + | |
- | |15.| 21.10 09:00-10:45 | Evaluation Methods for Classification Models| | + | |
- | |16.| 22.10 09:00-10:45 | Statistical tool for model evaluation + Rule based classification| {{ : | + | |
- | |17.| 23.10 11:00-12:45 | Rule based classification + Instance-based Classification| {{ : | + | |
- | |18.| 28.10 09:00-10:45 |Naive Bayesian Classifier + Ensemble Classifiers | {{ : | + | |
- | |19.| 29.10 09:00-10:45 | SVM & NN | {{ : | + | |
- | |20.| 30.10 11:00-12:45 | MLNN & Lab on Classification| {{ : | + | |
- | |21.| 04.11 09:00-10:45 | Regression & Association Rule Mining| {{ : | + | |
- | |22.| 05.11 09:00-10:45 | Association Rule Mining| | Chap.5 Association Rules: Kumar Book| | + | |
- | |23.| 06.11 11:00-12:45 | Sequential Pattern Mining| {{ : | + | |
- | |24.| 11.11 09:00-10:45 | Ethics in AI & Privacy | {{ : | + | |
- | |25.| 12.11 09:00-10:45 | Ethics in AI & Privacy | | {{ : | + | |
- | |26.| 13.11 11:00-12:45 | Ethics in AI & Privacy, Explainability | {{ : | + | |
- | |27.| 18.11 09:00-10:45 | Explainability | {{ : | + | |
- | |28.| 19.11 09:00-10:45 | Anomaly Detection | {{ : | + | |
- | |29.| 20.11 11:00-12:45 | Anomaly Detection | {{ : | + | |
- | |30.| 25.11 09:00-10:45 |Time series Similarity | + | |
- | |31.| 26.11 09:00-10:45 |Time series Clustering | + | |
- | |32.| 27.11 11:00-12:45 |Lab on Association Rules and Sequential Pattern Mining | + | |
- | |33.| 02.12 09:00-10:45 | Time Series: Motif Discovery | + | |
- | |34.| 03.12 09:00-10:45 | Time Series: Shapelets Discovery + Ex. DTW + Subsequences + Thesis available| {{ : | + | |
- | | | + | |
- | |35.| 09.12 09:00-10:45 | Paper Presentation | | | | + | |
- | |36.| 10.12 09:00-10:45 | Paper Presentation | | | | + | |
- | |37.| 11.12 11:00-12:45 | Paper Presentation | | + | |
+ | **Slides** | ||
+ | The slides used in the course will be inserted in the calendar after each class. Some are part of the slides provided by the textbook' | ||
+ | | ||
+ | **Software** | ||
+ | Software material available in the [[https:// | ||
+ | |||
+ | ====== Class Calendar (2024/2025) ====== | ||
+ | ===== First Semester | ||
+ | ^ ^ Day ^ Topic ^ Teaching material ^ References ^ Video Lectures ^ Teacher ^ | ||
+ | | | 17.09 | Canceled | ||
+ | |1. | 19.09 | Overview. Introduction to KDD | {{ : | ||
+ | |2. | 20.09 | Data Understanding + Data Preparation (Aggr., Sampling, Dim. Reduction, Feature Selection, Feature Creation). | ||
+ | |3 | 24.09 | Data representation | ||
+ | | ||
====== Exams ====== | ====== Exams ====== | ||
- | **Mid-term Project ** | ||
- | |||
- | The project consists of data analyses carried out with data mining tools. | ||
- | The project must be carried out in Python by a team of 2 or 3 students. The guidelines specify the tasks to address. Results must be reported in a single paper of at most 20 pages, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks. | ||
- | |||
- | * The first part of the project consists of the **assignments** described here: {{ : | ||
- | * **Dataset: | ||
- | * **Deadline**: | ||
- | * The second part of the project consists of **assignment Task 3**, described here: {{ : | ||
- | * **Deadline**: | ||
- | * The third part of the project consists of **assignment Task 4**, described here: {{ : | ||
- | * **Deadline**: | ||
- | |||
- | |||
- | ** Project to be delivered during the exam sessions ** | ||
- | |||
- | Students who did not deliver the above project by 4 Jan 2021 must email the teacher to request a new project. | ||
- | |||
- | ** Paper Presentation (OPTIONAL)** | ||
- | |||
- | Students may present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: students who present a paper are exempt from the open questions of the oral exam and only need to present the project (see next point). | ||
- | |||
- | **Oral Exam** | ||
- | * **Project presentation** (with slides) – 10 minutes: mandatory for all the students | ||
- | * ** Open questions ** on the entire program: optional only for students opting for paper presentation. | ||
- | |||
- | |||
- | ====== Exam Dates ====== | ||
- | |||
TBD | TBD | ||
- | ===== Exam Sessions | + | ====== Previous years ====== |
- | TBD | + | [[DM-INF 2023-2024]] |
+ | [[DM-INF 2022-2023]] | ||
- | ===== Reading About the "Data Scientist" | + | [[DM-INF 2021-2022]] |
- | ** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/ | + | [[DM-INF 2020-2021]] |
- | //Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// | + | [[http://didawiki.cli.di.unipi.it/doku.php/ |
- | * Data, data everywhere. The Economist, Feb. 2010 {{: | ||
- | * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http:// | ||
- | * Welcome to the yotta world. The Economist, Sept. 2011 {{: | ||
- | * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http:// | ||
- | * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http:// | ||
- | * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{: | ||
- | * Peter Sondergaard, | ||
- | * Towards Effective Decision-Making Through Data Visualization: | ||
- | |||
- | ====== Previous years ====== | ||
- | [[http:// | ||
magistraleinformatica/dmi/start.1610053387.txt.gz · Last modified: 07/01/2021 at 21:03 (4 years ago) by Anna Monreale