
magistraleinformatica:dmi:start

Differences

These are the differences between the selected revision and the current version of the page.

magistraleinformatica:dmi:start [15/09/2024 at 14:51 (5 weeks ago)] – [First Semester] Anna Monreale
magistraleinformatica:dmi:start [15/10/2024 at 08:35 (5 days ago)] (current version) – added second outlier lecture Mattia Setzu
Line 1: Line 1:
 ====== Data Mining (309AA) - 9 CFU A.Y. 2024/2025 ======
  
-**Instructor:**
+**Instructors:**
   * **Anna Monreale**
     * KDDLab, Università di Pisa
Line 15: Line 15:
  
 ====== News ======
 +  * [03.10.2024] **Uploaded project dataset and instructions**
 +  * [27.09.2024] **The GitHub repo is now live**
 +  * [21.09.2024] **Schedule updated, see details below**
   * [14.09.2024] **The lectures will start on 19th September 2024**
    
Line 29: Line 32:
      * Ethical Issues
  
-====== Hours and Rooms ======
+====== Schedule ======
  
 **Classes**
Line 35: Line 38:
 ^  Day of Week  ^  Hour  ^  Room  ^
 |  Tuesday   |  11:00 - 13:00  |  Room C1  |
-|  Thursday  |  09:00 - 11:00  |  Room  |
+|  Thursday  |  14:00 - 16:00  |  Room A1  |
 |  Friday    |  09:00 - 11:00  |  Room C1  |
  
Line 41: Line 44:
  
 **Office hours:**
-Anna Monreale: TBD
+  * Anna Monreale: Thu 09:00-11:00, online via Teams or in my office (appointment by email).
 +  * Mattia Setzu: see [[https://unimap.unipi.it/cercapersone/dettaglio.php?ri=177323&template=dett_didattica.tpl|Unimap]]
  
 A [[https://teams.microsoft.com/l/team/19%3Aq8IK5DrzMwEE5TxVhuw4QdYEVFJ06KVITI5jSJTmaJ81%40thread.tacv2/conversations?groupId=5fae2fa6-38fd-414f-a0c9-ffbd8e6f0710&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]] will be used ONLY for news, Q&A, and other course-related communication. Lectures are held in person only and will **NOT** be live-streamed, but recordings of this year's or previous years' lectures will be made available here for non-attending students.
-====== Learning Material ======
+
  
-===== Textbook =====
+====== Teaching Material ======
  
-  * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006
-    * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]
-    * Chapters 4, 6 and 8 are also available at the publisher's Web site.
-  * Laura Igual et al. **Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed., 2017.
-  * Jake VanderPlas. **[[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data.]]** 1st Edition.
-  * For Python Notions: {{ :magistraleinformatica:dmi:python_basics.ipynb.zip | Very basic notions on Python}}
+**Books**
+^ Title ^ Authors ^ Edition ^
+| [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php|Introduction to Data Mining]] | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd |
+| [[https://link.springer.com/book/10.1007/978-3-031-48956-3|Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications]] | Laura Igual, Santi Seguí | 2nd |
+| [[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data]] | Jake VanderPlas | 1st |
+| [[https://github.com/janishar/mit-deep-learning-book-pdf|Deep Learning]] | Ian Goodfellow, Yoshua Bengio, Aaron Courville | |
+| [[https://math.mit.edu/~gs/linearalgebra/ila5/indexila5.html|Introduction to Linear Algebra]] | Gilbert Strang | 5th |
  
  
-===== Slides =====
+**Online tutorials**
 +
 +^ Title ^ Authors ^
 +| [[https://brianmcfee.net/dstbook-site/content/intro.html|Digital Signals Theory]] | Brian McFee | 
 +| [[https://rtavenar.github.io/blog/dtw.html|An introduction to Dynamic Time Warping]] | Romain Tavenard | 
 +| [[https://github.com/msetzu/intro_to_ds_and_ml/blob/master/python/notebooks/Python.ipynb|Introduction to Python]] | Mattia Setzu | 
 + 
 + 
 +**Slides** 
 + 
 +The slides used in the course will be added to the calendar after each class. Some come from the slide set provided by the textbook's authors: [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides for "Introduction to Data Mining"]].
  
-  * The slides used in the course will be added to the calendar after each class. Most of them come from the slide set provided by the textbook's authors: [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides for "Introduction to Data Mining"]].
-    
  
      
-===== Software =====
+**Software**
  
-  * Python - Anaconda (version 3.7 or later): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included)
-  * Scikit-learn: Python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]
-  * Pandas: an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language [[http://pandas.pydata.org/ | Documentation page]]
+Software material is available in the [[https://github.com/data-mining-UniPI/teaching24|GitHub repository]].
+
+
  
    
Line 73: Line 83:
 ===== First Semester  =====
  
-^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^ Teacher ^
+^ ^ Day ^ Topic ^ Teaching material ^ References ^ Video Lectures ^ Teacher ^
 |    |  17.09  | Canceled    |   |
-|1.  |  19.09  | Overview. Introduction to KDD  |  |Chap. 1 Kumar Book | | |
+|1.  |  19.09  | Overview. Introduction to KDD  | {{ :magistraleinformatica:dmi:1-overview-2024.pdf |}} {{ :magistraleinformatica:dmi:1-intro-dm-2024.pdf |}}  |Chap. 1 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EYl0J0Cq1bNHs38C6SrwncgBxCJWh9U4R5KT6mc5yMRs3g?e=PCkgkD|Part1]] [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EWFjzWMH_nBHpYeoNtNcC8kByd1bl3WSkjp9Gd4jsxgNVQ?e=bQeAQh|Part2]]| Monreale|
 +|2.  |  20.09  | Data Understanding + Data Preparation (Aggr., Sampling, Dim. Reduction, Feature Selection, Feature Creation).  | {{ :magistraleinformatica:dmi:2-data_understanding-2024.pdf |}} {{ :magistraleinformatica:dmi:3-data_preparation-2024.pdf |}}|Chap. 2 Kumar Book and additional resource of Kumar Book: [[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Data Exploration Chap.]] (Chap. 3 in the first edition of the Kumar book) | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EfBuUxYrbA9CiqB_6oVCsfkB4Gq2NFRbJ1KjQCdX0o6AeQ?e=2wTqaV|Lecture Recording]] |Monreale|
 +|3.  |  24.09  | Data representation   | Slides: {{ :magistraleinformatica:dmi:Data representation.pdf |}}. | References: Introduction to Linear Algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), [[https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf|t-SNE paper]], [[https://arxiv.org/abs/1802.03426 | UMAP paper (Section 3)]]  | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EX02UyriTtFHn1TRskD5Dn8Bddv3L24AX3bP_bdGVJTnXg?e=9jTleh|Lecture Recording]] |Setzu |
 +|4.  |  26.09  | Data Cleaning + Transformations. Python Lab: Data Understanding and Preparation |{{ :magistraleinformatica:dmi:5-data_cleaning_transformation.pdf |Data Cleaning and Transformations }} | | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EYfnLBl471NJml8dIpRS6qABH9gSIihctow_8BrFt2PT4g?e=i8nDFh|Part1: Data Cleaning and Transformations]] [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EeW1ul3HF6RAq79-OLJJOUMB5D_dL7m9gC8RbjFpYGwbOg?e=6NfOLC|Python Lab]]|Monreale, Mannocci |
 +|5.  |  27.09  | Python Lab: Data Understanding and Preparation + Similarities | [[https://github.com/data-mining-UniPI/teaching24|GitHub repository]] {{ :magistraleinformatica:dmi:6-data_similarity.pdf |}}| |[[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EYJ__0La271Bi2D4NFN44b4BYEF2b9f2praZvHlpBTtcrw?e=Ea38Xq|Python Lab]] [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EZ-ObDU8SIZHkyPlbV3b9S0BjyhXjSyc30zrncGKSAGgNA?e=HAK0to|Similarity]]|Monreale, Mannocci |
 +|6.  |  01.10  | Introduction to Clustering and Centroid-based clustering |{{ :magistraleinformatica:dmi:6-basic_cluster_analysis-intro.pdf |Introduction to Clustering Analysis}} {{ :magistraleinformatica:dmi:6-basic_cluster_analysis-kmeans.pdf |K-means}} |Chap. 7 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/ESAOwIabbl5DsfUKuobQ_X0Bw2FNXh4EYfB1awaNUMwFlQ?e=GN9B6S|Lecture Recording]] |Monreale |
 +|7.  |  03.10  | Hierarchical Clustering | {{ :magistraleinformatica:dmi:9-basic_cluster_analysis-hierarchical.pdf |}} |Chap. 7 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EYXSdANHxrpKqPRcxXd3SOQBYBOTb7FxEZ_rS7KYdSZojQ?e=qibdE2|Lecture Recording]]|Monreale |
 +|8.  |  04.10  | Density Based Clustering & Variants of K-means | {{ :magistraleinformatica:dmi:10-basic_cluster_analysis-dbscan.pdf |}} {{ :magistraleinformatica:dmi:11-basic_cluster_analysis-kmeans-variants.pdf |}}| Chap. 7 Kumar Book | [[https://unipiit.sharepoint.com/:v:/s/a__td_62949/EfULV1PoxuJIuZR8M1Mhz9wBNFSqCVqDN5-L63y-FdChNQ?e=oiOGvR|Lecture Recording]] |Monreale |
 +|9.  |  08.10  | Clustering Validation + Python Lab | {{ :magistraleinformatica:dmi:12-basic_cluster_analysis-validity.pdf |}} See GitHub for the Python notebook on clustering| Chap. 7 Kumar Book | |Monreale, Mannocci|
 +|    |  10.10  | Lecture canceled due to UNIPI Orienta | | | | |
 +|    |  11.10  | Lecture canceled due to UNIPI Orienta | | | | |
 +|10.  |  15.10  | Outlier detection | {{ magistraleinformatica:dmi:Anomaly detection.pdf| Outlier Detection }} | Sections 1.3.1-4, 2.2 Kumar book| | Setzu |
 +|11.  |  17.10  | Outlier detection | {{ magistraleinformatica:dmi:Anomaly detection.pdf| Outlier Detection }} | Sections 3.2-3, 4.2-5 2.2 Kumar book| | Setzu |
 + 
 + 
  
-   
 ====== Exams ======
-TBD
+**Project:**
 + 
 +The project consists of data analyses carried out with data mining tools.
 +It must be carried out in Python by a team of 3 students. The guidelines specify the tasks to address. Results must be reported in a single paper of at most 25 pages, figures included. Students must deliver both the paper (single column) and well-commented Python notebooks.
 + 
 +  * The first part of the project consists of the **assignments** described here: {{  :magistraleinformatica:dmi:project.pdf  | Project Description}}
 +  - **Dataset: {{ :magistraleinformatica:dmi:dataset.tar| Dataset}}**
 +  - **Deadline**: the first part must be delivered by **November 19th, 2024**, through the Teams assignment.
 +  
 +  * The second part of the project consists of the assignment described here: 
 +     - **Deadline**: Dec 29, 2024  
 + 
 +Students who do not deliver the above project by **Dec 29, 2024** must ask the teachers by email for a new project. The newly assigned project will require about 20 days of work and, once delivered, will be discussed during the oral exam.
 + 
 +** Paper Presentation (OPTIONAL)** 
 + 
 +Students present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: students who give it are exempt from the oral exam with open questions on the entire program. They only need to present the project (see next point) and answer open questions on the topics not covered by the project. The paper can be presented by the group or by a single student.
 + 
 +**Oral Exam** 
 +  * **Project presentation** (with slides) – 15 minutes: mandatory for all students, with questions to verify understanding of the details of any part of the project.
 +  * **Open questions on the entire program**: for students who do not opt for the paper presentation.
 +  * **Open questions on the topics not covered by the project**: only for students who opt for the paper presentation.
 +  * Group presentations of the project are preferred. If this is impossible, please contact me to find a solution.
 + 
 +**How to book the exam colloquium?**
 + 
 +On https://esami.unipi.it/ you can find the exam dates: one in January and one in February. Each student must register for one of the 2 dates. These are not the dates of the colloquium or of the project delivery; we use the list of registered students to organize the exam dates. After the registration deadline we will share a calendar for the oral exam.
 + 
 ====== Previous years ======
 [[DM-INF 2023-2024]]
magistraleinformatica/dmi/start.1726411919.txt.gz · Last modified: 15/09/2024 at 14:51 (5 weeks ago) by Anna Monreale
