magistraleinformatica:dmi:start

Differences

This shows the differences between the selected revision and the current version of the page.

magistraleinformatica:dmi:start [20/11/2021 at 01:01 (3 years ago)] – [First Semester] Anna Monreale
magistraleinformatica:dmi:start [18/09/2024 at 09:33 (10 hours ago)] (current version) – Update teaching material, and office hours Mattia Setzu
Line 1: Line 1:
-<html> +====== Data Mining (309AA) - 9 CFU A.Y. 2024/2025 ======
-<!-- Google Analytics --> +
-<script type="text/javascript" charset="utf-8"> +
-(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +
-(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), +
-m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) +
-})(window,document,'script','//www.google-analytics.com/analytics.js','ga');+
  
-ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true}); +**Instructors:**
-ga('personalTracker.require', 'linker'); +
-ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it'] ); +
-   +
-ga('personalTracker.require', 'displayfeatures'); +
-ga('personalTracker.send', 'pageview', 'ruggieri/teaching/dm/'); +
-setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  +
-</script> +
-<!-- End Google Analytics --> +
-<!-- Global site tag (gtag.js) - Google Analytics --> +
-<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script> +
-<script> +
-  window.dataLayer = window.dataLayer || []; +
-  function gtag(){dataLayer.push(arguments);} +
-  gtag('js', new Date()); +
- +
-  gtag('config', 'G-LPWY0VLB5W'); +
-</script> +
-<!-- Global site tag (gtag.js) - Google Analytics --> +
-<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script> +
-<script> +
-  window.dataLayer = window.dataLayer || []; +
-  function gtag(){dataLayer.push(arguments);} +
-  gtag('js', new Date()); +
- +
-  gtag('config', 'G-LPWY0VLB5W'); +
-</script> +
-<!-- Capture clicks --> +
-<script> +
-jQuery(document).ready(function(){ +
-  jQuery('a[href$=".pdf"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'PDFs', fname); +
-  }); +
-  jQuery('a[href$=".r"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Rs', fname); +
-  }); +
-  jQuery('a[href$=".zip"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'ZIPs', fname); +
-  }); +
-  jQuery('a[href$=".mp4"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-  jQuery('a[href$=".flv"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-}); +
-</script> +
-</html> +
-====== Data Mining (309AA) - 9 CFU A.Y. 2021/2022 ====== +
- +
-**Instructor:**+
   * **Anna Monreale**   * **Anna Monreale**
     * KDDLab, Università di Pisa     * KDDLab, Università di Pisa
     * [[anna.monreale@unipi.it]]        * [[anna.monreale@unipi.it]]   
 +  * **Mattia Setzu**
 +    * KDDLab, Università di Pisa
 +    * [[mattia.setzu@unipi.it]]   
 +
 **Teaching Assistant:** **Teaching Assistant:**
-  * **Francesca Naretto** +  * * **Lorenzo Mannocci** 
-    * KDDLab, SNS, Pisa +    * University of Pisa 
-    * [[francesca.naretto@sns.it]]  +    * [[lorenzo.mannocci@phd.unipi.it]]  
  
 ====== News ====== ====== News ======
-  * [28.10.2021] ** Lecture of Friday 29.10.2021 will be canceled **  +  * [14.09.2024] ** The lectures will start on 19th September 2024**  
-  * [23.09.2021]  Please, fill this document: [[https://docs.google.com/spreadsheets/d/1YzHs_JSYPWYqnmkM7ccQc1WZSzGP7UsgxdBF-h5LcEA/edit?usp=sharing|Student-Lists anf Project groups]]. On Teams you can find instructions for GroupID + 
-  * [06.09.2021] The first lecture of this course will take place on Thursday, 16 Sept 2021. +
-  * [08.09.2021]People that intend to attend the course online should use this link: https://teams.microsoft.com/l/team/19%3aWKvq4kg0XbKZ5pEeiZcarbBXPCYsTvTwMkKZs2PWiHA1%40thread.tacv2/conversations?groupId=aea1385b-6721-4d90-a169-c97f7d066eca&tenantId=c7456b31-a220-47f5-be52-473828670aa1    +
 ====== Learning Goals ====== ====== Learning Goals ======
      * Fundamental concepts of data knowledge and discovery.      * Fundamental concepts of data knowledge and discovery.
Line 88: Line 29:
      * Ethical Issues      * Ethical Issues
  
-====== Hours and Rooms ======+====== Schedule ======
  
 **Classes** **Classes**
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Wednesday |  14:00 - 16:00  |  Room C  - Online  |  +|  Tuesday   |  11:00 - 13:00  |  Room C1  |  
-|  Thursday  |  14:00 - 16:00  |  Room C  - Online  |  +|  Thursday  |  09:00 - 11:00  |  Room C  |  
-|  Friday    |  09:00 - 11:00  |  Room A1 -  Online  |  +|  Friday    |  09:00 - 11:00  |  Room C1  |  
  
  
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
-Anna Monreale: Wednesday: 11:00-13:00 online using Teams (Appointment by email) +  * Anna Monreale: TBD 
-Francesca Naretto: Monday: 15:00-18:00 online using Teams (Appointment by email) +  * Mattia Setzu: Info on [[https://unimap.unipi.it/cercapersone/dettaglio.php?ri=177323&template=dett_didattica.tpl|Unimap]]
  
-  +A [[https://teams.microsoft.com/l/team/19%3Aq8IK5DrzMwEE5TxVhuw4QdYEVFJ06KVITI5jSJTmaJ81%40thread.tacv2/conversations?groupId=5fae2fa6-38fd-414f-a0c9-ffbd8e6f0710&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]] will be used ONLY to post news, Q&A, and other course-related material. The lectures will be held in person only and will **NOT** be live-streamed, but recordings of the lectures, or of those from previous years, will be made available here for non-attending students.
-====== Learning Material -- Materiale didattico ======+
  
-===== Textbook -- Libro di Testo =====+====== Teaching Material ======
  
-  Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006 +**Books** 
-    [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]] +^ Title ^ Authors ^ Edition ^ 
-    * Chapters 4,6 and 8 are also available at the publisher's Web site. +| [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php|Introduction to Data Mining]] | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd | 
-  * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259- +| [[https://link.springer.com/book/10.1007/978-3-031-48956-3|Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications]] | Laura Igual, Santi Seguí | 2nd | 
-  * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. +| [[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data]] | Jake VanderPlas | 1st | 
-   Jake VanderPlas. **[[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data.]]** 1st Edition.  +| [[https://github.com/janishar/mit-deep-learning-book-pdf|Deep Learning]] | Ian Goodfellow, Yoshua Bengio, Aaron Courville | | 
-   For Python Notions: {{ :magistraleinformatica:dmi:python_basics.ipynb.zip | Very basic notions on Python}} +| [[https://math.mit.edu/~gs/linearalgebra/ila5/indexila5.html|Introduction to Linear Algebra]] | Gilbert Strang | 5th |
  
  
-===== Slides =====+**Online tutorials**
  
-  * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook's authors [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides per "Introduction to Data Mining"]]+^ ^ Authors ^ 
-   +| [[https://brianmcfee.net/dstbook-site/content/intro.html|Digital Signals Theory]] | Brian McFee | 
 +| [[https://rtavenar.github.io/blog/dtw.html|An introduction to Dynamic Time Warping]] | Romain Tavenard | 
 +| [[https://github.com/msetzu/intro_to_ds_and_ml/blob/master/python/notebooks/Python.ipynb|Introduction to Python]] | Mattia Setzu |
  
-   
-===== Software===== 
  
-  Python - Anaconda (3.7 version!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included) +**Slides**
-  Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]] +
-  Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]+
  
-  +The slides used in the course will be inserted in the calendar after each class. Some are part of the slides provided by the textbook's authors [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides per "Introduction to Data Mining"]].
-====== Class Calendar (2021/2022) ======+
  
-===== First Semester  ===== 
  
-^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^ +   
-|  |  15.09  14:15‑16:00 | Lecture deleted  | | | | +**Software**
-|1.|  16.09  14:15‑16:00 | Overview. Introduction to KDD  | {{ :magistraleinformatica:dmi:2021-1-overview.pdf |}}{{ :magistraleinformatica:dmi:1-intro-dm.pdf |}} | Chap. 1 Kumar Book| [[https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_140839-Meeting%20Recording.mp4?web=1|Video 1]]  [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_151538-Meeting%20Recording.mp4?web=1|Video 2]] | +
-|2.|  17.09   09:00-10:45 | Data Understanding | {{ :magistraleinformatica:dmi:2-data_understanding.pdf | Slides DU}} |Chap.2 Kumar Book and additioanl resource of Kumar Book:[[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Exploring Data]] If you have the first ed. of KUMAR this is the Chap 3 | [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210917_071017-Meeting%20Recording.mp4|Video 1]]  [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210917_101809-Meeting%20Recording.mp4|Video 2]] | +
-|3.|  22.09  14:15-16:00 | Data Understanding + Data Preparation        | {{ :magistraleinformatica:dmi:3-data_preparation.pdf |}} | Chap. 2 Kumar Book | [[https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210922_120312-Meeting%20Recording.mp4|Video]] | +
-|4.|  23.09  14:15-16:00 | Data Preparation + Data Similarities.|{{ :magistraleinformatica:dmi:4-data_similarity.pdf |}}       | Data Similarity is in Chap. 2  | +
-|5.|  24.09  09:00-10:45 | Introduction to Clustering. Center-based clustering: kmeans| {{ :magistraleinformatica:dmi:5-basic_cluster_analysis-intro.pdf |}}  {{ :magistraleinformatica:dmi:6.1-basic_cluster_analysis-kmeans.pdf |}}     | Clustering is in Chap. 7  | +
-|6.|  29.09  14:15-16:00 | Hierarchical clustering       | {{ :magistraleinformatica:dmi:7.basic_cluster_analysis-hierarchical.pdf |}} | Chap. 7 Kumar Book |  | +
-|7.|  30.09  14:15-16:00 | Density based clustering. Clustering validity. Lab. DU | {{ :magistraleinformatica:dmi:8.basic_cluster_analysis-dbscan-validity.pdf |}}  {{ :undefined:dataund.zip | Notebook DU tips}} {{ :magistraleinformatica:dmi:adult_du.zip | Another Notebook on DU}}  | Chap. 7 Kumar Book  | +
-|8.|  01.10  09:00-10:45 | Python Lab - Clustering|  {{ :magistraleinformatica:dmi:tips_clustering.ipynb_complete.zip | Notebook CLustering Tips}}     | +
-|9.|  06.10  14:15-16:00 | Center-based clustering: Bisecting K-means, Xmeans, EM       | {{ :magistraleinformatica:dmi:6.2-basic_cluster_analysis-kmeans-variants.pdf |}} | Chap. 7 Kumar Book, {{ :magistraleinformatica:dmi:clusteringmixturemodels.pdf |}} {{ :magistraleinformatica:dmi:xmeans.pdf |}}|  | +
-|10.| 07.10  14:15-16:00 | Classification Problem. Decision Trees  |{{ :magistraleinformatica:dmi:9.chap3_basic_classification-2020.pdf |}} | Chap. 3 Kumar Book| | +
-|   | 08.10  09:00-10:45 | Lecture canceled |       | +
-|11.| 13.10  14:15-16:00 | Decision Trees + Classifier Evaluation | same slides of the previous lecture | Chap. 3 Kumar Book|  | +
-|12.| 14.10  14:15-16:00 | Evaluation Methods for Classification Models  | same slides of the previous lecture | Chap. 3 Kumar Book|  | +
-|13.| 15.10  09:00-10:45 |   Statistical tool for model evaluation + Rule based classification| {{ :magistraleinformatica:dmi:10-rule-based-clussifiers.pdf |}} |  Chap. 3 Kumar Book +  Chap. 4 Kumar Book| |  +
-|14.| 20.10  14:15-16:00 | Rule based classification + Instance-based Classification  | {{ :magistraleinformatica:dmi:10-knn.pdf |}} | Chap. 4 Kumar Book|  | +
-|15.| 21.10  14:15-16:00 | Exercise on DT learning + Naive Bayesian Classifier | {{ :magistraleinformatica:dmi:11_2021-naive_bayes.pdf |}} {{ :magistraleinformatica:dmi:2021-dt-ex.pdf |}} | Chap. 4 Kumar Book|  | +
-|16.| 22.10  09:00-10:45 | SVM & Ensemble Classifiers | {{ :magistraleinformatica:dmi:14_svm_2020.pdf |}} {{ :magistraleinformatica:dmi:13_ensemble_2020.pdf |}} |  Chap. 4 Kumar Book| |  +
-|17.| 27.10  14:15-16:00 | Neural Networks  |  {{ :magistraleinformatica:dmi:15_neural_networks_2021.pdf |}}| Chap. 4 Kumar Book|  | +
-|18.| 28.10  14:15-16:00 | Python Lab on Classification | {{ :magistraleinformatica:dmi:adult_classification_2021.ipynb.zip |}} | |  | +
-|19.| 29.11  09:00-10:45 | Canceled |  |   | |  +
-|20.| 03.11  14:15-16:00 | Python Lab on Classification + Association Rule Mining  | {{ :magistraleinformatica:dmi:classificationpython2.zip |}} {{ :magistraleinformatica:dmi:17_association_analysis2021.pdf |}} | Chap.5 Association Rules: Kumar Book|  | +
-|21.| 04.11  14:15-16:00 | Association Rule Mining |   | |  Chap.5 Association Rules: Kumar Book| +
-|22.| 05.11  09:00-10:45 | FP-Growth - Sequential Pattern Mining | {{ :magistraleinformatica:dmi:17_2021-fp-growth.pdf |}} |   | Chap.6 Kumar Book|  +
-|23.| 10.11  14:15-16:00 | Sequential Pattern Mining |  {{ :magistraleinformatica:dmi:18_sequential_patterns_2021.pdf |}} |Chap.7 Kumar Book |  | +
-|24.| 11.11  14:15-16:00 | Time Series Similarities, Transformations & Clustering | {{ :magistraleinformatica:dmi:22_time_series_similarity_2021.pdf |}}  | [[https://cs.gmu.edu/~jessica/BookChapterTSMining.pdf|Overview on DM for time series]]| +
-|25.| 12.11  09:00-10:45 | Motif & Shapelet Discovery | {{ :magistraleinformatica:dmi:23_time_series_shapelets-motif-2021.pdf |}} | {{ :magistraleinformatica:dmi:matrixprofile.pdf |}}  {{ :magistraleinformatica:dmi:shaplet.pdf |}} | | +
-|24.| 17.11  14:15-16:00 | Lab: Association Rules & Sequential pattern mining by Python |  {{ :magistraleinformatica:dmi:arm-spm.zip |}} |  |  | +
-|25.| 18.11  14:15-16:00 | Ethics & Privacy | {{ :magistraleinformatica:dmi:19_ethics_privacy2021.pdf |}}  | [[https://cs.gmu.edu/~jessica/BookChapterTSMining.pdf|Overview on DM for time series]]|  | +
-|26.| 19.11  09:00-10:45 | Lab: Time series |  | {{ :magistraleinformatica:dmi:timeseries-py.zip |}} | |+
  
-====== Exams ====== +Software material available in the [[https://github.com/data-mining-UniPI/teaching23|Github repository]] (available in the coming days).
-**Mid-term Project **+
  
-A project consists in data analyses based on the use of data mining tools.  
-The project has to be performed by a team of 2/3 students. It has to be performed by using Python. The guidelines require to address specific tasks. Results must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The students must deliver both: paper (single column) and  well commented Python Notebooks. 
- 
-  * First part of the project consists in the **assignments** described here: {{ :magistraleinformatica:dmi:data_mining_project_2021_1_.pdf | Project Description}} 
-     * **Dataset:** {{ :magistraleinformatica:dmi:prj_data.zip | Dataset}} 
-     * **Deadline**: the fist part has to be delivered within  November, <del>5th 2021</del> 15th 2021.Send an email to: anna.monreale@unipi.it and francesca.naretto@sns.it  
    
-  * Second part of the project consists in the assignment Task 3 described here: {{ :magistraleinformatica:dmi:data_mining_project_2021-2.pdf |Updated Project Description}} +====== Class Calendar (2024/2025) ======
-  *   * **Deadline**: 5th January 2022+
  
-** Paper Presentation (OPTIONAL)**+===== First Semester  =====
  
-Students need to present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: Students that decide to do the paper presentation can avoid the oral exam with open questions. They only need to present the project (see next point). The paper presentation can be done by the group or by a single person. +^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^ Teacher ^ 
 +|  |  17.09  | Canceled  |  |  |  |  | 
 +|1.|  19.09  | Overview. Introduction to KDD  |  | Chap. 1 Kumar Book | | |
  
-**Oral Exam** +   
-  * **Project presentation** (with slides) – 10 minutes: mandatory for all the students +====== Exams ====== 
-  * ** Open questions ** on the entire program: optional only for students opting for paper presentation. +TBD
-  +
- +
  
 +====== Previous years =====
 +[[DM-INF 2023-2024]]
  
-===== Reading About the "Data Scientist" Job =====+[[DM-INF 2022-2023]]
  
-** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the "sexiest" around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them. **+[[DM-INF 2021-2022]]
  
-//Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// 
- 
-  * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}} 
-  * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/09/06/data-scientist-the-hot-new-gig-in-tech/|link]] 
-  * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}} 
-  * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1|link]] 
-  * Il futuro è già scritto in Big Data. Il SOle 24 Ore, Sept 2012 [[http://www.ilsole24ore.com/art/tecnologie/2012-09-21/futuro-scritto-data-155044.shtml?uuid=AbOQCOhG|link]] 
-  * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}} 
-  * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]] 
- 
-  * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]] 
- 
-====== Previous years ===== 
 [[DM-INF 2020-2021]] [[DM-INF 2020-2021]]
  
magistraleinformatica/dmi/start.1637370073.txt.gz · Last modified: 20/11/2021 at 01:01 (3 years ago) by Anna Monreale
