Differences

These are the differences between the selected revision and the current version of the page.

--- dm:mains.santanna.2011-12 [30/11/2011 at 12:54 (13 years ago)]
Fosca Giannotti [Reading about the data analyst job]
+++ dm:mains.santanna.2011-12 [14/03/2013 at 15:03 (11 years ago)] (current version)
Fosca Giannotti [Exercise] Deadline extended
@@ Line 5: / Line 5: @@
 ===== News =====
-  * Exercises 1 and 2 are online. The deadline for both assignments is ** December 13, 2011. ** Send both reports in .pdf format by email to [[pedre@di.unipi.it]] with the tag [DM-MAINS] in the subject line.
+  * The data mining software Weka can be downloaded from [[http://www.cs.waikato.ac.nz/ml/weka/|here]].
 ====== Goals ======
@@ Line 41: / Line 41: @@
 ^ ^ Date ^ Topic ^ Learning material ^
-|1.   |22.11.2011 - 11:00-13:00 and 16:00-18:00 | Introduction to Data Mining and the Knowledge Discovery Process | {{:dm:introductiondm.pdf|}} - Textbook: chapt. 1 |
+|1.   |05.03.2013 - 11:00-13:00 | Introduction to Data Mining and the Knowledge Discovery Process | {{:dm:introductiondm.pdf|slides}} - Textbook: chapt. 1 |
-|2.   |23.11.2011 - 09:00-11:00  | Data understanding. Introduction to Weka | {{:dm:chap2_data.pdf|}} - Textbook: chapt. 2 and 3  |
+|2.   |06.03.2013 - 09:00-13:00  | Data understanding. Introduction to Weka | {{:dm:chap2_data.pdf|slides}} - Textbook: chapt. 2 (2.1, 2.2) and chapt. 3 (3.1, 3.2, 3.3) |
-|3.   |28.11.2011 - 11:00-13:00 and 14:00-16:00 | Clustering Analysis | {{:dm:clustering.pdf|}} - Textbook: chapt. 8. |
+|3.   |06.03.2013 - 14:00-18:00  | Clustering Analysis | {{:dm:clustering.pdf|slides}} - Textbook: chapt. 8 (8.1, 8.2, 8.5) |
-|4.   |29.11.2011 - 11:00-13:00 and 16:00-18:00 | Classification and predictive analysis | {{:dm:dm.classification.pdf|}} - Textbook: chapt. 4 |
+|4.   |07.03.2013 - 09:00-13:00 and 14:00-18:00 | Classification and predictive analysis | {{:dm:dm.classification.pdf|slides}} - Textbook: chapt. 4 (4.1, 4.2, 4.3, 4.4, 4.5) |
-|5.   |30.11.2011 - 16:00-18:00 | Pattern discovery and association rule mining | Textbook: chapt. 6 |
-|6.   |05.12.2011 - 09:00-13:00 | CRM applications. Big data and social network analysis. Data mining and privacy |  |
-===== Exercises =====
-  - ** Clustering: Russian Companies dataset. ** Download the zipped .arff dataset at {{:dm:russiancompanies.zip|}}, describing 1438 Russian companies. The following properties of each company are provided for the years 1996 and 1997: number of employees (emp), total amount of wages (wage), total revenues (output), the logarithms of the three previous variables (resp., ln, lw, ly), the production sector (sector: 1 = industry, 2 = construction, 3 = trade), and the kind of ownership (owntype: 1 = public, 2 = private, 3 = mixed). Provide a clustering analysis of the dataset with respect to a selected subset of variables, and explain the obtained clusters, also taking into account the nominal variables sector and owntype. Describe your findings in a short report (up to three pages) illustrating the key features of the dataset, how you conducted the clustering analysis, and the interpretation of the obtained clusters.
+===== Exercise =====
-  - ** Classification: Adult Census dataset. ** Download the zipped .arff dataset at {{:dm:adult.census.zip|}}, describing demographic information about 32561 persons extracted from US census data. The available attributes are: age, workclass, education, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, and a binary income class attribute (> $50K, <= $50K). Provide a concise, accurate and readable decision tree for the classification problem of predicting the income class variable given (all or some of) the other variables. Describe your findings in a short report (up to three pages) illustrating the key features of the dataset, how you conducted the classification analysis, and the interpretation of the obtained tree.
+  * **Breast Cancer Wisconsin (Diagnostic) Data Set. Assigned on: 07.03.2013. To be completed by: 22.03.2013. Send papers (3 pages max of text, figures excluded) by email to [[pedre@di.unipi.it]], cc: Fosca Giannotti [[fosca.giannotti@gmail.com]]. Use "[DM-MAINS] " in the subject. Group work is allowed, max 3 people per group; interdisciplinary competence is required in each group!**
+  * **Instructions:** Download the {{http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29|Wisconsin Diagnostic Breast Cancer (WDBC) dataset}} from the UCI archive. The dataset contains 569 observations on samples of breast tissue, together with their classification as benign or malignant, as performed by histologists. Perform the following tasks: 1) data understanding and exploratory analysis; 2) clustering analysis (disregarding the class information), including a description of the discovered (best) clusters; 3) classification analysis using decision trees for the task of diagnosing a sample as benign or malignant. Describe the process adopted to select the proposed clustering and tree, together with an evaluation of their quality.
+====== Exams ======
+The exam for the Data Mining module consists of the evaluation of the report on the assigned exercises. For students of the two-year LM-MAINS degree, the exam consists of the evaluation of the report on the exercises and an individual oral exam devoted to discussing aspects emerging from the exercises. The evaluation of the report is the same for all members of the group (max 3 students per group). The date of the first oral exam session for the LM-MAINS students will be set by appointment.
+====== 2012 Edition ======
+[[Edizione2012|ICT for BI & CRM - Part III: Data Mining 2012]]
