dm:mains.santanna.2011-12

Differences

These are the differences between the selected revision and the current version of the page.

dm:mains.santanna.2011-12 [30/11/2011 at 12:54 (13 years ago)]
Fosca Giannotti [Reading about the data analyst job]
dm:mains.santanna.2011-12 [14/03/2013 at 15:03 (11 years ago)] (current version)
Fosca Giannotti [Exercise] Deadline extended
Line 5: Line 5:
 ===== News ===== ===== News =====
  
-  * Exercises 1 and 2 are online. The deadline for both assignments is **December 13, 2011**. Send both reports in .pdf format by email to [[pedre@di.unipi.it]] with the tag [DM-MAINS] in the subject line.+  * The data mining software Weka can be downloaded from [[http://www.cs.waikato.ac.nz/ml/weka/|here]].
  
 ====== Goals ====== ====== Goals ======
Line 41: Line 41:
  
 ^ ^ Date ^ Topic ^ Learning material ^  ^ ^ Date ^ Topic ^ Learning material ^ 
-|1.   |22.11.2011 - 11:00-13:00 and 16:00-18:00 | Introduction to Data Mining and the Knowledge Discovery Process | {{:dm:introductiondm.pdf|}} - Textbook: chapt. 1 |   +|1.   |05.03.2013 - 11:00-13:00 | Introduction to Data Mining and the Knowledge Discovery Process | {{:dm:introductiondm.pdf|slides}} - Textbook: chapt. 1 |   
-|2.   |23.11.2011 - 09:00-11:00  | Data understanding. Introduction to Weka | {{:dm:chap2_data.pdf|}} - Textbook: chapt. 2 and 3  |  +|2.   |06.03.2013 - 09:00-13:00  | Data understanding. Introduction to Weka | {{:dm:chap2_data.pdf|slides}} - Textbook: chapt. 2 (2.1, 2.2) and chapt. 3 (3.1, 3.2, 3.3) |  
-|3.   |28.11.2011 11:00-13:00 and 14:00-16:00 | Clustering Analysis | {{:dm:clustering.pdf|}} - Textbook: chapt. 8. |  +|3.   |06.03.2013 - 14:00-18:00  | Clustering Analysis | {{:dm:clustering.pdf|slides}} - Textbook: chapt. 8 (8.1, 8.2, 8.5) |  
-|4.   |29.11.2011 11:00-13:00 and 16:00-18:00 | Classification and predictive analysis | {{:dm:dm.classification.pdf|}} - Textbook: chapt. 4 |  +|4.   |07.03.2013 09:00-13:00 and 14:00-18:00 | Classification and predictive analysis | {{:dm:dm.classification.pdf|slides}} - Textbook: chapt. 4 (4.1, 4.2, 4.3, 4.4, 4.5) |
-|5.  |30.11.2011 - 16:00-18:00 | Pattern discovery and association rule mining | Textbook: chapt. 6 |  +
-|6.  |05.12.2011 - 09:00-13:00 | CRM applications. Big data and social network analysis. Data mining and privacy | |  +
  
  
-===== Exercises ===== 
  
-  - ** Clustering: Russian Companies dataset. ** Download the zipped .arff dataset at {{:dm:russiancompanies.zip|}}, describing 1438 Russian companies. The following properties of each company are provided, relative to years 1996 and 1997: number of employees (emp), total amount of wages (wage), total revenues (output), the logarithm of the three previous variables (resp., ln, lw, ly), the production sector (sector: 1 = industry, with the remaining codes for constructions and trade), and the kind of ownership (owntype: 1 public, 2 private, 3 mixed). Provide a clustering analysis of the dataset with respect to a selected subset of variables, and explain the obtained clusters taking into account also the nominal variables sector and owntype. Describe your findings in a short report (up to three pages) illustrating the key features of the dataset, how you conducted the clustering analysis, and the interpretation of the obtained clusters.  +===== Exercise ===== 
-  - ** Classification: Adult Census dataset. ** Download the zipped .arff dataset at {{:dm:adult.census.zip|}}, describing demographic information about 32561 persons extracted from US census data. The available attributes are: age, workclass, education, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, and a binary class income attribute (> $50K, <= $50K). Provide a concise, accurate and readable decision tree for the classification problem of predicting the income class variable given (all or some of) the other variables. Describe your findings in a short report (up to three pages) illustrating the key features of the dataset, how you conducted the classification analysis, and the interpretation of the obtained tree.+ 
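The clustering exercise above works on log-scaled size variables (ln, lw, ly) and then interprets clusters through the nominal attributes sector and owntype. The course uses Weka (for example its SimpleKMeans clusterer); the following is only a minimal, tool-independent Python sketch of that workflow, assuming k-means with Euclidean distance as the clustering method. The helper names `log_transform`, `kmeans` and `profile`, and the toy rows, are hypothetical and invented for illustration, not taken from the dataset.

```python
import math
import random
from collections import Counter

def log_transform(rows):
    """Natural log of every (strictly positive) value, like the ln/lw/ly columns."""
    return [[math.log(v) for v in row] for row in rows]

def squared_distance(p, q):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means (assumed method): returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def profile(labels, nominal):
    """Count the values of a nominal attribute (e.g. owntype) inside each cluster."""
    table = {}
    for lab, value in zip(labels, nominal):
        table.setdefault(lab, Counter())[value] += 1
    return table

# Invented toy rows (emp, wage, output), not real companies from the dataset.
rows = [[10, 50, 100], [12, 60, 120], [1000, 5000, 20000], [1100, 5200, 21000]]
owntype = [1, 1, 2, 2]  # hypothetical ownership codes for the four toy rows
centroids, labels = kmeans(log_transform(rows), k=2)
# The two small and the two large toy companies end up in separate clusters.
print(profile(labels, owntype))
```

Clustering on the logarithms rather than on raw counts keeps a few very large companies from dominating the distance; the nominal attributes are deliberately left out of the distance and used only afterwards, through `profile`, to describe the clusters.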
 +  * **Breast Cancer Wisconsin (Diagnostic) Data Set. Assigned on: 07.03.2013. To be completed by: 22.03.2013. Send reports (max 3 pages of text, figures excluded) by email to [[pedre@di.unipi.it]], cc: Fosca Giannotti [[fosca.giannotti@gmail.com]]. Use "[DM-MAINS] " in the subject. Group work allowed, max 3 people per group, inter-disciplinary competence required in each group!**  
 +  * **Instructions:** Download the {{http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29|Wisconsin Diagnostic Breast Cancer (WDBC) dataset}} from the UCI archive. The dataset contains 569 observations on samples of breast tissue, together with their classification as benign or malignant, as performed by histologists. You are asked to perform the following tasks: 1) data understanding and exploratory analysis; 2) clustering analysis (disregarding the class information), including a description of the discovered (best) clusters; 3) classification analysis using decision trees for the task of diagnosing a sample as benign or malignant. Describe the process adopted to select the proposed clustering and tree, together with their quality evaluation. 
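Task 3 above asks for a decision tree and an evaluation of its quality. Decision-tree learners such as C4.5 (available in Weka as J48) pick each test by an entropy-based criterion; C4.5 actually uses the gain ratio, so the following minimal Python sketch is a simplification that uses plain information gain and only chooses a single threshold split on one numeric attribute, followed by accuracy, precision and recall for the malignant class. It assumes diagnosis labels "B"/"M" as in the UCI file; the helper names `entropy`, `best_split` and `evaluate`, and the toy values, are hypothetical and not real WDBC measurements.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels (e.g. 'B'/'M')."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Best binary test `value <= t` on one numeric attribute by information gain.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (gain, threshold); threshold is None if no split improves purity."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Weighted entropy of the two children after the split.
        remainder = (i / n) * entropy(left) + ((n - i) / n) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_t = gain, (lo + hi) / 2
    return best_gain, best_t

def evaluate(actual, predicted, positive="M"):
    """Accuracy, precision and recall, treating `positive` (malignant) as the target."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Invented toy values for one numeric feature, not real WDBC measurements.
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
diagnosis = ["B", "B", "B", "M", "M", "M"]
gain, t = best_split(feature, diagnosis)  # gain 1.0 bit at t = 6.5
predicted = ["M" if v > t else "B" for v in feature]
print(evaluate(diagnosis, predicted))  # (1.0, 1.0, 1.0)
```

A full tree applies `best_split` recursively over all attributes and then prunes; scores computed on the training data, as in this toy run, are optimistic, so a held-out test set or cross-validation gives a fairer quality estimate. Reporting recall on the malignant class separately matters here because a missed malignant sample is the costlier error.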
 +====== Exams ====== 
 + 
 +The exam of the Data Mining module consists of the evaluation of the report on the assigned exercises. For students of the two-year LM-MAINS degree, the exam consists of the evaluation of the report on the exercises and an individual oral exam devoted to the discussion of aspects emerging from the exercises. The evaluation of the reports is the same for all members of the group (max 3 students per group). The date of the first oral exam session for the LM-MAINS students will be set by appointment. 
 + 
 +====== 2012 Edition ======  
 + 
 +[[Edizione2012|ICT for BI & CRM - Part III: Data Mining 2012]] 
 + 
dm/mains.santanna.2011-12.1322657646.txt.gz · Last modified: 30/11/2011 at 12:54 (13 years ago) by Fosca Giannotti