dm:mains.santanna.dm4crm.2017

Differences

These are the differences between the selected revision and the current version of the page.

dm:mains.santanna.dm4crm.2017 [07/03/2017 at 18:49 (7 years ago)]
Fosca Giannotti [Exercises]
dm:mains.santanna.dm4crm.2017 [23/05/2017 at 15:30 (7 years ago)] (current version)
Anna Monreale [Exercises]
Line 1: Line 1:
 ====== Data Mining for Customer Relationship Management 2017 ====== ====== Data Mining for Customer Relationship Management 2017 ======
  
-  * **Fosca Giannotti** ISTI-CNR, Knowledge Discovery and Data Mining Lab [[fosca.giannotti@isti.cnr.it]]+  * **Fosca Giannotti**\\ ISTI-CNR, Knowledge Discovery and Data Mining Lab\\ [[fosca.giannotti@isti.cnr.it]]
  
-  * **Dino Pedreschi** Università di Pisa, Knowledge Discovery and Data Mining Lab [[pedre@di.unipi.it]]+  * **Dino Pedreschi**\\ Università di Pisa, Knowledge Discovery and Data Mining Lab\\ [[pedre@di.unipi.it]]
  
-  * Assistant: **Riccardo Guidotti** ISTI-CNR, Knowledge Discovery and Data Mining Lab [[annam@di.unipi.it]]+  * Teaching Assistant: **Riccardo Guidotti**\\ ISTI-CNR, Knowledge Discovery and Data Mining Lab\\ [[riccardo.guidotti@isti.cnr.it]]
  
 ===== News ===== ===== News =====
Line 43: Line 43:
  
 ^ ^ Date ^ Topic ^ Learning material ^Instructor ^  ^ ^ Date ^ Topic ^ Learning material ^Instructor ^ 
-|01.   | 16.05.2017 - 09:00-13:00  | Introduction to data mining and big data analytics | {{:dm:1.dm_ml_introduction.pdf| slides: intro}} {{:dm:2.dm_ml-casestudies.ppt.pdf| slides: case studies}} | Giannotti | +|01.   | 16.05.2017 - 09:00-13:00  | Introduction to data mining and big data analytics | {{:dm:1.dm_ml_introduction.pdf| slides: intro}} {{:dm:2.dm_ml-casestudies.ppt.pdf| slides: case studies}} | Pedreschi |
-|02.   | 16.05.2017 - 14:00-18:00  | Data understanding; data preparation; Knime tutorial | {{:dm:4.dm_ml_data_preparation.pdf| slides}} {{:dm:04_dataunderstanding.pdf| slides data understanding}} {{:dm:knime_slides_mains.pdf| Knime tutorial}}{{:dm:du-iris.zip|Knime on Iris}} | Pedreschi, Guidotti | +|02.   | 16.05.2017 - 14:00-18:00  | Data understanding; data preparation; Knime tutorial | {{:dm:4.dm_ml_data_preparation.pdf| slides}} {{:dm:04_dataunderstanding.pdf| slides data understanding}} {{:dm:knime_slides_mains.pdf| Knime tutorial}} {{ :dm:01_titanic_data_understanding.zip | 01_titanic_data_understanding}} | Pedreschi, Guidotti |
-|03.   | 17.05.2017 - 09:00-13:00  | Pattern and association rule mining & market basket analysis | {{:dm:3.dm-ml_patternmining.pdf|PatternMining-AR}}| Giannotti | +|03.   | 17.05.2017 - 09:00-13:00  | Clustering analysis & customer segmentation | {{:dm:dm.pedreschi.clustering.2015.pdf| slides clustering}} {{:dm:customersegmentation.pdf| slides customer segmentation}} | Pedreschi |
-|04.   | 17.05.2017 - 14:00-18:00  | Pattern and association rule mining: exercises with Knime | {{:dm:relation_to_transactional_pima_new.zip|Knime-AR}}| Giannotti, Guidotti | +|04.   | 17.05.2017 - 14:00-18:00  | Clustering analysis: exercises with Knime | {{ :dm:02_titanic_clustering.zip | 02_titanic_clustering}} | Pedreschi, Giannotti, Guidotti |
-|05.   | 18.05.2017 - 09:00-13:00  | Clustering analysis & customer segmentation | {{:dm:dm.pedreschi.clustering.2015.pdf| slides clustering}} {{:dm:customersegmentation.pdf| slides customer segmentation}} | Pedreschi | +|05.   | 18.05.2017 - 09:00-13:00  | Pattern and association rule mining & market basket analysis | {{:dm:3.dm-ml_patternmining.pdf|PatternMining-AR}} | Giannotti |
-|06.   | 18.05.2017 - 14:00-18:00  | Clustering analysis: exercises with Knime | | Pedreschi, Guidotti | +|06.   | 18.05.2017 - 14:00-18:00  | Pattern and association rule mining: exercises with Knime | {{ :dm:03_titanic_pattern.zip | 03_titanic_pattern}} {{ :dm:04_coop_pattern.zip | 04_coop_pattern}} | Giannotti, Guidotti |
-|07.   | 19.05.2017 - 09:00-13:00  | Classification & prediction | {{:dm:dm.giannotti.pedreschi.classification.2015.pdf| slides classification}} [[http://www.r2d3.us/visual-intro-to-machine-learning-part-1/|Visual Introduction to Classification with Decision Trees]] | Giannotti, Guidotti | +|07.   | 19.05.2017 - 09:00-13:00  | Classification & prediction | {{:dm:dm.giannotti.pedreschi.classification.2015.pdf| slides classification}} [[http://www.r2d3.us/visual-intro-to-machine-learning-part-1/|Visual Introduction to Classification with Decision Trees]] | Giannotti, Pedreschi, Guidotti |
-|08.   | 19.05.2017 - 14:00-18:00  | Prediction models for promotion performance and churn analysis | {{:dm:5.dml-ml-crm-redemption-churn-promozioni-profili-innovatori.pptx.pdf| slides}} {{:dm:crm_dm-survey.pdf|Survey of DM applications in CRM}} {{:dm:change-customer-behavior.pdf|Mining changes in customer behavior in retail marketing}} | Giannotti | +|08.   | 19.05.2017 - 14:00-18:00  | Classification & prediction: exercises with Knime | {{ :dm:05_titanic_classification.zip | 05_titanic_classification}} | Pedreschi |
-|09.   | 22.05.2017 - 09:00-13:00  | Classification & prediction: exercises with Knime | | Pedreschi, Guidotti | +|09.   | 22.05.2017 - 09:00-13:00  | Social network analysis: fundamentals | {{:dm:pedreschi_sna_crash_course_mains.pptx.pdf| slides}} | Pedreschi |
-|10.   | 22.05.2017 - 14:00-18:00  | Social network analysis: fundamentals | {{:dm:pedreschi_sna_crash_course_mains.pptx.pdf| slides}} | Pedreschi |+|10.   | 22.05.2017 - 14:00-18:00  | Prediction models for promotion performance and churn analysis | {{:dm:5.dml-ml-crm-redemption-churn-promozioni-profili-innovatori.pptx.pdf| slides}} {{:dm:crm_dm-survey.pdf|Survey of DM applications in CRM}} {{:dm:change-customer-behavior.pdf|Mining changes in customer behavior in retail marketing}} | Giannotti, Guidotti |
 |11.   | 23.05.2017 - 09:00-13:00  | Mobility data mining & big data analytics | | Giannotti | |11.   | 23.05.2017 - 09:00-13:00  | Mobility data mining & big data analytics | | Giannotti |
-|12.   | 23.05.2017 - 14:00-18:00  | Big Data Analytics: Privacy awareness | {{:dm:privacy-intro.pdf|Slides Privacy}}| Giannotti, Guidotti |+|12.   | 23.05.2017 - 14:00-18:00  | Big Data Analytics: Privacy awareness | {{:dm:privacy-intro.pdf|Slides Privacy}} {{ :dm:06_class_mobility_mining.zip |}}| Giannotti, Guidotti |
 ===== Datasets ===== ===== Datasets =====
  
  
-0. Iris dataset. {{:dm:data.txt.zip|Iris}} +0. {{ :dm:data.txt.zip | Iris}}. (for details see [[https://archive.ics.uci.edu/ml/datasets/iris]])
  
-1. Shuttle dataset.{{:dm:shuttle.data.zip| Shuttle }}+1. {{ :dm:human_resources.csv.zip | Human Resources}}. (for details see [[https://www.kaggle.com/ludobenistant/hr-analytics]])
  
-2. Pima Indians Diabetes. [[http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes|Dataset]]+2. {{ :dm:telco_churn.csv.zip | Telco Churn}}. (for details see [[http://didawiki.di.unipi.it/doku.php/dm/mains.santanna.dm4crm.2016]])
  
-===== Exercises =====+3. {{ :dm:adult.csv.zip | Adult}}. (for details see [[https://archive.ics.uci.edu/ml/datasets/Adult]])
  
-** DSB-Churn Dataset: ** The dataset consists of 20,000 examples (lines, rows) over 12 variables (fields, columns) describing features of customers of a mobile phone provider, including the class variable LEAVE representing whether a customer decided to quit the company or not. The class variable, LEAVE, is the last variable on each line, and its legal values are LEAVE and STAY.  The header of churn.arff describes the legal values of each variable.  Informally, in the following we list their meanings: +4. {{ :dm:titanic_train.csv.zip | Titanic}}. (for details see [[https://www.kaggle.com/c/titanic]])
- +
-COLLEGE : Is the customer college educated? +
- +
-INCOME: Annual income +
- +
-OVERAGE: Average overcharges per month +
- +
-LEFTOVER: Average % leftover minutes per month +
- +
-HOUSE: Value of dwelling (from census tract) +
- +
-HANDSET_PRICE: Cost of phone +
- +
-OVER_15MINS_CALLS_PER_MONTH: Average number of long (>15 mins) calls per month +
- +
-AVERAGE_CALL_DURATION: Average call duration +
- +
-REPORTED_SATISFACTION: Reported level of satisfaction +
- +
-REPORTED_USAGE_LEVEL: Self-reported usage level +
- +
-CONSIDERING_CHANGE_OF_PLAN: Was customer considering changing his/her plan? +
- +
-LEAVE : Class variable: whether customer left or stayed +
- +
- +
-**The dataset is available {{:dm:churn.arff.zip|here}}.**+
  
 +===== Exercises =====
 **Guidelines:** **Guidelines:**
  
-Each group (2-3 people) is required to deliver a report (max 10 pages including all figures) describing the methods adopted and the discussion of achieved results with reference to the tasks listed below. Assume that the report is targeted to a //marketing strategist//, who is interested to learn the story inferred in the various data mining analyses and to receive suggestions on how to take appropriate actions as a consequence.+Each group (2-3 people) is required to deliver a report (max 20 pages including all figures) describing the methods adopted and the discussion of the most interesting achieved results with reference to the tasks listed below. Assume that the report is targeted to a //marketing strategist//, who is interested to learn the story inferred in the various data mining analyses and to receive suggestions on how to take appropriate actions as a consequence.
  
 **1. Data Understanding**: useful as a preliminary step to capture basic data properties. Distribution analysis, statistical exploration, correlation analysis, suitable transformation of variables and elimination of redundant variables, management of missing values. **1. Data Understanding**: useful as a preliminary step to capture basic data properties. Distribution analysis, statistical exploration, correlation analysis, suitable transformation of variables and elimination of redundant variables, management of missing values.
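
As a rough illustration of two of the steps named in the data understanding task (management of missing values and correlation analysis), here is a minimal Python sketch. The course exercises use Knime, so this is only an analogy; the two columns and their values are hypothetical, and mean imputation is just one possible choice.

```python
import math

# Hypothetical numeric columns; None marks a missing value.
col_a = [1.0, 2.0, None, 4.0, 5.0]
col_b = [2.0, 4.0, 6.0, 8.0, 10.0]

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

filled_a = impute_mean(col_a)
r = pearson(filled_a, col_b)
print(filled_a, r)
```

A pair of variables with a correlation close to 1 or -1 carries largely the same information, which is one common argument for treating one of them as redundant.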
  
-**2. Market Basket Analysis. ** Problem: prepare data and extract interesting association rules and frequent patterns.  The report should discuss the parameters used for the analyses, justifying your findings related to the most interesting rules according to the different measures introduced in the course. +**2. Pattern Mining Analysis. ** Problem: prepare data and extract interesting association rules and frequent patterns.  The report should discuss the parameters used for the analyses, justifying your findings related to the most interesting rules according to the different measures introduced in the course.
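
The page does not list which rule measures the course introduces. Assuming they include the standard support, confidence, and lift, the following Python sketch shows how these are computed for a single rule. The transactions are toy data invented for illustration, not taken from the course datasets.

```python
# Toy transactions (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(antecedent, consequent, transactions):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    Assumes both sides occur in at least one transaction, so no division by zero.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift

sup, conf, lift = rule_measures({"bread"}, {"milk"}, transactions)
print(sup, conf, lift)
```

A lift below 1, as in this toy case, means the consequent is less frequent among transactions containing the antecedent than in the data overall, so a rule can have decent confidence and still be uninteresting.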
- +
-**3. Customer segmentation with k-means.** Problem: find a high-quality clustering using K-means and discuss the profile of each found cluster (in terms of the properties that describe the customers of each cluster). The report should illustrate the adopted clustering methodology and the cluster interpretation. In particular, it is necessary to discuss the identification of the best value of k and the characterisation of the obtained clusters by using both analysis of the k centroids and comparison of the statistics of variables within the clusters with that in the whole dataset. +
  
-**4. Churn analysis with decision trees. ** Problem: find a high-quality decision tree that predicts whether each customer will STAY or LEAVE. The report should  illustrate the adopted classification methodology and the decision tree validation and interpretation, describing also the process adopted to select the proposed tree, together with its quality evaluation. +**3. Customer Segmentation. ** Problem: find a high-quality clustering using clustering algorithms and discuss the profile of each found cluster (in terms of the properties that describe the customers of each cluster). The report should illustrate the adopted clustering methodology and the cluster interpretation. In particular, in case of k-means, it is necessary to discuss the identification of the best value of k and the characterisation of the obtained clusters by using both analysis of the k centroids and comparison of the statistics of variables within the clusters with that in the whole dataset.
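
One common way to approach the choice of k (not necessarily the one taught in the course) is to run k-means for several values of k and compare the sum of squared errors (SSE) of each result. The sketch below is a plain Lloyd's k-means in pure Python with a deterministic farthest-first initialisation; the 2-D points are hypothetical toy data, and the exercises themselves are meant to be done in Knime.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's k-means; returns (centroids, SSE)."""
    # Farthest-first initialisation (an illustrative choice): start from the
    # first point, then add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse

# Hypothetical toy data: three visually separated groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE drops sharply up to k = 3 and only marginally afterwards, which is the "elbow" pattern usually taken as a hint for a reasonable k. The returned centroids can then be compared with the means of the whole dataset to characterise each cluster.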
  
 +**4. Classification Analysis. ** Problem: find a high-quality decision tree for predicting a feature of a customer. The report should  illustrate the adopted classification methodology and the decision tree validation and interpretation, describing also the process adopted to select the proposed tree, together with its quality evaluation.
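
The quality evaluation of a classifier is typically based on a confusion matrix computed on data not used for training. The Python sketch below derives accuracy, precision, and recall from lists of actual and predicted labels; the labels and values are hypothetical, and which measures the course expects is an assumption.

```python
def evaluate(actual, predicted, positive):
    """Return confusion-matrix counts and derived measures for a binary task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        # Assumes at least one predicted and one actual positive.
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Hypothetical labels of a held-out test set and the tree's predictions.
actual = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes", "no", "no"]

report = evaluate(actual, predicted, positive="yes")
print(report)
```

Reporting precision and recall next to accuracy matters when the classes are unbalanced, since a tree that always predicts the majority class can still reach a high accuracy.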
  
-**Deadline**: send the report by email to all instructors no later than **4 July 2017**. Specify [MAINS] in the subject of the email. +**Deadline**: send the report by email to all instructors no later than **23 June 2017**. Specify [MAINS] in the subject of the email. 
 ====== Exams ====== ====== Exams ======
  
dm/mains.santanna.dm4crm.2017.1488912591.txt.gz · Last modified: 07/03/2017 at 18:49 (7 years ago) by Fosca Giannotti