Organizations and businesses are overwhelmed by the flood of data continuously collected in their data warehouses and arriving from external sources, above all the Web. Traditional exploratory techniques may fail to make sense of this data because of its inherent complexity and size. Data mining and knowledge discovery techniques emerged as an alternative approach, aimed at revealing patterns, rules and models hidden in the data, and at supporting analysts in developing descriptive and predictive models for a number of business problems. This short course focuses on the main application scenarios of data mining for challenging problems in the broad domain of Customer Relationship Management (CRM).
| # | Date and time | Topic | Materials | Instructors |
|---|---|---|---|---|
| 01. | 11.05.2016, 09:00-13:00 | Introduction to data mining and big data analytics | slides: intro; slides: case studies | Giannotti |
| 02. | 11.05.2016, 14:00-18:00 | Data understanding; data preparation; Knime tutorial | slides; slides: data understanding; Knime tutorial; Knime on Iris | Pedreschi, Monreale |
| 03. | 12.05.2016, 09:00-13:00 | Pattern and association rule mining & market basket analysis | | Giannotti |
| 04. | 12.05.2016, 14:00-18:00 | Pattern and association rule mining: exercises with Knime | | Giannotti, Monreale |
| 05. | 13.05.2016, 09:00-13:00 | Clustering analysis & customer segmentation | slides: clustering; slides: customer segmentation | Pedreschi |
| 06. | 13.05.2016, 14:00-18:00 | Clustering analysis: exercises with Knime | | Pedreschi, Monreale |
| 07. | 16.05.2016, 09:00-13:00 | Classification & prediction | slides: classification | Pedreschi |
| 08. | 16.05.2016, 14:00-18:00 | Prediction models for promotion performance and churn analysis | slides | Giannotti |
| 09. | 18.05.2016, 09:00-13:00 | Classification & prediction: exercises with Knime | | Pedreschi, Monreale |
| 10. | 18.05.2016, 14:00-18:00 | Social network analysis: fundamentals | slides | Pedreschi |
| 11. | 20.05.2016, 09:00-13:00 | Mobility data mining & big data analytics | | Giannotti |
| 12. | 20.05.2016, 14:00-18:00 | Big Data Analytics: Privacy awareness | | Giannotti, Monreale |
DSB-Churn Dataset: The dataset consists of 20,000 examples (lines, rows) over 12 variables (fields, columns) describing features of the customers of a mobile phone provider, including the class variable LEAVE, which records whether the customer decided to quit the company or not. The class variable LEAVE is the last variable on each line, and its legal values are LEAVE and STAY. The header of churn.arff describes the legal values of each variable. Their informal meanings are listed below:
- COLLEGE: Is the customer college-educated?
- INCOME: Annual income
- OVERAGE: Average overcharges per month
- LEFTOVER: Average % of leftover minutes per month
- HOUSE: Value of dwelling (from census tract)
- HANDSET_PRICE: Cost of phone
- OVER_15MINS_CALLS_PER_MONTH: Average number of long (>15 mins) calls per month
- AVERAGE_CALL_DURATION: Average call duration
- REPORTED_SATISFACTION: Reported level of satisfaction
- REPORTED_USAGE_LEVEL: Self-reported usage level
- CONSIDERING_CHANGE_OF_PLAN: Was the customer considering changing his/her plan?
- LEAVE: Class variable: whether the customer left or stayed
The dataset is available here.
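Since the legal values of each variable are declared in the ARFF header, a reader can check every data row against them while loading. The following is a minimal stdlib-only sketch, assuming the simple ARFF subset described in the comments (numeric attributes and `{...}` nominal lists, unquoted names, `?` for missing values); `read_arff` is a hypothetical helper, and the two-row sample is made up rather than copied from churn.arff.

```python
import io


def read_arff(stream):
    """Parse a simple ARFF file.

    Returns (attributes, rows): attributes is a list of (name, legal_values)
    pairs, where legal_values is None for numeric attributes; rows is a list
    of dicts mapping attribute names to values (None for a missing '?').
    Quoted names, string/date attributes and sparse rows are NOT handled.
    """
    attributes = []
    rows = []
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if not in_data and lower.startswith("@attribute"):
            _, rest = line.split(None, 1)
            name, spec = rest.split(None, 1)
            spec = spec.strip()
            if spec.startswith("{"):
                # nominal attribute: the braces list its legal values
                values = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                values = None  # assumed numeric
            attributes.append((name, values))
        elif not in_data and lower.startswith("@data"):
            in_data = True
        elif in_data:
            fields = [f.strip() for f in line.split(",")]
            row = {}
            for (name, values), field in zip(attributes, fields):
                if field == "?":
                    row[name] = None
                elif values is None:
                    row[name] = float(field)
                elif field in values:
                    row[name] = field
                else:
                    raise ValueError(f"illegal value {field!r} for {name}")
            rows.append(row)
    return attributes, rows


# Made-up two-row sample in the same format (not taken from churn.arff)
SAMPLE = """@relation churn_toy
@attribute INCOME numeric
@attribute LEAVE {LEAVE,STAY}
@data
50000,STAY
?,LEAVE
"""

attributes, rows = read_arff(io.StringIO(SAMPLE))
print(attributes)  # [('INCOME', None), ('LEAVE', ['LEAVE', 'STAY'])]
print(rows)        # [{'INCOME': 50000.0, 'LEAVE': 'STAY'}, {'INCOME': None, 'LEAVE': 'LEAVE'}]
```

With the real file, `open("churn.arff")` can take the place of `io.StringIO(SAMPLE)`; for full ARFF support an existing reader such as `scipy.io.arff.loadarff` is the safer choice.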
Each group (2-3 people) must deliver a report (at most 10 pages, including all figures) describing the methods adopted and discussing the results achieved for each of the tasks listed below. Assume that the report is aimed at a marketing strategist who wants to learn the story revealed by the various data mining analyses and to receive suggestions on which actions to take as a consequence.
1. Data Understanding: a preliminary step to capture the basic properties of the data. It covers distribution analysis, statistical exploration, correlation analysis, suitable transformation of variables, elimination of redundant variables, and management of missing values.
2. Market Basket Analysis. Problem: prepare the data and extract interesting frequent patterns and association rules. The report should discuss the parameters used in the analyses and justify the findings about the most interesting rules according to the different measures introduced in the course.
3. Customer segmentation with k-means. Problem: find a high-quality clustering using k-means and discuss the profile of each cluster found, in terms of the properties of the customers it contains. The report should illustrate the clustering methodology adopted and the interpretation of the clusters. In particular, it must discuss how the best value of k was identified and characterise the clusters obtained, both by analysing the k centroids and by comparing the statistics of the variables within each cluster with those in the whole dataset.
4. Churn analysis with decision trees. Problem: find a high-quality decision tree that predicts whether each customer will STAY or LEAVE. The report should illustrate the classification methodology adopted, the validation and interpretation of the decision tree, and the process used to select the proposed tree, together with an evaluation of its quality.
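For task 1 (data understanding), the basic operations can be sketched in plain Python: distribution statistics per variable, median imputation of missing values, and a Pearson correlation check that flags pairs of numeric variables as candidates for redundancy. This is an illustrative sketch, not the Knime workflow used in the course; the helper names, the 0.9 threshold and all values below are made up, and categorical variables such as REPORTED_SATISFACTION would need separate treatment.

```python
import statistics
from itertools import combinations


def summarize(column):
    """Distribution statistics of a numeric column; None marks a missing value."""
    present = [v for v in column if v is not None]
    return {
        "count": len(present),
        "missing": len(column) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
    }


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    med = statistics.median([v for v in column if v is not None])
    return [med if v is None else v for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def redundant_pairs(table, threshold=0.9):
    """Variable pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) > threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up values, not taken from the churn dataset
overage = [0.0, 10.0, None, 30.0, 40.0]
print(summarize(overage))
print(impute_median(overage))  # [0.0, 10.0, 20.0, 30.0, 40.0]

table = {
    "OVER_15MINS_CALLS_PER_MONTH": [1.0, 2.0, 3.0, 4.0, 5.0],
    "AVERAGE_CALL_DURATION": [2.0, 4.1, 5.9, 8.0, 10.0],
    "HANDSET_PRICE": [300.0, 100.0, 500.0, 200.0, 400.0],
}
print(redundant_pairs(table))
```

A high correlation only suggests that one variable of the pair may be dropped; the decision should still be justified in the report.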
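For task 2 (market basket analysis), the churn records are not transactions, so one common preparation is to turn each customer into a set of `VARIABLE=value` items, discretising numeric variables into bins. The sketch below does this and then runs an Apriori-style frequent itemset search and derives rules with support, confidence and lift. It is a hypothetical pure-Python illustration: the cut point, the category labels and the four records are made up, and the measures discussed in the course may go beyond these three.

```python
from itertools import combinations


def to_items(record, bins):
    """Turn one customer record into a transaction of 'VARIABLE=value' items.

    Numeric variables listed in `bins` are discretised using the given cut
    points (bin0 = up to the first cut, bin1 = above it, and so on).
    """
    items = set()
    for name, value in record.items():
        if name in bins:
            label = sum(value > cut for cut in bins[name])
            items.add(f"{name}=bin{label}")
        else:
            items.add(f"{name}={value}")
    return frozenset(items)


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support}."""
    n = len(transactions)
    support = {}
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        support.update(kept)
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent
        level = {c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return support


def association_rules(support, min_confidence):
    """Rules lhs -> rhs with their support, confidence and lift."""
    rules = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                confidence = sup / support[lhs]
                if confidence >= min_confidence:
                    lift = confidence / support[rhs]
                    rules.append((set(lhs), set(rhs), sup, confidence, lift))
    return rules


# Made-up records; the cut point and the category labels are illustrative only
records = [
    {"INCOME": 30000, "REPORTED_SATISFACTION": "unsat", "LEAVE": "LEAVE"},
    {"INCOME": 40000, "REPORTED_SATISFACTION": "unsat", "LEAVE": "LEAVE"},
    {"INCOME": 90000, "REPORTED_SATISFACTION": "sat", "LEAVE": "STAY"},
    {"INCOME": 80000, "REPORTED_SATISFACTION": "unsat", "LEAVE": "STAY"},
]
bins = {"INCOME": [50000]}
transactions = [to_items(r, bins) for r in records]
support = frequent_itemsets(transactions, min_support=0.5)
for lhs, rhs, sup, conf, lift in association_rules(support, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"sup={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

The minimum support and confidence thresholds, as well as the bin boundaries, are exactly the parameters the report is asked to discuss; changing them changes which rules survive.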
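For task 3 (customer segmentation), the procedure can be sketched as: scale the variables, run k-means for several values of k, compare the sum of squared errors (SSE) to look for an "elbow", then profile each cluster by comparing its per-variable means with the means over the whole dataset. The sketch below assumes min-max scaling, Lloyd's algorithm with random initial centroids and a few restarts; the SSE elbow is one common heuristic for k, and the silhouette coefficient is another. Helper names and the six (INCOME, OVERAGE) points are hypothetical.

```python
import random


def min_max_scale(points):
    """Rescale each feature to [0, 1] so that no variable dominates the distance."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))
            for p in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm from k random initial points; returns (centroids, labels, sse)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # move the centroid to the mean of its members; keep it if the cluster is empty
            new.append(mean_point(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def best_kmeans(points, k, restarts=5):
    """Keep the lowest-SSE result over several random initialisations."""
    return min((kmeans(points, k, seed=s) for s in range(restarts)), key=lambda r: r[2])


def profile(raw, labels, names):
    """Per-cluster mean of each variable next to its mean over the whole dataset."""
    overall = mean_point(raw)
    report = {}
    for j in sorted(set(labels)):
        centre = mean_point([p for p, lab in zip(raw, labels) if lab == j])
        report[j] = {n: (c, o) for n, c, o in zip(names, centre, overall)}
    return report


# Made-up (INCOME, OVERAGE) pairs forming two obvious groups
raw = [(20000, 5), (22000, 6), (21000, 4), (90000, 40), (95000, 42), (88000, 38)]
points = min_max_scale(raw)
for k in range(1, 5):
    print(k, round(best_kmeans(points, k)[2], 4))  # SSE per k: look for the "elbow"
_, labels, _ = best_kmeans(points, 2)
print(profile(raw, labels, ["INCOME", "OVERAGE"]))
```

Profiling on the unscaled values keeps the cluster descriptions in business units (e.g. income), which is easier to explain to a marketing strategist than scaled coordinates.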
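For task 4 (churn analysis), the core loop of growing and validating a tree can be sketched as a small CART-style learner: binary threshold splits chosen by weighted Gini impurity, stopping rules (`max_depth`, `min_leaf`) that control overfitting, and a holdout split for measuring accuracy and the confusion matrix. This is a hypothetical pure-Python illustration, not the Knime decision tree nodes used in the exercises; it handles only numeric features, and the data is synthetic, generated with a made-up rule (LEAVE when OVERAGE > 100).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_leaf):
    """Exhaustive search for the (feature, threshold) with lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # also rejects splits with an empty side (min_leaf >= 1)
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, max_depth=3, min_leaf=5):
    """A leaf is a class label; an inner node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, min_leaf)
    if split is None or split[0] >= gini(labels):
        return majority  # no admissible split improves purity
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    sides = []
    for side in (True, False):
        sub_rows = [r for r, g in zip(rows, go_left) if g == side]
        sub_labels = [y for y, g in zip(labels, go_left) if g == side]
        sides.append(build_tree(sub_rows, sub_labels, max_depth - 1, min_leaf))
    return (f, t, sides[0], sides[1])


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)


def confusion(tree, rows, labels):
    """Counter of (actual, predicted) pairs."""
    return Counter((y, predict(tree, r)) for r, y in zip(rows, labels))


def holdout_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle, then keep test_fraction of the examples aside for validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Synthetic (OVERAGE, LEFTOVER) data with a made-up rule: LEAVE when OVERAGE > 100
rng = random.Random(1)
rows = [(rng.uniform(0, 300), rng.uniform(0, 90)) for _ in range(200)]
labels = ["LEAVE" if overage > 100 else "STAY" for overage, _ in rows]

train_x, train_y, test_x, test_y = holdout_split(rows, labels)
tree = build_tree(train_x, train_y, max_depth=3)
print(tree)
print("test accuracy:", accuracy(tree, test_x, test_y))
print(confusion(tree, test_x, test_y))
```

Selecting the proposed tree then amounts to comparing several settings of `max_depth` and `min_leaf` on held-out data; k-fold cross-validation gives a more stable estimate than a single split, and per-class measures such as precision and recall on LEAVE can complement overall accuracy when the classes are unbalanced.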
Deadline: send the report by email to all instructors by 1 July 2015. Include [MAINS] in the subject of the email.
The exam consists of the evaluation of the report on the proposed mining exercises.