====== Web Mining and Social Network Analysis 2012/2013 ======

  * **Dino Pedreschi** Università di Pisa, Knowledge Discovery and Data Mining Lab [[pedre@di.unipi.it]]

===== News =====

  * **Exam dates: 10 June h 14:00 - 26 June h 09:00 - 11 July h 09:00.** On each of these dates a calendar of oral examinations will be set. Exams will take place in Pedreschi's office.
  * **Monday 27 May 2013, Sala Seminari Ovest - [[PhD workshop 2013]]: talks from PhD students attending the course**
  * New schedule: Mondays 16:00-18:00 Aula C - Thursdays 11:00-13:00 Aula N1
  * We need to reschedule the Friday 9:00-11:00 class. Please select your preferences at the following [[http://www.doodle.com/p9un3uxwueim7rdb|Doodle poll]]. Deadline: Monday, March 4, 2013, before the 16:00 lecture.
  * First lecture of the 2013 edition: Friday, March 1st, 2013, h 9:00-11:00, Aula N1

===== 2013 Schedule =====

  * **Monday, h 16:00 - 18:00, Aula C**
  * **Thursday, h 11:00 - 13:00, Aula N1**

====== Goals ======

Over the past decade there has been a growing public fascination with the complex "connectedness" of modern society. This connectedness is found in many contexts: in the rapid growth of the Internet and the Web, in the ease with which global communication now takes place, and in the ability of news and information as well as epidemics and financial crises to spread around the world with surprising speed and intensity. These are phenomena that involve networks and the aggregate behavior of groups of people; they are based on the links that connect us and the ways in which each of our decisions can have subtle consequences for the outcomes of everyone else.

This short course is an introduction to the analysis of complex networks, with a special focus on social networks and the Web - its structure and function, and how it can be exploited to search for information.
Drawing on ideas from computing and information science, applied mathematics, economics and sociology, the course describes the emerging field of study that is growing at the interface of all these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected.

====== Syllabus ======

1) Graph theory and social networks

  * Graphs
  * Social, information, biological and technological networks
  * Strong and weak ties
  * Networks in their surrounding context

2) The World Wide Web

  * The structure of the Web
  * Link analysis and Web search
  * Web mining and sponsored search markets

3) Network dynamics

  * Information cascades
  * Power laws and rich-get-richer phenomena
  * The small-world phenomenon
  * Epidemics

====== Textbooks and materials ======

  * Slides (see Calendar).
  * **David Easley, Jon Kleinberg: Networks, Crowds, and Markets. [[http://www.cs.cornell.edu/home/kleinber/networks-book/]]**
  * **Albert-Laszlo Barabasi. Network Science Book Project (2013, ongoing) [[http://barabasilab.neu.edu/networksciencebook/]]**

Reading:

  * **M. E. J. Newman: The structure and function of complex networks**, SIAM Review, Vol. 45, pp. 167-256, 2003. ({{:wma:newman_2003.pdf|download pdf}})
  * **A.-L. Barabasi. Linked. PLUME, Penguin Group, 2002.**
  * Duncan J. Watts. //Six Degrees: The Science of a Connected Age.// Norton, New York, 2003.
  * Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Datasets. [[http://infolab.stanford.edu/~ullman/pub/book.pdf]]

Course on **Network Science** held by **Albert-Laszlo Barabasi** at Northeastern University, Boston, MA: [[http://barabasilab.neu.edu/courses/phys5116/|link]]

====== Midterm Project ======

  * Form a group of 3 or 4 students.
  * Choose a network from those listed in [[http://www.giuliorossetti.net/about/ongoing-works/datasets/|link]]
  * Choose a second network from those listed in [[http://www-personal.umich.edu/~mejn/netdata/|link]]
  * Send an email to pedre@di.unipi.it, giulio.rossetti@isti.cnr.it, lpappalardo@di.unipi.it specifying the students in the group and the networks you have chosen (with subject [WMR], deadline **18 Apr 2013**).
  * Prepare a pdf document of **max 3 pages** (figures excluded) in which you present your analysis of the networks.
  * Send the document by email to pedre@di.unipi.it, giulio.rossetti@isti.cnr.it, lpappalardo@di.unipi.it (with subject [WMR-MidTerm]). **Deadline 5 May 2013**
  * Mid Term results: {{:wma:wmr13_midterm_results.pdf|download pdf}}

====== Project ======

The project consists of two parts:

  * **Data Collection**, in which students reconstruct their individual multidimensional networks. Guidelines for data collection can be found here: {{:wma:datacollection.pdf|Data Collection guideline}}
  * End of the data collection: **12 May 2013**
  * Final Term: {{:wma:finaltermwmr.pdf|Description}}. Data and code can be found at: [[http://www.giuliorossetti.net/about/ongoing-works/material/|link]]
  * Oral exams: **10 June**, **26 June**, **11 July** (the Final Term survey **must** be submitted at least 3 days before the chosen date).

====== Calendar ======

^ ^ Date ^ Topic ^ Learning material ^ Homework ^
|1. | Monday, 18.02.2013 | Introduction to Complex Network Analysis. | {{:wma:pedreschi.wmr.2012.01.pdf|slides}} | Read Chapters 1 and 2 of Kleinberg's book. |
|2. | Monday, 25.02.2013 | Lecture canceled (political elections) | | |
|3. | Friday, 01.03.2013 | Introduction to Complex Network Analysis. | | |
|4. | Monday, 04.03.2013 | From graph theory to complex network analysis | {{:wma:pedreschi.wmr.2012.02.pdf|slides}} | Read Chapters 1 and 2 of Barabasi's book |
|5. | Friday, 08.03.2013 | Basic network measures: degree, distance, clustering | | |
| | Monday, 11.03.2013 | Organization of project 1: social network construction | | |
| | Thursday, 14.03.2013 | Lecture canceled | | |
|6. | Monday, 18.03.2013 | Random networks vs. real networks | {{:wma:pedreschi.wmr.2012.03.pdf|slides}} | Read Chapter 3 of Barabasi's book |
|7. | Thursday, 21.03.2013 | Small world, Strength of weak ties | {{:wma:pedreschi.wmr.2012.04.pdf|slides}} | **Reading**: Chapter 3 of Kleinberg's book, {{:wma:travers69smallworld.pdf|Milgram's small world experiment}}, {{:wma:watts-smallworld2003.pdf|Watts' email experiment}}, {{:wma:leskovec-im.pdf|Leskovec's IM experiment}}, {{:wma:granstrengthweakties.pdf|Granovetter's Strength of Weak Ties theory}}, {{:wma:pnas-2007-onnela-7332-6.pdf|Onnela et al.'s Strength of Weak Ties experiment}} |
|8. | Monday, 25.03.2013 | Centrality measures | {{:wma:pedreschi.wmr.2012.05.centrality.pdf|slides}} | |
|9. | Thursday, 28.03.2013 | Scale free networks | | Read Chapter 4 of Barabasi's book. [[http://barabasilab.neu.edu/courses/phys5116/|Barabasi Class 4]] |
| | Friday 29.03.2013 - Friday 05.04.2013 | Easter break + mid-term exams break | | |
|10. | Monday, 08.04.2013 | Network analytics with Cytoscape | | [[http://www.cytoscape.org/|Cytoscape website]] |
|11. | Thursday, 11.04.2013 | Generative models: Small World model and Barabasi-Albert model (Preferential attachment) | [[http://barabasilab.neu.edu/courses/phys5116/content/Class5_NetSci_2012/05_CLASS_2012_The_Small_World.pdf|slides Barabasi]] [[http://barabasilab.neu.edu/courses/phys5116/content/Class7_NetSci_2012/07_CLASS_2012_BAmodel.pdf|slides Barabasi]] | Read the original papers on the {{:wma:wsmodel.pdf|Watts-Strogatz model}} and the {{:wma:bamodel.pdf|Barabasi-Albert model}} |
|12. | | Second assignment | | Network analysis with Cytoscape [[http://www.giuliorossetti.net/about/ongoing-works/datasets/|Datasets]] |
|13. | | Dunbar's number | {{:wma:lezionedunbar16032012.pdf|slides}} | Guest lecturer: Luca Pappalardo (PhD program in Computer Science, Università di Pisa) |
|14. | | | | Read Chapters 5 and 6 of Barabasi's book and Chapters 18 and 20 of Kleinberg's book. |
|15. | | Diffusion, spreading, contagion, epidemics | {{:wma:07-cascading_leskovec_.pdf|slides Leskovec}}, {{:wma:08-cascades_leskovec_.pdf|slides Leskovec}} | Read Chapters 16 and 17 of Kleinberg's book. |
|16. | | Student presentations (exercise 2, network analysis) | | |
|17. | | Link prediction | {{:wma:lezionerossettilinkprediction09.05.2012.pdf|slides}} | Guest lecturer: Giulio Rossetti (PhD program in Computer Science, Università di Pisa) |
|18. | | Diffusion, spreading, contagion, epidemics (2) | {{:wma:09-contagion.pdf|slides Leskovec}} [[http://barabasilab.neu.edu/courses/phys5116/content/Class17_NetSci_2012/17_CLASS_2012_Spreading.ppt|slides Barabasi]] | Read Chapters 19 and 21 of Kleinberg's book. |
|19. | | **Aula 28, CNR Pisa, 14:30-18:30** | **Special lecture** as part of the workshop "Tecnologie linguistiche: un nuovo rinascimento?". | **14:30-16:30: the social network of Dante's Inferno (Pedreschi, Tavoni et al.)**, with a contribution by the artist **Elisabetta Salvatori (reading of Canto V of the Inferno)**. **16:30-18:30: {{:wma:homophily.pdf|Homophily and assortativity of networks}}, guest lecturer Letterio Galletta** (PhD program in Computer Science, Università di Pisa) |
|20. | | Community Discovery. Bow-tie structure of the Web and link analysis (PageRank and HITS, hints) | {{:wma:14-weakties.pdf|slides}} {{:wma:13-pagerank.pdf|slides}} | Read Chapters 13 and 14 of Kleinberg's book. Read {{:wma:communitydiscoverysurvey.pdf|A Classification for Community Discovery Methods in Complex Networks}}, by Michele Coscia, Fosca Giannotti and Dino Pedreschi (SNAM Journal, 2012) |
|21. | |**Guest lecture: Aula C1**| Human mobility and social ties | Guest lecturer: Paolo Cintia (PhD program in Computer Science, Università di Pisa) |
|22. | |**h 9:00 - Aula Seminari Ovest**| **Student seminars - final project** | **Joint analysis of the social network of this class built by combining each student's ego network on Facebook.** |

====== Links to previous editions ======

  * 2011-2012 edition [[WMA20112012]]
  * 2010-2011 edition [[WMA20102011]]
  * 2008-2009 edition [[WMA20082009]]