====== Year 2015-2016====== Prof. Roberto Grossi\\ Tutor: Dr. Filippo Geraci ==== Announcements ==== * Results and scheduling of the oral examinations in the following days: Tue, Jan 12, 2016, at 9:00 in my office. * New exercises (Dec. 15) added to the {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|partial list}} of problems. * Next terms: Jan. 20 at 9:00 in room L1; Feb. 10 at 9:00 in room L1. * Office hours: Tue 14-16 (Dipartimento di Informatica) ==== Overview ==== We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students. The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios. The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One "brainstorming" lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the "brainstorming" lectures. We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that "... if a machine is expected to be infallible, it cannot also be intelligent". This is a motto for this class, with a kind invitation to express your own ideas even if they could be apparently wrong. ==== Topics ==== Please note that several topics are the outcome of recent advancement in algorithms and data structures, and thus most of the course material consists in research papers or book chapters. ^ Date ^ Topics ^ References and notes ^ | Sept. 22| Introduction to the course. Entrance exam. | - | | Sept. 24| Discussion of the solutions for the exercises of the entrance exam. | see snapshots | | Sept. 28| Problem solving. Statistics on arrays (top-k elements). | see snapshots | | Sept. 29| Linear-time worst-case selection. | {{:magistraleinformatica:alg2:algo2_15:linsel.pdf|par.9.3}} | === Randomization, hashing and data streaming === Randomization is now a common powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we apply them to solve data streaming problems, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions on counting and probability, please refer to Appendix C in Cormen-Leiserson-Rivest-Stein's book "Introduction to Algorithms", 3rd ed., MIT Press.] ^ Date ^ Topics ^ References and notes ^ | Oct. 1| Las Vegas and Montecarlo algorithms. Indicator variables and randomised quicksort (Las Vegas). | {{:magistraleinformatica:alg2:algo2_13:randqs.pdf|par. 7.3}} | | Oct. 5| Problem solving. Randomized selection (Las Vegas). | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Oct. 6| Random indicator variables: secretary problem (and random permuting), birthday paradox. | {{:magistraleinformatica:alg2:algo2_15:prob.pdf|par. 5.1-5.3, 5.4.1}} | | Oct. 8| Karp-Rabin fingerprints: randomized checking and pattern matching (Montecarlo and Las Vegas) | {{:magistraleinformatica:alg2:algo2_15:karp-rabin-1.pdf| par.7.4-7.6}} | | Oct. 12| Randomized min-cut in unweighted graphs (Montecarlo) | {{:magistraleinformatica:alg2:algo2_15:mincut1.pdf| par.1.1}} | | Oct. 13| Randomness and Kolmogorov complexity | [[http://www.eecs.berkeley.edu/~luca/cs172/notek.pdf|Notes]] | | Oct. 15| Lecture by prof. Ferragina: issue of hierarchical memories: I/O-model. Index construction: multi-way mergesort, BSBI and SPIMI. Sketch on MapReduce. Distributed indexing: Term-based vs Doc-based partitioning. | {{:magistraleinformatica:ir:ir15:lect_06a.ppt|Slides}} [[http://nlp.stanford.edu/IR-book/pdf/04const.pdf|Chapt.4]] | | Oct. 19| Problem solving. Randomized min-cut in graphs. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Oct. 20 | General assembly | | | Oct. 22| Data Streaming Model. Motivations and examples. Count-Min Sketches | {{http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf| sects.1-3}} [[https://sites.google.com/site/countminsketch/|Site]] {{:magistraleinformatica:alg2:algo2_12:count-min-sketch.pdf|Notes}}| | Oct. 26| Problem solving. Data streaming. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Oct. 27 | Chernoff bounds, and applications of count-min sketches.| {{:magistraleinformatica:alg2:algo2_12:chernoff-bounds.pdf|par.4.1,Th.4.1 (proof is optional)}} {{http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf| sect.4}} {{:magistraleinformatica:alg2:algo2_12:count-min-sketch-median.pdf|Notes}}| | Oct. 29| Using a family of random hash functions: Cuckoo hashing.| {{:magistraleinformatica:alg2:algo2_10:cuckoo-undergrad.pdf|Notes}}| | Nov. 9| Problem solving. Hash functions and data streaming. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | === Algorithms for massive data sets and hierarchical memories === Massive data sets are one of the key elements in our so-called big data society. In this scenario the design and analysis of algorithms and data structures adopt the external and cache-oblivious models, which take into account the memory hierarchy that provides performance to modern computers. We study basic problems on sorting and searching in these models, comparing the results with the classical RAM model of computation. ^ Date ^ Topics ^ References and notes ^ | Nov. 10| Review of the external memory model (EMM): sequential access versus random access, I/O complexity. Static and dynamic searching in external memory. (a,b)-trees and B-trees: definition, search and insert operations. | [[http://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf|Introduction, chapt.2 and 3 (set D=1 disks)]]{{:magistraleinformatica:alg2:algo2_14:ionotes.pdf|sect.1 and 2}}| | Nov. 12| Dynamic searching in external memory: (a,b)-trees and B-trees. Deletion, construction, and I/O analysis. Lower bound for the I/O complexity of sorting. | {{:magistraleinformatica:alg2:algo2_14:ionotes.pdf|sect.2}} [[http://www.daimi.au.dk/%7Elarge/ioS06/Alower.pdf|Notes]] | | Nov. 16| Problem solving. Hash functions and data streaming. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Nov. 17| Cache-oblivious (CO) model. Difference from the EM model and examples. | [[http://erikdemaine.org/papers/BRICS2002/paper.pdf|Sect.1-3]] | | Nov. 17| Problem solving. Algorithms for external memory (continued). | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Nov. 19| Cache-oblivious searching: van Emde Boas layout and CO B-trees. | [[http://erikdemaine.org/papers/BRICS2002/paper.pdf|[Sect.4.1]]] | | Nov. 23| Problem solving. CO algorithms. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Nov. 24| Cache-oblivious layout of binary trees. | [[http://didawiki.di.unipi.it/lib/exe/fetch.php/magistraleinformatica/alg2/algo2_12/tree_layout_4.pdf|Notes]] [[http://www3.cs.stonybrook.edu/~bender/pub/treelayout-full.ps|Sect.5 (see 5.4)]] | | Nov. 26| Problem solving. CO algorithms (continued). | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Nov. 30| Map-Reduce paradigm. Suffix array construction (DC3). | [[http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf|paper]] [[http://wiki.apache.org/hadoop/WordCount|example]] [[http://www.cs.helsinki.fi/u/tpkarkka/publications/jacm05-revised.pdf|DC3 algorithm (up to Section3)]] | === Enumeration, hardness and approximation of some combinatorial problems === NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data. ^ Date ^ Topics ^ References and notes ^ | Dec. 1| Quick review of NP-completeness and NP-hardness. Euler tour vs Hamiltonian tour vs TravelSalesman Problem (TSP). | [CLRS, chap.34 (up to 34.4 plus 34.5.4, plus 35.2.2] | | Dec. 1| Problem solving. MapReduce. Euler tour. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Dec. 3| 2-approximation for metric TSP and for Min Vertex Cover. | [CLRS, 35.1, 35.2] | | Dec. 7| Approximations for the bin packing problem. | [[http://www.dis.uniroma1.it/~ausiello/InfoTeoIIRM/book/chapter02.pdf| chapt.2: par. 2.2.2]] | | Dec. 10| Problem solving. Hard problems. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | | Dec. 14| Approximation for the MAX-CUT and knapsack problems. | {{:magistraleinformatica:alg2:algo2_14:lec02.pdf|Notes}} [[http://www.dis.uniroma1.it/~ausiello/InfoTeoIIRM/book/chapter02.pdf| chapt.2: par. 2.1.1]] | | Dec. 15| Problem solving. Approximation algorithms. | {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|list of problems}} | == Activity in class == * This is the {{:magistraleinformatica:alg2:algo2_15:esercitazioni2015.pdf|final list}} of problems. * Some of the [[https://www.dropbox.com/sh/q3i6ko3vmg307u7/AAAKadLSHnPq_SlFrp4IVXXia?dl=0|screen snapshots]] shown during the classes. == Official documents for the course == * Access to [[http://unimap.unipi.it/registri/printregistriNEW.php?re= 169265::::&ri=9172|unimap log (registro delle lezioni)]]. * Access to the [[https://esami.unipi.it/esami/|course evaluation form (questionario studenti)]]. == Spot yourself in the classroom == {{:magistraleinformatica:alg2:algo2_15:2015-09-28_16.28.08.jpg?300|}} == Examination outcomes (in Italian) == Examination of Dec. 16, 2015 ^matricola^score^e1^e2^e3^notes^ |300759|-|-|-|-|esercitazione| |441517|29|10|9|10|e2: non è molto chiara la costruzione.| |452058|20|4|8|8|e1: only the first point done; e2: search and analysis missing; e3: missing generalization | |465982|21|7|7|7|e1:non chiarisce le strutture e i dati da utilizzare;imprecisione nel secondo punto; e2:la struttura non è implicita e mancano dettagli; e3: mancano i dettagli| |468827|18|9|0|9|e1:discute solo il caso di pesi interi; 2: non svolto; e3: manca la descrizione di come si costruisce il grafo| |479526|22|4|9|9|e1: description and analysis unclear; e2: the tree is not implicit; e3: missing how edges are set up| |483633|29|9|10|10|e1: poco chiaro come pesca gli archi e la gestione di p[]; e2: buona l'idea ma l'analisi non è descritta bene| |484837|29|10|9|10|e1: non chiaro cosa succede nelle liste di adiacenza quando due nodi sono uniti; e2: un po' tirato via.| |490068|29|9|10|10|e1:scelta dell'arco non uniforme (il nodo va sceltoin base al suo peso)| |494087|25|10|6|9|e1:piccola svista;e3:le condizioni non caratterizzano completamente il grafo| |494577|25|8|9|8|e1:la scelta dell'arco non è uniforme e l'analisi ha un passaggio poco chiaro; e2:manca la regola come scendere da padre in figlio e piccola svista nell'analisi; e3: descrizione poco chiara| |498122|27|9|8|10|e1: non dice come aggiorna le altre liste; e2: l'albero ottenuto non è implicito;| |527349|24|6|9|9|e1:manca il primo punto; e3: descrizione incomlpeta del grafo; e2:manca la regola come scendere da padre in figlio| |528025|22|4|9|9|e1: non sono chiare tutte le regole utilizzate; mancano gli altri due punti| |533408|29|10|9|10|e2:non specifica la regola per mavigare;| |533772|25|9|6|9|e1:edges are not chosen uniformly at random; e2:missing the rules to navigate; | |534789|19|9|0|10|e1: how the other lists are updated? Missing probabillity of success for wieghed graphs; e2: not done; | |537580|12|2|0|10|e1: only one point done; e2: not done;| |538315|28|10|8|10|e2: manca regola per navigare e alcuni dettagli/analisi costruzione| |539276|25|9|6|9|e1: analysis is not clear in the seocnd point; e2: missing details on the costruction and rules to navigate; | Examination of Jan. 20, 2016 * 463883 27 * 468827 26 * 541769 18 Examination of Feb. 10, 2016 * 300759 21 * 459410 30 * 489617 29 * 493413 29 * 540574 11 * 541451 12 * 541769 27 * 541784 24