Some useful infos on the course:
The course consists of a first part of lectures describing advanced algorithms and data structures (3 CFU), and a laboratory in the second part (3 CFU) in which the students will deploy these techniques to develop a software project. The students will select their projects among a set of proposals by some IT companies which are challenging from an algorithmic perspective. These companies will also contribute to identify/construct significant datasets that will help in testing the proposed algorithmic solutions.
The course will provide the opportunity of
Date | Room | Text |
---|---|---|
22/01/2015, 09:00 | L1 | text |
13/02/2015, 09:00 | L1 | text |
05/06/2015, 09:00 | L1 | text |
29-06-2015, 09:00 | L1 | text |
20-07-2015, 09:00 | L1 | text |
10-09-2015, 09:00 | L1 | text |
The exam consists of three parts: Project 70%, Written/oral test 20%, Project presentation 10%. Students can attend the written/oral test before the presentation/development of the project.
As far as the two projects are concerned, we list below the datasets for each of them. These are warm-up datasets of moderate size, yet sufficient to start designing your solutions and be concerned with time/space efficiency issues. We will provide you larger datasets to test the final codes you'll produce.
Project 1. The dataset consists of the following parts:
Few related papers.
Project 2. The dataset consists of 2.8Gb of 7zipped web pages drawn from the UK-GOV2 collection, and in WARC format. Some techniques that you can use for your project are described in these papers. Software libraries are indicated here, plus few additional softwares from students listed below:
If you wish to refresh your mind on basic Algorithms and Data Structures, we suggest you to look at the well-known book Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition.
Moreover, you can look at the content of the Algorithm Engineering course by Paolo Ferragina. Anyway, depending on the topics addressed in the lectures you'll be pointed out to the suitable chapters of Ferragina's notes.
We'll often do not project slides for teaching, but use mainly the old-fashioned blackboard. Most of the content of the course will be covered by notes, sometime we'll use parts of papers/books.
Date | Lecture | Biblio |
---|---|---|
24/09/2014 | Introduzione al corso: temi, progetti e svolgimento dell'esame. | progetti assegnati, slides |
25/09/2014 | (Compact) Trie, Ternary Search Trie, e Patricia Trie. | Sezioni 7.1, 7.4 e 7.5 di queste note. Questo articolo su Dr.Dobb's. |
30/09/2014 | Strutture dati succinte: rank/select, Elias-Fano, alberi (LOUDS e BP). | Lucidi sufficienti per lo studio. |
02/10/2014 | Strutture dati per l'indicizzazione compressa di testi: Suffix Array, Trasformata di Burrows-Wheeler, FM-index (count, locate, extract). | Lucidi sufficienti per lo studio. |
07/10/2014 | Compressed Permuterm Index e Dictionary search with prefix, suffix, substring, prefixSuffix match. Hashing: chaining (with time/space complexity), Universal hashing, Minimal ordered perfect hash. | Lucidi1 e Lucidi2 sufficienti per lo studio. |
09/10/2014 | Alberi succinti DFUDS. Descrizione Progetto 1 e possibili soluzioni. | Lucidi |
14/10/2014 | Permuting e Sorting: I/O-bottleneck, binary mergesort, multi-way mergesort. | Lucidi e capitolo 5 note di P. Ferragina (no 5.1.2, e no 5.2 in poi). |
16/10/2014 | Near-duplicate document detection: problem definition and comments, Karp-Rabin-fingerprint, Shingling, Jaccard similarity of sets, document sketches, locality sensitive hashing, the detection process. | Slides e parte di capitolo. |
21/10/2014 | An introduction to data compression: Gamma/Delta codes, Huffman code, Lempel-Ziv 1977, and Burrows-Wheeler transform. | Slides. Sect. 9.1 of these notes, Sect. 10.1 of these notes, and Sect. 11.1 of these notes. |
23/10/2014 | Discussion on the projects. | |
28/10/2014 | Clustering: soft/hard clustering, bottom-up and top-down approaches, various metrics, K-means. Discussion on Project 2 | Slides on clustering. Slides on the second project, and papers to read. |
30/10/2014 | Discussion on the projects. | |
11/11/2014 | Discussion on the projects: To generate k random positions, p(i), one could use the computation p(i) = h1(x) + i h2(x), where x is a random number and h1/h2 are MurmurHash functions. | |
13/11/2014 | Discussion on the projects. | |
18/11/2014 | Discussion on the projects: Venturini's office. | |
20/11/2014 | Discussion on the projects: Venturini's office. | |
25/11/2014 | Discussion on the projects: Venturini's office. | |
27/11/2014 | Discussion on the projects: Ferragina's office. | |
02/12/2014 | Discussion on the projects: Ferragina's office. | |
04/12/2014 | Discussion on the projects: Ferragina's office. | |
09/12/2014 | Discussion on the projects: Ferragina's office. | |
11/12/2014 | Discussion on the projects: Venturini's office. | |
16/12/2014 | Discussion on the projects: Ferragina's office. |