Strumenti Utente

Strumenti Sito


magistraleinformaticanetworking:ae:ae2021:start

Algorithm Engineering A.A. 2021-2022

Thanks to P. Sanders Teacher: Paolo Ferragina

CFU: 9 (first semester).

Course ID: 531AA.

Language: English.

Question time: Monday: 15-17, or by appointment (given COVID-19 situation, it will occur in video-conferencing within the virtual room of the course)

News: about this course will be distributed via a Telegram channel.

Official lectures schedule: The schedule and content of the lectures is available below and with the official registro.

Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

Week Schedule of the Lectures

Week Schedule
Day Time Slot Room
Monday 9:00 - 11:00 Room Fib A1, and Virtual Room of the course
Tuesday 11:00 - 13:00 Room Fib L1, and Virtual Room of the course
Wednesday 9:00 - 11:00 Room Fib A1, and Virtual Room of the course

Exam

The exam will consist of a written test plus an oral discussion.

As last year, I'm planning to have one midterm exam in November and one in December. The two midterms (if >= 18 as vote) will be combined with an oral exam to be given in January, which can increase or decrease the final vote (by few points, given that they will somewhat “average” with the vote reported in the written exam).

Dates Room Text Notes
08/11/2021, start at 09:00 room A1 and virtual text, results, solution The midterm exam will consist of a set of exercises, and will last for 45mins. The part of the program for the exercises will be detailed in the list of lectures below.
15/12/2021, start at 09:00 room C and virtual text, results, solution The FinalTerm exam will have the same structure as the other one, but can participate only the students who got >=18 rank in the first MidTerm. Students have to register at the following link by December 7th, 2021.
14/01/2022, start at 14:00 room C1 text, results, solution
02/02/2022, start at 09:00 room C1 text, results, solution

Background and Notes of the Course

I strongly suggest refreshing your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10, and 15-17.

Most of the content of the course will be covered by some notes I wrote in these years; for some topics, parts of papers/books will be used. You can download the latest version of these notes from this link. I state that this material will be published by Cambridge University Press as Pearls of Algorithm Engineering by me. This prepublication version is free to view and download for personal use only. Not for redistribution, resale or use in derivative works. © Paolo Ferragina 2020.

Lectures

Students that are not able to attend the lectures, can refer to the video-lectures of the last academic year (click on “Videos”, and then sort them by name). The video-lectures of this year are available in the agenda below.

The lectures below include also some EXTRA material, which is suggested to be read for students aiming to high rankings.

Date Lecture Biblio
13/09/2021 Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers, and binary search. The B-tree (or B+-tree) data structure: searching and updating a big-set of sorted keys. Exercise: evaluating the I/O-cost of binary search. Chap. 1 of the notes.
Read note 1 and note 2 on B-trees.
video part 1 and part 2
14/09/2021 The role of the Virtual Memory system. Algorithm for Permuting. Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds. Binary merge-sort, and its I/O-bounds. Chapter 2 (no sect 2.5-)
video part 1 and part 2
15/09/2021 Snow Plow, with complexity proof and an example. Multi-way mergesort. The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique.
Finding the maximum-sum subsequence (study from my notes!).
Chap. 5 of the notes.
video part 1 and part 2
Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT
20/09/2021 Lower bound for sorting: comparisons and I/Os. Quicksort: recap on best-case, worst-case. Quicksort: Average-case with analysis. Selection of kth ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. RandSelect.
EXTRA: Lower bound for permuting.
Chap. 5 of the notes
video part 1 and part 2
21/09/2021 The dual pivot. Bounded Quicksort. Multi-way Quicksort: definition, design and I/O-complexity. Selection of k-1 “good pivot” via Oversampling. Proof of the average time complexity of Multi-way Quicksort. Chap. 5 of the notes
video part 1 and part 2
22/09/2021 Random sampling: disk model, known length (algorithms and proofs). Random sampling on the streaming model, known and unknown length. Reservoir sampling. Algorithm and proofs. Chap. 3 of the notes.
27/09/2021 Exercises on Random Sampling and sorting.
Randomized data structures: Treaps (with time complexity and proofs).
Notes by others. Study also Theorems and Lemmas.
video part 1 and part 2
28/09/2021 Randomized data structures: Skip lists (with proofs and comments on time-space complexity and I/Os). See Demaine's lecture num. 12 on skip lists.
video part 1 and part 2
29/09/2021 Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition, binary search with exponential jumps. Fast set intersection: two-level scan. Interpolation Search.
EXTRA: random shuffling.
Chap. 6 of the notes, and Sect 9.2.
video part 1 and part 2
06/10/2021 Prefix-free codes, the notion of entropy. Integer coding: the problem and some considerations. Chap. 11 of the notes.
From last year: Video part 1part 2
11/10/2021 Shannon Theorem and optimal codes. The codes Gamma and Delta, space/time performance, and consideration on optimal distributions. The codes Rice, PForDelta, variable-byte. Video part 1 and part 2
12/10/2021 (s,c)-codes, with examples. Video part 1 and part 2
13/10/2021 Interpolative code. Elias-Fano code. Video part 1 and part 2
18/10/2021 More on Elias-Fano code. Exercises Video part 1 and part 2
19/10/2021 Exercises on integer coders. Python code for Interpolative coding. Video and notes of today.
20/10/2021 Exercises on integer encoders.
Lab activity (optional): A gentle introduction to the Succinct Data Structure Library: integer sequences. (Please, prepare in advance the environment for experimenting with the SDSL library, as indicated at the repo.)
video and slides SDSL
25/10/2021 String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness. MSD-radix sort and the trie data structure. Chap. 7 of the notes.
Video
26/10/2021 Multi-key Quicksort. Ternary search tree. Exercises. Video
27/10/2021 Exercises. Video
02/11/2021 Exercises. video
03/11/2021 Exercises.
08/11/2021 MidTerm Exam (see above for details, and topics are the ones up to this point in the Agenda).
09/11/2021 Rank and Select: definition and succinct solution for Rank. Compressed solution based on Elias-Fano coding.
EXTRA: Solution for Select.
Chapter 15 of the notes.
Video
10/11/2021 Succinctly encoding binary trees, with examples. Sect. 9.2.
Video
15/11/2021 Data Compression. Static, semistatic and dynamic models for frequency estimation. Huffman, with optimality (proof) and relation to Entropy Chap. 12 of the notes. No PPM.
video
16/11/2021 Canonical Huffman: construction, properties. Huffman decompression. Arithmetic coding: properties, algorithm and proofs. Video
17/11/2021 Exercises on Statistical coders notes, Video
22/11/2021 More on Arithmetic coding: decoding, encoding the output, and entropy bounds. Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS, LZ78. Chap 13.1 and 13.2. No LZW. Slides and video.
23/11/2021 Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Suffix Array construction via qsort and its asymptotic analysis. Text mining use of suffix arrays. Compressor bzip: MTF, RLE, Burrows-Wheeler Transform. Chap. 10 of the notes: 10.1, 10.2.1 (but no page 10-4 and 10-5 and thus no Lemma 10.2), 10.2.3 (but no “The skew algorithm”, no “The Scan-based algorithm”), 10.4.3 and 14.1-14.3.
Video and slides
23/11/2021 Exercises Video and notes
24/11/2021 How to construct the BWT via qsort. How to invert the BWT. Exercises. Video part 1 and part 2. Notes on exercises.
00/00/2021 Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing based on compacted tries or Patricia Trees. Analysis of space, I/Os and time of all described solutions. Chap. 9 of the notes: 9.1, 9.4, 9.5. No Locality Preserving front coding (9.3).
15/12/2021 FinalTerm exam. Topics will be the ones that we have dealt with after the MidTerm exam. Students have to register at the following link by December 7th, 2021.
Last year program, just for reference
Date Lecture Biblio
Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. Lectures 7 of Demaine-Leiserson's course at MIT
00/00/2021 – 05.10.2020 Hashing and dictionary problem: direct addressing, simple hash functions, hashing with chaining. Uniform hashing and its computing/storage cost, universal hashing (definition and properties). Chap. 8 of the notes. All theorems with proof, except Theo 8.5 without a proof (only the statement).
00/00/2021 – 06.10.2020 Two examples of Universal Hash functions: one with correctness proof, the other without. Perfect hash table (with proof).
00/00/2021 – 07.10.2020 Minimal ordered perfect hashing: definition, properties, construction, space and time complexity. Exercise.
00/00/2021 – 12.10.2020 Cuckoo hashing (with all proofs).
00/00/2021 Bloom Filter: properties, construction, query and insertion operations, error estimation (with proofs). Spectral Bloom Filter.
EXTRA: Lower bound on BF space
No 8.7.2 (compressed BF)
00/00/2021 Lecture (extra) on Genome compression.
magistraleinformaticanetworking/ae/ae2021/start.txt · Ultima modifica: 24/11/2021 alle 10:38 (4 giorni fa) da Paolo Ferragina