magistraleinformatica:lad:lad14:start

Differences

These are the differences between the selected revision and the current version of the page.

magistraleinformatica:lad:lad14:start [11/11/2014 at 19:14 (10 years ago)] – [Lectures] Paolo Ferragina
magistraleinformatica:lad:lad14:start [19/06/2015 at 10:39 (9 years ago)] (current version) – [Exam] Paolo Ferragina
Line 17: Line 17:
 ====== Goals and opportunities for students ======  ====== Goals and opportunities for students ====== 
  
-The course consists of a first part of lectures describing advanced algorithms and data structures (3 CFU), and a laboratory in the second part (3 CFU) in which the students will deploy these techniques to develop a software project. The students will select their projects from a set of proposals by **major IT companies** that are challenging from an algorithmic perspective. These companies will also help identify/construct significant **datasets** for testing the proposed algorithmic solutions. +The course consists of a first part of lectures describing advanced algorithms and data structures (3 CFU), and a laboratory in the second part (3 CFU) in which the students will deploy these techniques to develop a software project. The students will select their projects from a set of proposals by some **IT companies** that are challenging from an algorithmic perspective. These companies will also help identify/construct significant **datasets** for testing the proposed algorithmic solutions. 
  
 The course will provide the opportunity of The course will provide the opportunity of
Line 30: Line 30:
  
 ^ Date         ^ Room ^ Text ^ ^ Date         ^ Room ^ Text ^
-| 22/01/2015, 09:00 |  L1  | text | +| 22/01/2015, 09:00 |  L1  | {{:magistraleinformatica:lad:lad14:lab150122.doc|text}} | 
-| 13/02/2015, 09:00 |  L1  | text |+| 13/02/2015, 09:00 |  L1  | {{:magistraleinformatica:lad:lad14:lab150213.doc|text}} | 
 +| 05/06/2015, 09:00 |  L1  | {{:magistraleinformatica:lad:lad14:lab150605.doc|text}} | 
 +| 29-06-2015, 09:00 |  L1  | text | 
 +| 20-07-2015, 09:00 |  L1  | text | 
 +| 10-09-2015, 09:00 |  L1  | text |
  
  
-The exam consists of three parts: Project 70%, Written/oral test 20%, Project presentation 10%.+The exam consists of three parts: Project 70%, Written/oral test 20%, Project presentation 10%. Students may take the written/oral test before developing and presenting the project.
  
 For each of the two projects, we list the datasets below. These are //warm-up datasets// of moderate size, yet sufficient to start designing your solutions and reasoning about time/space efficiency. We will provide larger datasets for testing the final code you produce. For each of the two projects, we list the datasets below. These are //warm-up datasets// of moderate size, yet sufficient to start designing your solutions and reasoning about time/space efficiency. We will provide larger datasets for testing the final code you produce.
Line 46: Line 50:
 A few related [[https://www.dropbox.com/s/ci49poek5armjf3/papersP1.zip?dl=0|papers]]. A few related [[https://www.dropbox.com/s/ci49poek5armjf3/papersP1.zip?dl=0|papers]].
  
-**Project 2.** The [[https://mega.co.nz/#!XFslwRjA!G_JptVb3EeFVAKvPUTxjZ9ISMKoP7IuFhr6hdt5iyYg|dataset]] consists of 2.8 GB of 7-zipped web pages drawn from the UK-GOV2 collection, in [[http://commoncrawl.org/navigating-the-warc-file-format/|WARC format]]. Some techniques that you can use for your project are described in these [[https://dl.dropboxusercontent.com/u/7999075/progettiLab2014.zip|papers]]. Software libraries are listed [[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformatica/lad/lad14/algorithmengineeringlab_-_projects.docx|here]], plus a few additional tools contributed by students, listed below:+**Project 2.** The [[https://mega.co.nz/#!XFslwRjA!G_JptVb3EeFVAKvPUTxjZ9ISMKoP7IuFhr6hdt5iyYg|dataset]] consists of 2.8 GB of 7-zipped web pages drawn from the UK-GOV2 collection, in [[http://commoncrawl.org/navigating-the-warc-file-format/|WARC format]]. Some techniques that you can use for your project are described in these [[https://dl.dropboxusercontent.com/u/7999075/Progetto%202%20-%20papers.zip|papers]]. Software libraries are listed [[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformatica/lad/lad14/algorithmengineeringlab_-_projects.docx|here]], plus a few additional tools contributed by students, listed below:
   * {{:magistraleinformatica:lad:lad14:libwarc-75524e4.tgz|WARC parser}} (by Andrea Cardaci).   * {{:magistraleinformatica:lad:lad14:libwarc-75524e4.tgz|WARC parser}} (by Andrea Cardaci).
 +
 +
 ====== Background======  ====== Background====== 
  
Line 70: Line 76:
 | 16/10/2014 | Near-duplicate document detection: problem definition and comments, Karp-Rabin fingerprint, shingling, Jaccard similarity of sets, document sketches, locality-sensitive hashing, the detection process. | {{:magistraleinformatica:lad:lad14:07_a._minhash-shingle.ppt|Slides}} and {{:magistraleinformatica:lad:lad14:lsh.pdf|part of a chapter}}. |  | 16/10/2014 | Near-duplicate document detection: problem definition and comments, Karp-Rabin fingerprint, shingling, Jaccard similarity of sets, document sketches, locality-sensitive hashing, the detection process. | {{:magistraleinformatica:lad:lad14:07_a._minhash-shingle.ppt|Slides}} and {{:magistraleinformatica:lad:lad14:lsh.pdf|part of a chapter}}. | 
 | 21/10/2014 | An introduction to data compression: Gamma/Delta codes, Huffman code, Lempel-Ziv 1977, and Burrows-Wheeler transform. | {{https://www.dropbox.com/s/q5i70eybz0217ii/datacompression.pptx?dl=0|Slides}}. Sect. 9.1 of these {{http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2013/chap_09.pdf|notes}}, Sect. 10.1 of these {{http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2013/chap_10.pdf|notes}}, and Sect. 11.1 of these {{http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2013/chap_11.pdf|notes}}. | | 21/10/2014 | An introduction to data compression: Gamma/Delta codes, Huffman code, Lempel-Ziv 1977, and Burrows-Wheeler transform. | {{https://www.dropbox.com/s/q5i70eybz0217ii/datacompression.pptx?dl=0|Slides}}. Sect. 9.1 of these {{http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2013/chap_09.pdf|notes}}, Sect. 10.1 of these {{http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2013/chap_10.pdf|notes}}, and Sect. 11.1 of these {{http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2013/chap_11.pdf|notes}}. |
-| 23/10/2014| Discussion on the projects. | | +| 23/10/2014 | Discussion on the projects. | | 
-| 28/10/2014| Clustering: soft/hard clustering, bottom-up and top-down approaches, various metrics, K-means. Discussion on Project 2| Slides on {{:magistraleinformatica:lad:lad14:07_b.clustering.ppt|clustering}}. Slides on the {{:magistraleinformatica:lad:lad14:il_progetto_n._2.pptx|second project}}, and {{:magistraleinformatica:lad:lad14:progetto_2_-_papers.zip|papers to read}}.| +| 28/10/2014 | Clustering: soft/hard clustering, bottom-up and top-down approaches, various metrics, K-means. Discussion on Project 2| Slides on {{:magistraleinformatica:lad:lad14:07_b.clustering.ppt|clustering}}. Slides on the {{:magistraleinformatica:lad:lad14:il_progetto_n._2.pptx|second project}}, and {{:magistraleinformatica:lad:lad14:progetto_2_-_papers.zip|papers to read}}.| 
-| 30/10/2014| Discussion on the projects. | | +| 30/10/2014 | Discussion on the projects. | | 
-| 11/11/2014| Discussion on the projects: To generate k random positions, p(i), one could use the computation p(i) = h1(x) + i h2(x), where x is a random number and h1/h2 are [[https://sites.google.com/site/murmurhash/|MurmurHash]] functions. | |+| 11/11/2014 | Discussion on the projects: To generate k random positions, p(i), one could use the computation p(i) = h1(x) + i h2(x), where x is a random number and h1/h2 are [[https://sites.google.com/site/murmurhash/|MurmurHash]] functions. | | 
 +| 13/11/2014 | Discussion on the projects. | | 
 +| 18/11/2014| Discussion on the projects: Venturini's office. |  | 
 +| 20/11/2014| Discussion on the projects: Venturini's office. |  | 
 +| 25/11/2014| Discussion on the projects: Venturini's office. |  | 
 +| 27/11/2014| Discussion on the projects: Ferragina's office. |  | 
 +| 02/12/2014| Discussion on the projects: Ferragina's office. |  | 
 +| 04/12/2014| Discussion on the projects: Ferragina's office. |  | 
 +| 09/12/2014| Discussion on the projects: Ferragina's office. |  | 
 +| 11/12/2014| Discussion on the projects: Venturini's office. |  | 
 +| 16/12/2014| Discussion on the projects: Ferragina's office. |  | 
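
The note in the 11/11/2014 entry above, p(i) = h1(x) + i h2(x), can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `_hash64` and `k_positions` are hypothetical, the positions are reduced modulo a table size `m` (the entry does not say how they are bounded), and a salted BLAKE2b digest from the Python standard library stands in for the two MurmurHash functions so that the example needs no third-party package.

```python
import hashlib
from typing import List


def _hash64(data: bytes, salt: bytes) -> int:
    """Return a 64-bit hash of `data`.

    Assumption: salted BLAKE2b is used here as a stand-in for a seeded
    MurmurHash; two different salts play the role of h1 and h2.
    """
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def k_positions(x: bytes, k: int, m: int) -> List[int]:
    """Derive k positions in [0, m) from two hashes of the same key x.

    Implements p(i) = h1(x) + i * h2(x), reduced modulo m (the modulo
    is an assumption added so that positions index a table of size m).
    """
    h1 = _hash64(x, b"h1")
    h2 = _hash64(x, b"h2")
    return [(h1 + i * h2) % m for i in range(k)]


if __name__ == "__main__":
    # Hypothetical usage: 5 positions in a table of 1000 slots.
    print(k_positions(b"example", 5, 1000))
```

Only two hash evaluations are needed regardless of k, which is the appeal of this scheme when k positions must be generated per item. If h2(x) happens to be a multiple of m, all positions coincide; a real implementation may want to guard against that case.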
magistraleinformatica/lad/lad14/start.1415733248.txt.gz · Last modified: 11/11/2014 at 19:14 (10 years ago) by Paolo Ferragina
