# Machine Learning: Neural Networks and Advanced Models (AA2)

**Apprendimento Automatico: Reti Neurali e Modelli Avanzati**

Instructor: **Davide Bacciu**

Contact: email - phone 050 2212749

Office: Room 367, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa

Office Hours: Tuesday, 17-19

## News

**(04/04/2016) Note for Students of Academic Year 2015/2016** The AA2 course is **inactive** during the 2015/2016 academic year. Students interested in the course can take the replacement course "Computational Neuroscience" from the M.Sc. in Bionics Engineering.

(02/04/2015) List of midterm assignments to students is now out

(13/03/2015) Midterm reading list and dates now out

(25/02/2015) Updated course schedule - A new course schedule is now in place. Beware that the lecture of Monday 02/03/2015 will exceptionally be held in room C1 at 16-18

(29/01/2015) Course information - Added course description, topics and reference materials.

(20/01/2015) Course Didawiki first online - Preliminary information on course schedule. More information to come by early February.

## Course Information

**Weekly Schedule**

The course is held in the second term.
The preliminary schedule for **A.A. 2014/15** is provided in the table below.

Note that the lecture of **Monday 02/03/2015** will exceptionally be held in room C1 **at 16-18**.

Day | Time | Room |
---|---|---|
Monday | 11-13 | C1 |
Thursday | 14-16 | C1 |

**Objectives**

Machine learning has recently become a central area of computer science, playing a major role in the development of a wide range of advanced applications. Machine learning solutions are used to address a variety of problems in computer science (e.g. search engines, machine vision), engineering (e.g. robotics, signal processing) and other research and application areas (biology, chemistry, medicine), leading to novel multidisciplinary areas such as bioinformatics, cheminformatics and biomedical signal processing. Providing solutions to challenging applications requires the ability to design machine learning models capable of dealing with complex domains, which include noisy, hard-to-interpret, semantically rich information, such as natural language documents, images and videos, as well as non-vectorial relational information, such as sequences, trees and graphs in general.

The goal of this course is to provide the knowledge needed to specialize in the design of novel machine learning models for such advanced applications and complex data domains. Students are expected to gain knowledge of state-of-the-art machine learning models such as recurrent neural networks, reservoir computing, deep learning, kernel methods and probabilistic generative models. The course focuses on the treatment of complex application domains (images, biomedical data, etc.) and non-vectorial information, through the introduction of adaptive methods for processing sequences and structures of variable dimension. Much emphasis is given to the synergy between the development of advanced learning methodologies and the modelling of innovative interdisciplinary applications for complex domains of the Natural Sciences, as well as to introducing students to innovative research themes.

Students completing the course are expected to gain in-depth knowledge of the selected research topics, to understand their theory and applications, and to be able to independently read, understand and discuss research works in the field. The course is targeted at students pursuing specializations in machine learning and computational intelligence, but it is also of interest to data mining and information retrieval specialists, roboticists and those following a bioinformatics curriculum.

Please feel free to contact the instructor for advice on the machine learning curriculum or on the availability of final projects.

**Course Prerequisites**

Course prerequisites include knowledge of machine learning fundamentals (e.g. as covered in course AA1). Knowledge of elements of probability and statistics, calculus and optimization algorithms is a plus.

**Course Overview**

The course introduces advanced machine learning models and interdisciplinary applications, with a focus on the adaptive processing of complex data and structured information.

The course is organized in four parts. The first three parts introduce advanced models associated with three major machine learning paradigms, namely neural networks, probabilistic and Bayesian learning, and kernel methods. We will follow an incremental approach, starting from the introduction of learning models for sequential data processing and showing how these can be extended to deal with more complex structured domains. The fourth part is devoted to discussing advanced applications, with particular emphasis on multidisciplinary applications. These case studies will show how innovative learning models arise from the need to provide solutions to novel applications.

The course hosts guest seminars by national and international researchers working in the field, as well as by companies engaged in the development of advanced applications using machine learning models.

*Topics covered* - dynamical recurrent neural networks; reservoir computing; graphical models and Bayesian learning; hidden Markov models; Markov random fields; latent variable models; non-parametric and kernel-based methods; learning in structured domains (sequences, trees and graphs); unsupervised learning for complex data; deep learning; emerging topics and applications in machine learning.

**Textbook and Teaching Materials**

The course does not have an official textbook covering all its contents. However, good reference books covering parts of the course are listed at the bottom of this section (note that some of them have an electronic version freely available for download).

Lecture slides will be made available on this page by the end of the lessons and they should be sufficient (together with course attendance) to prepare the final exam. Suggested readings are also proposed in the detailed lecture schedule.

The official language of the course is English: all materials, references and books are in English. Classes will be held in English if international students are attending.

*Neural Networks:*

[NN] Simon O. Haykin Neural Networks and Learning Machines Pearson (2008)

*Probabilistic Models:*

[BRML] David Barber Bayesian Reasoning and Machine Learning Cambridge University Press (2012)

A PDF version of [BRML] and of the associated software are freely available

*Inference and Learning:*

[MCK] David J.C. MacKay Information Theory, Inference, and Learning Algorithms Cambridge University Press (2003)

A PDF version of [MCK] is freely available

*Kernel Methods:*

[KM] John Shawe-Taylor and Nello Cristianini Kernel Methods for Pattern Analysis Cambridge University Press (2004)

## Lectures

Lecture | Date | Room | Topic | References | Additional Material |
---|---|---|---|---|---|
1 | 23/2/15 (16-18) | C1 | Introduction to the course: motivations and aim; course housekeeping (exams, timetable, materials); introduction to structured data (slides) | | |
2 | 26/2/15 (14-16) | C1 | Recurrent Neural Networks: basic models (guest lecture by Alessio Micheli). Time representation: explicit/implicit; feedbacks; shift operator; simple recurrent neural networks | [NN] Sect. 15.1, 15.2 | |
3 | 02/3/15 (16-18) | C1 | Recurrent Neural Networks: basic models (guest lecture by Alessio Micheli). Properties; transductions; unfolding; RNN taxonomy | [NN] Sect. 15.2, 15.3, 15.5 | |
4 | 05/3/15 (14-16) | C1 | Recurrent Neural Networks: learning algorithms (guest lecture by Alessio Micheli). BPTT (outline); RTRL (development) | [NN] Sect. 15.6, 15.7, 15.8 | |
5 | 09/3/15 (11-13) | C1 | Recurrent Neural Networks: Reservoir Computing and Echo State Networks (guest lecture by Claudio Gallicchio) (slides) | [1][2] Reservoir Computing and Echo State Networks | [3] Echo State Networks; [4] Markovianity and Architectural Factors |
6 | 12/3/15 (14-16) | C1 | Probabilistic and Graphical Models: probability refresher; conditional independence; graphical model representation; Bayesian Networks (slides) | [BRML] Chapters 1 and 2; [BRML] Sect. 3.1, 3.2 and 3.3.1 | |
7 | 16/3/15 (11-13) | C1 | Directed and Undirected Graphical Models: Bayesian Networks; Markov Networks; Markov Blanket; d-separation; structure learning (slides) | [BRML] Sect. 3.3 (Directed Models); [BRML] Sect. 4.1, 4.2.0-4.2.2 (Undirected Models); [BRML] Sect. 4.5 (Expressiveness) | |
8 | 19/3/15 (14-16) | C1 | Inference in Graphical Models: inference on a chain; factor graphs; sum-product algorithm; elements of approximate inference (slides) | [BRML] Sect. 4.4 (Factor Graphs); [BRML] Sect. 5.1.1 (Variable Elimination and Inference on Chain); [BRML] Sect. 5.1.2-5.1.5 (Sum-product Algorithm) | [5] Factor graphs and the sum-product algorithm; [MCK] Sect. 26.1 and 26.2 (More on sum-product); [BRML] Sect. 28.3-28.5 and [MCK] Chapter 33 (Variational Inference); [BRML] Sect. 27.1-27.4 and [MCK] Chapter 29 (Sampling methods) |
9 | 23/3/15 (11-13) | C1 | Dynamic Bayesian Networks I: Hidden Markov Models; forward-backward algorithm; generative models for sequential data | [BRML] Sect. 23.1.0 (Markov Models); [BRML] Sect. 23.2.0-23.2.4 (HMM and forward-backward) | [7] A classical tutorial introduction to HMMs |
10 | 26/3/15 (14-16) | C1 | Processing of structured domains in ML: Recursive Neural Networks for trees (guest lecture by Alessio Micheli) | [6] General framework for adaptive processing of structured data | |
11 | 30/3/15 (11-13) | C1 | Dynamic Bayesian Networks II: EM learning; applications of HMM (slides) | [BRML] Sect. 23.3.1 (Learning in HMM); [BRML] Sect. 23.4.2 (Input-output HMM); [BRML] Sect. 23.4.4 (Dynamic BN) | [7] A classical tutorial introduction to HMMs |
12 | 02/4/15 (14-16) | C1 | Questions and answers; exercises on graphical models; midterm exam arrangements | | |
MID | 14/4/15 (15-18) | C1 | Midterm Exams | | |
13 | 16/4/15 (14-16) | C1 | Generative Modeling of Tree-Structured Data (slides) | [7] [8] Bottom-up hidden tree Markov models; [9] Top-down hidden tree Markov model | [10] Learning tree transductions; [11] Tree visualization on topographic maps |
14 | 20/4/15 (11-13) | C1 | Latent Topic Models (slides) | [BRML] Sect. 20.4-20.6.1 | [12] LDA foundation paper; [13] A gentle introduction to latent topic models |
15 | 23/4/15 (14-16) | C1 | Reservoir Computing for Trees and Graphs (guest lecture by Claudio Gallicchio) (slides) | [14] TreeESN; [15] GraphESN | [16] Additional material on GraphESN; [17] Constructive NN for graphs |
16 | 27/4/15 (11-13) | C1 | Deep Learning (slides) | [18] A classic, accessible paper from the initiator of Deep Learning; [19] Recent review paper | [20] A freely available book on deep learning from Microsoft RC |
17 | 30/4/15 (14-16) | C1 | Kernel and non-parametric methods: kernel method refresher; kernels for complex data (sequences, trees and graphs); convolutional kernels; adaptive kernels (slides) | [KM] Chapters 2 and 9 - Kernel methods refresher and kernel construction; [KM] Chapter 11 - Kernels for structured data; [KM] Chapter 12 - Generative kernels | [21] Generative kernels on hidden states multisets |
18 | 04/5/15 (11-13) | C1 | Kernel and non-parametric methods: Linear and Non-Linear Dimensionality Reduction (guest lecture by Alexander Schulz) (slides) | [BRML] Sect. 15.1-15.2 PCA; [BRML] Sect. 15.7 Kernel PCA | [22] t-SNE |
19 | 07/5/15 (14-16) | C1 | Kernel and non-parametric methods: Recent Advances in Dimensionality Reduction (guest lecture by Alexander Schulz) (slides) | | |
20 | 11/5/15 (11-13) | C1 | An Overview of ML research at UNIPI; final project proposals | | |
21 | 18/5/15 (11-13) | C1 | Company Talk: Henesis (Artificial Perception) | | |
22 | 21/5/15 (14-16) | C1 | Company Talk: Kode Solutions | | |
23 | 21/5/15 (16-18) | C1 | Final lecture: course wrap-up; final project assignments; exam information | | |

## Exams

Course examination for students attending the lectures is performed in 3 stages: a midterm assignment, a final project and an oral presentation. Passing the exam requires successfully completing ALL 3 stages.

### Midterm Assignment

Students will be asked to pick 1 article from the reading list and to prepare a short presentation to be given in front of the class. To be successful, the presentation should (at least) answer the questions associated with the article in the reading list, which typically include a mathematical derivation of a major theoretical result or of a learning algorithm reported in the paper. The assignment is due in the middle of the term.

**Midterm assignment for academic year 2014/15**

*Time:* Tuesday 14th April 2015, h. 15.00 - *Room:* C1

### Final project

Students can choose from a set of topics/problems suggested by the instructor or propose their own topic to investigate (within the scope of the course). Projects can be of the following types:

- *Survey* - Read at least three relevant and distinct papers on a topic and write a report. This must not be a simple summary of the papers: rather, students are expected to try to find connections between the works and to highlight interesting open problems.
- *Original* - Propose your own research project and develop it (with my help) as much as possible: the project must have a substantial innovative component and will be handed in as a report, as for the Survey project type. If you plan to propose your own project/topic, you will first need to submit a short (one page) proposal for approval and feedback.
- *Software* - Develop well-written, tested and commented software (with documentation) implementing a non-trivial learning model and/or an application relevant to the course. The topic of this project should also be agreed with the instructor.

Students must select the project type and topic before the last lecture of the course. The project report/software should be handed in (at least) 7 days before its oral presentation.

**NEW!!** Project reports should be formatted using the provided LaTeX or MS Word templates.

### Oral Presentation

Prepare a seminar on the project to be discussed in front of the instructor and anybody interested. Students are expected to prepare slides for a 25-minute presentation summarizing the ideas, models and results in the report. The exposition should demonstrate a solid understanding of the main ideas in the report as well as of the key concepts of the course.

### Alternative Exam Modality

Working students and those not attending course lectures will hand in a final project as above and will also take an oral examination comprising both an oral presentation of the project and an examination on the course program (models, algorithms and theoretical results). Students should contact the instructor by email to arrange project topics and examination dates.

## Further Readings

[1] M. Lukosevicius, H. Jaeger, Reservoir computing approaches to recurrent neural network training, Computer Science Review vol. 3(3), pag. 127-149, 2009

[2] H. Jaeger, H. Haas, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science, vol. 304, pag. 78-80, 2004

[3] H. Jaeger, The “echo state” approach to analysing and training recurrent neural networks, GMD - German National Research Institute for Computer Science, Tech. Rep., 2001

[4] C. Gallicchio, A. Micheli, Architectural and markovian factors of echo state networks, Neural Networks, vol. 24(5), pag. 440–456, 2011

[5] F. R. Kschischang, B. J. Frey, H.-A. Loeliger, “Factor graphs and the sum-product algorithm”, IEEE Transactions on Information Theory, vol. 47(2), pag. 498-519, 2001

[6] P. Frasconi, M. Gori, and A. Sperduti, A General Framework for Adaptive Processing of Data Structures, IEEE Transactions on Neural Networks. Vol. 9, No. 5, pp. 768-786, 1998.

[7] Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 1989, pages 257-286

[8] D. Bacciu, A. Micheli and A. Sperduti, “Compositional Generative Mapping for Tree-Structured Data - Part I: Bottom-Up Probabilistic Modeling of Trees”, IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 12, pp. 1987-2002, 2012

[9] M. Diligenti, P. Frasconi, M. Gori, “Hidden tree Markov models for document image classification”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, pp. 519-523, 2003

[10] D. Bacciu, A. Micheli and A. Sperduti, “An Input-Output Hidden Markov Model for Tree Transductions”, Neurocomputing, Elsevier, Vol. 112, pp. 34-46, Jul, 2013

[11] D. Bacciu, A. Micheli and A. Sperduti, “Compositional Generative Mapping for Tree-Structured Data - Part II: Topographic Projection Model”, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 231-247, Feb 2013

[12] D. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003

[13] D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.

[14] C. Gallicchio, A. Micheli. Tree Echo State Networks. Neurocomputing, vol. 101, pp. 319-337, 2013.

[15] C. Gallicchio, A. Micheli. Graph Echo State Networks, The 2010 International Joint Conference on Neural Networks (IJCNN), IEEE, 2010.

[16] C. Gallicchio, A. Micheli. Supervised State Mapping of Clustered GraphESN States, In Frontiers in Artificial Intelligence and Applications, WIRN11, Vol. 234, pp. 28-35, 2011

[17] A. Micheli, Neural network for graphs: a contextual constructive approach, IEEE Transactions on Neural Networks, volume 20 (3), pag. 498-511, doi: 10.1109/TNN.2008.2010350, 2009

[18] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science 313.5786 (2006): 504-507.

[19] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35(8) (2013): 1798-1828.

[20] L. Deng and D. Yu. Deep Learning Methods and Applications, 2014

[21] D. Bacciu, A. Micheli and A. Sperduti, Integrating Bi-directional Contexts in a Generative Kernel for Trees, Proceedings of the 2014 IEEE International Joint Conference on Neural Networks (IJCNN'14), pp.4145 - 4151, IEEE, 2014

[22] L. van der Maaten, G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research, Vol. 9, pp. 2579-2605, 2008