Statistical Methods for Data Science A.Y. 2020/21

This course is discontinued. Starting from A.Y. 2021/22, it has been replaced by a 9 ECTS version.

Instructor

Classes

Day of Week | Hour | Room
Tuesday | 16:00 - 18:00 | Teams Virtual Room
Wednesday | 9:00 - 11:00 | Teams Virtual Room

Pre-requisites

Students should be comfortable with most of the calculus topics covered in:

  • [P] J. Ward, J. Abdey. Mathematics and Statistics. University of London, 2013. Chapters 1-8 of Part 1.

Extra lessons reviewing these notions may be scheduled in the first part of the course.

Mandatory Teaching Material

The following are mandatory textbooks:

  • [T] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
  • [R] P. Dalgaard. Introductory Statistics with R. 2nd edition, Springer, 2008.

Software

Preliminary program and calendar

Student project

  • The project can be done in groups of at most 4 students.
  • The project must be delivered (report + code) by the end of July.
  • The oral discussion must take place by the September session, and it will cover both the project and all topics of the course.
  • The project replaces the written exam, but students must still register for a written exam date in order to fill in the student questionnaire.
  • Groups that are ready for the discussion should send the teacher the project, together with the time slots in which they are available for the oral discussion.

Written exam

There are no mid-terms. The exam consists of a written part and an oral part. The written part consists of open questions and exercises on the topics of the course. Each question is assigned a number of points, summing up to 30 points. Students are admitted to the oral part if they receive a grade of at least 18 points. Example written texts: sample1, sample2. The oral part consists of a critical discussion of the written part, and of open questions and problem solving on the topics of the course.
Online exams: during the COVID-19 restrictions, both the written part and the oral part will be held online. For the written part, students will connect to a reserved Teams virtual room and will turn on both microphone and webcam. The exam text will be shared in the virtual room chat. Solutions will be written on sheets of paper. Each sheet must include name, surname, and student ID, and must be signed. A photo of the sheets must be sent to ruggieri [at] di [dot] unipi [dot] it at the end of the written part.

Registration for exams is mandatory (beware of the registration deadline!): register here

Class calendar

# | Date | Hour | Room | Topic | Recording | Learning material
01 | 16.02 | 16:00-18:00 | Teams | Introduction. Probability and independence. | rec01 audio-video (.mp4) | [T] Chpts. 1-3; slides01 (.pdf)
02 | 23.02 | 16:00-18:00 | Teams | R basics. | rec02 audio-video (.mp4) | [R] Chpts. 1, 2.1, 2.2; slides02 (.pdf), script02 (.R)
03 | 24.02 | 9:00-11:00 | Teams | Discrete random variables. | rec03 audio-video (.mp4) | [T] Chpt. 4, [R] Chpt. 3; slides03 (.pdf), script03 (.R)
04 | 02.03 | 16:00-18:00 | Teams | Review: derivatives and integrals. | rec04 audio-video (.mp4) | [P] Chpts. 1-8; slides04 (.pdf), script04 (.R)
05 | 03.03 | 9:00-11:00 | Teams | Continuous random variables. Simulation. | rec05 audio-video (.mp4) | [T] Chpts. 5, 6.1-6.2, [R] Chpt. 3; slides05 (.pdf), script05 (.R)
06 | 09.03 | 16:00-18:00 | Teams | Expectation and variance. Computations with random variables. | rec06 audio-video (.mp4) | [T] Chpts. 7, 8; slides06 (.pdf), script06 (.R)
07 | 10.03 | 9:00-11:00 | Teams | R data access and programming. | rec07 audio-video (.mp4) | [R] Chpts. 2.3, 2.4; script07 (.zip)
08 | 16.03 | 16:00-18:00 | Teams | Power laws and Zipf laws. | rec08 audio-video (.mp4) | Newman's paper, Sects. I, II, III(A,B,E,F); slides08 (.pdf), script08 (.zip)
09 | 17.03 | 9:00-11:00 | Teams | Moments, joint distributions, sum of random variables. | rec09 audio-video (.mp4) | [T] Chpts. 9-11; slides09 (.pdf), script09 (.zip)
10 | 23.03 | 16:00-18:00 | Teams | Law of large numbers. The central limit theorem. | rec10 audio-video (.mp4) | [T] Chpts. 13-14; slides10 (.pdf), script10 (.R)
11 | 24.03 | 9:00-11:00 | Teams | Project presentation. Graphical summaries. | rec11 audio-video (.mp4) | [T] Chpt. 15; slides11 (.pdf), script11 (.R)
12 | 30.03 | 16:00-18:00 | Teams | Numerical summaries. Data preprocessing in R. | rec12 audio-video (.mp4) | [T] Chpt. 16, [R] Chpts. 4, 10; slides12 (.pdf), script12 (.R), dataprep.R
13 | 07.04 | 9:00-11:00 | Teams | Unbiased estimators. Efficiency and MSE. | rec13 audio-video (.mp4) | [T] Chpts. 17.1-17.3, 19, 20; slides13 (.pdf), script13 (.R)
14 | 13.04 | 16:00-18:00 | Teams | Maximum likelihood estimation. | rec14 audio-video (.mp4) | [T] Chpt. 21; notes1.pdf, slides14 (.pdf), script14 (.R)
15 | 14.04 | 9:00-11:00 | Teams | Linear regression. Least squares estimation. | rec15 audio-video (.mp4) | [T] Chpts. 17.4, 22, [R] Chpt. 6; notes2.pdf, slides15 (.pdf), script15 (.R)
16 | 20.04 | 16:00-18:00 | Teams | Multiple, non-linear, and logistic regression. | rec16 audio-video (.mp4) | [R] Chpts. 12.1, 13, 16.1-16.2; notes2.pdf, slides16 (.pdf), script16 (.zip)
17 | 21.04 | 9:00-11:00 | Teams | Logistic regression (ctd). Introduction to confidence intervals. | rec17 audio-video (.mp4) | [T] Chpt. 23.1; slides17 (.pdf), script17 (.R)
18 | 27.04 | 16:00-18:00 | Teams | Confidence intervals: Gaussian, Student's t, large sample method. Confidence intervals in linear regression. | rec18 audio-video (.mp4) | [T] Chpts. 23.2, 23.4, 4.3, 24.4; notes2.pdf
19 | 28.04 | 9:00-11:00 | Teams | Empirical bootstrap. Application to confidence intervals. | rec19 audio-video (.mp4) | [T] Chpts. 18.1, 18.2, 23.3; slides19 (.pdf), script19 (.R)
20 | 04.05 | 16:00-18:00 | Teams | Parametric bootstrap. Hypothesis testing. | rec20 audio-video (.mp4) | [T] Chpts. 18.3, 25; slides20 (.pdf), script20 (.R)
21 | 05.05 | 9:00-11:00 | Teams | One-sample tests of the mean and application to linear regression. | rec21 audio-video (.mp4) | [T] Chpts. 26-27, [R] Chpts. 5.1, 5.2; slides21 (.pdf), notes2.pdf, script21 (.R)
22 | 11.05 | 16:00-18:00 | Teams | Multiple comparisons. Fitting distributions. | rec22 audio-video (.mp4) | K-S; slides22 (.pdf), script22 (.R)
23 | 12.05 | 9:00-11:00 | Teams | Two-sample tests of the mean, and F-test. | rec23 audio-video (.mp4) | [T] Chpt. 28, [R] Chpts. 5.3-5.7; slides23 (.pdf), script23 (.R)
24 | 18.05 | 16:00-18:00 | Teams | Testing correlation/independence. Multiple-sample tests of the mean. | rec24 audio-video (.mp4) | [R] Chpts. 7, 8; slides24 (.pdf), script24 (.R)
   | 19.05 | 9:00-11:00 | Teams | Office hours and project tutoring. | |
mds/smd/start.txt · Last modified: 04/11/2022 at 12:18 (17 months ago) by Salvatore Ruggieri