
Statistical Methods for Data Science A.Y. 2020/21

Instructor

Classes

Dates are preliminary.

Day of week | Hour | Room
Tuesday | 16:00 - 18:00 | Teams Virtual Room
Wednesday | 9:00 - 11:00 | Teams Virtual Room

Pre-requisites

Students should be comfortable with most of the topics on mathematical calculus covered in:

  • [P] J. Ward, J. Abdey. Mathematics and Statistics. University of London, 2013. Chapters 1-8 of Part 1.

Extra lessons to refresh these notions may be scheduled in the first part of the course.

Mandatory Teaching Material

The following textbooks are mandatory:

  • [T] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
  • [R] P. Dalgaard. Introductory Statistics with R. 2nd edition, Springer, 2008.

Software

Preliminary program and calendar

Student project

  • The project can be done in groups of at most 3 students.
  • The project (report + code) must be delivered by the end of July.
  • The oral discussion must take place by the September session, and it will cover both the project and all topics of the course.
  • The project replaces the written exam, but students must still register for the written exam dates in order to fill in the student questionnaire.
  • Groups that are ready to discuss should send the project to the teacher, together with their available time slots for the oral discussion.
  • The project topic and details will be available around mid-March 2021.

Written exam

There are no mid-terms. The exam consists of a written part and an oral part. The written part consists of open questions and exercises on the topics of the course. Each question is assigned a number of points, for a total of 30. Students are admitted to the oral part if they score at least 18 points. Example written texts: sample1, sample2. The oral part consists of a critical discussion of the written part, plus open questions and problem solving on the topics of the course.
Online exams: during the COVID-19 restrictions, both the written part and the oral part will be held online. For the written part, students will connect to a reserved Teams virtual room and will activate both microphone and webcam. The exam text will be shared in the virtual room chat. Solutions must be written on sheets of paper. Each sheet must include the student's name, surname, and student ID, and must be signed. A photo of the sheets must be sent to ruggieri [at] di [dot] unipi [dot] it at the end of the written part.

Registration for exams is mandatory (beware of the registration deadline!): register here

Date Hour Room Notes
Online exam

Class calendar

# | Date | Hour | Room | Topic | Learning material
01 | 16.02 | 16:00-18:00 | Teams | Introduction. Probability and independence. | rec01 audio-video (.mp4) [T] Chpts. 1-3 slides01 (.pdf)
XX | 17.02 | 9:00-11:00 | Teams | No lesson on this date. |
02 | 23.02 | 16:00-18:00 | Teams | R basics. | rec02 audio-video (.mp4) [R] Chpts. 1,2.1,2.2 slides02 (.pdf), script02 (.R)
03 | 24.02 | 9:00-11:00 | Teams | Discrete random variables. | rec03 audio-video (.mp4) [T] Chpt. 4 [R] Chpt. 3 slides03 (.pdf), script03 (.R)
04 | 02.03 | 16:00-18:00 | Teams | Continuous random variables. Simulation. |
05 | 03.03 | 9:00-11:00 | Teams | |

Last year's class calendar

# | Date | Hour | Room | Topic | Learning material
4 | 04.03 | 9:00-11:00 | A1 | Continuous random variables. Simulation. | [T] Chpts. 5, 6.1-6.2 [R] Chpt. 3 script3.R
5 | 10.03 | 16:00-18:00 | Distance-learning | Recalls: derivatives and integrals. | rec01 audio-video (.flv) [P] Chpt. 1-8 scriptMath.R
6 | 11.03 | 9:00-11:00 | Distance-learning | Expectation and variance. R data access. | rec02 audio-video (.flv) [T] Chpt. 7 [R] Chpt. 2.4 script4.R
7 | 17.03 | 16:00-18:00 | Distance-learning | R programming. Project presentation. | rec03 audio-video (.flv) and project info audio-video (.flv) [R] Chpt. 2.3 exercise.R script5.zip
8 | 18.03 | 9:00-11:00 | Distance-learning | Project presentation. Power laws and Zipf laws. | rec04 audio-video (.flv) Newman's paper Sect I, II, III(A,B,E,F) script6.R
9 | 24.03 | 16:00-18:00 | Distance-learning | Computations with random variables. Joint distributions. | rec05 audio-video (.flv) [T] Chpts. 8-9 script7.zip
10 | 25.03 | 9:00-11:00 | Distance-learning | Covariance. Sum of random variables. | rec06 audio-video (.flv) [T] Chpts. 10-11 script8.R
11 | 31.03 | 16:00-18:00 | Distance-learning | Law of large numbers. The central limit theorem. | rec07 audio-video (.flv) [T] Chpts. 13-14 script9.R
12 | 01.04 | 9:00-11:00 | Distance-learning | Graphical summaries. | rec08 audio-video (.flv) [T] Chpt. 15 script10.R
13 | 07.04 | 16:00-18:00 | Distance-learning | Numerical summaries. Data preprocessing in R. Q&A on the project. | rec09 audio-video (.flv), project data audio-video (.flv) [T] Chpt. 16, [R] Chpts. 4,10 script11.R, dataprep.R
14 | 08.04 | 9:00-11:00 | Distance-learning | Unbiased estimators. Efficiency and MSE. | rec10 audio-video (.flv) [T] Chpts. 17.1-17.3, 19, 20 script12.R
XX | 15.04 | 9:00-11:00 | | No lesson on this date. Students work on the project on their own. |
15 | 21.04 | 16:00-18:00 | Distance-learning | Maximum likelihood. Fisher information. | rec11 audio-video (.flv) [T] Chpt. 21 notes1.pdf script13.R
16 | 22.04 | 9:00-11:00 | Distance-learning | Simple linear and polynomial regression. Least squares. | rec12 audio-video (.flv) [T] Chpts. 17.4,22 [R] Chpts. 6,12.1 script14.R
17 | 28.04 | 16:00-18:00 | Distance-learning | Multiple, non-linear, and logistic regression. | rec13 audio-video (.flv) [R] Chpt. 13,16.1-16.2 notes2.pdf script15.R
18 | 29.04 | 9:00-11:00 | Distance-learning | Confidence intervals: Gaussian, T-student, large sample method. | rec14 audio-video (.flv) [T] Chpts. 23.1,23.2,23.4, 24.3,24.4 script16.R
19 | 05.05 | 16:00-18:00 | Distance-learning | Confidence intervals in linear regression. Empirical bootstrap. Application to confidence intervals. | rec15 audio-video (.flv) [T] Chpts. 18.1,18.2,23.3 notes2.pdf script17.R
20 | 06.05 | 9:00-11:00 | Distance-learning | Parametric bootstrap. Hypotheses testing. | rec16 audio-video (.flv) [T] Chpts. 18.3,25 script18.R
21 | 12.05 | 16:00-18:00 | Distance-learning | One-sample t-test and application to linear regression. | rec17 audio-video (.flv) [T] Chpts. 26-27, [R] Chpts. 5.1,5.2 notes2.pdf script19.R
22 | 13.05 | 9:00-11:00 | Distance-learning | Goodness of fit: chi-square, K-S. Fitting power laws. | rec18 audio-video (.flv) K-S script20.R
XX | 19.05 | 16:00-18:00 | | No lesson on this date. Students work on the project on their own. |
23 | 20.05 | 9:00-11:00 | Distance-learning | Hypotheses testing: F-test, comparing two samples. | rec19 audio-video (.flv) [T] Chpts. 28, [R] Chpts. 5.3-5.7 script21.R
XX | 26.05 | 16:00-18:00 | | No lesson on this date. Students work on the project on their own. |
24 | 27.05 | 9:00-11:00 | Distance-learning | Project tutoring. | rec20 audio-video (.flv)

Previous years

mds/smd/start.txt · Last modified: 24/02/2021 at 15:45 (4 days ago) by Salvatore Ruggieri