
This is an old revision of the document!


====== Statistics for Data Science (628PP) A.Y. 2023/24 ======

=====Instructor=====

  * Salvatore Ruggieri
  * Università di Pisa
  * http://pages.di.unipi.it/ruggieri/
  * salvatore [dot] ruggieri [at] unipi [dot] it
  * Office hours: Tuesdays h 14:00 - 16:00 or by appointment, at the Department of Computer Science, room 321/DO, or via Teams.

=====Hours and rooms=====

^ Day of Week ^ Hour ^ Room ^
| Tuesday | 16:00 - 18:00 | Fib-C |
| Thursday | 14:00 - 16:00 | Fib-C |
| Friday | 11:00 - 13:00 | Fib-C |

A Teams channel is used to post news, notes, Q&A, and other material related to the course. The lectures are held in person only and will NOT be live-streamed, but recordings of the lectures or of previous years will be made available here for non-attending students.

=====Pre-requisites=====

Students should be comfortable with most of the topics on mathematical calculus covered in:

  * [P] J. Ward, J. Abdey. Mathematics and Statistics. University of London, 2013. Chapters 1-8 of Part 1.

Extra lessons refreshing these notions may be planned in the first part of the course.

=====Mandatory Teaching Material=====

The following are mandatory textbooks:

  * [T] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
  * [R] P. Dalgaard. Introductory Statistics with R. 2nd edition, Springer, 2008.
  * Selected chapters of other books for advanced topics

=====Software=====

  * R
  * RStudio

=====Preliminary program and calendar=====

  * Preliminary program.
  * Calendar of lessons.

=====Exams=====

There are no mid-terms. The exam consists of a written part and an oral part.

The written part consists of exercises and questions on the topics of the course. Each question is assigned a number of points, summing up to 30 points in total. Sample written exams: sample1, sample2. Students are admitted to the oral part if they receive at least 18 points in the written part.
The oral part consists of a critical discussion of the written part, plus open questions and problem solving on the topics of the course (both theory and R programming). In particular, students must demonstrate that they can summarize both the theory and the software related to any of the lessons, using the slides and R scripts of the lessons.

Registration for exams is mandatory (beware of the registration deadline!): register here. The dates below are only for the written test (normal exam). Dates for project discussion are included in the project description.

^ Date ^ Hour ^ Room ^ Notes ^
| 28/5/2024 | 11:00 - 13:00 | FIB-A1 | |
| 25/6/2024 | 11:00 - 13:00 | FIB-A1 | |
| 23/7/2024 | 11:00 - 13:00 | FIB-A1 | |
| 13/9/2024 | 9:00 - 11:00 | TBD | |

=====Student project=====

  * The project replaces the written part of the examination.
  * Project description and rules and Q&A.
  * Recording of project description (.mp4)

=====Class calendar=====

Lessons will NOT be live-streamed, but recordings of past years are available here for non-attending students.
To watch the recordings online, you must be connected to the unipi.it VPN. Alternatively, right-click the link, download the whole file, and watch it locally on your device with a media player such as VLC.

Slides and R scripts may be updated after the classes to align with the actual content of the lessons and to correct typos. Be sure to download the updated versions.

^ # ^ Date ^ Room ^ Topic ^ Mandatory teaching material ^
| 01 | 20/02 16-18 | Fib-C | Introduction. Probability and independence. rec01 (.mp4) | [T] Chpts. 1-3 slides01 (.pdf) |
| 02 | 22/02 14-16 | Fib-C | R basics. rec02 (.mp4) | [R] Chpts. 1,2.1-2.3 slides02 (.pdf), script02 (.R) |
| 03 | 23/02 11-13 | Fib-C | Bayes' rule and applications. rec03 (.mp4) | [T] Chpt. 3 slides03 (.pdf), script03 (.R) |
| 04 | 27/02 16-18 | Fib-C | Discrete random variables. rec04 (.mp4) | [T] Chpts. 4, 9.1, 9.2, 9.4 [R] Chpt. 3 slides04 (.pdf), script04 (.R) |
| 05 | 29/02 14-16 | Fib-C | Discrete random variables (continued). rec05 (.mp4) | |
| 06 | 01/03 11-13 | Fib-C | Refresher: derivatives and integrals. rec06 (.mp4) | [P] Chpts. 1-8 slides06 (.pdf), script06 (.R) |
| 07 | 05/03 16-18 | Fib-C | R data access and programming. rec07 (.mp4) | [R] Chpts. 2.3,2.4 script07 (.zip) |
| 08 | 07/03 14-16 | Fib-C | Continuous random variables. rec08 (.mp4) | [T] Chpts. 5, 9.2-9.4 [R] Chpt. 3 slides08 (.pdf), script08 (.R) |
| 09 | 08/03 11-13 | Fib-C | Expectation and variance. Computations with random variables. rec09 (.mp4) | [T] Chpts. 7,8 slides09 (.pdf), script09 (.R) |
| 10 | 12/03 16-18 | Fib-C | Expectation and variance. Computations with random variables (continued). Moments. Functions of random variables. rec10 (.mp4) | [T] Chpts. 9-11 slides10 (.pdf), script10 (.zip) |
| 11 | 14/03 14-16 | Fib-C | Functions of random variables (continued). Distances between distributions. rec11 (.mp4) | Murphy's book Chpt. 6 slides11 (.pdf), script11 (.R) |
| 12 | 15/03 11-13 | Fib-C | Simulation. rec12 (.mp4) | [T] Chpts. 6.1-6.2 slides12 (.pdf), script12 (.R) script12_sol07 (.R) |
| 13 | 19/03 16-18 | Fib-C | Power laws and Zipf's law. rec13 (.mp4) | Newman's paper Sect I, II, III(A,B,E,F) slides13 (.pdf), script13 (.R) |
| 14 | 21/03 14-16 | Fib-C | Law of large numbers. The central limit theorem. rec14 (.mp4) | [T] Chpts. 13-14 slides14 (.pdf), script14 (.R) |
| 15 | 22/03 11-13 | Fib-C | Graphical summaries. Kernel Density Estimation. rec15 (.mp4) | [T] Chpt. 15, [R] Chpt. 4 slides15 (.pdf), script15 (.R) |
| 16 | 26/03 16-18 | Fib-C | Numerical summaries. rec16 (.mp4) | [T] Chpt. 16, [R] Chpt. 4 slides16 (.pdf), script16 (.R) |
| 17 | 28/03 14-16 | Fib-C | Data preprocessing in R. Estimators. rec17 (.mp4) | [R] Chpt. 10, [T] Chpts. 17.1-17.3 script17 (.R), dataprep.R |
| 18 | 04/04 14-16 | Fib-C | Unbiased estimators. Efficiency and MSE. rec18 (.mp4) | [T] Chpts. 19, 20 slides18 (.pdf), script18 (.R) |
| 19 | 05/04 11-13 | Fib-C | Maximum likelihood estimation. rec19 (.mp4) | [T] Chpt. 21 s4dsln.pdf Chpt. 1 slides19 (.pdf), script19 (.R) |
| 20 | 09/04 16-18 | Fib-C | Linear regression. Least squares estimation. rec20 (.mp4) | [T] Chpts. 17.4,22 [R] Chpt. 6 s4dsln.pdf Chpt. 2 slides20 (.pdf), script20 (.R) |
| 21 | 11/04 14-16 | Fib-C | Non-linear and multiple linear regression. rec21 (.mp4) | [R] Chpts. 12.1,13,16.1-16.2 s4dsln.pdf Chpt. 2 slides21 (.pdf), script21 (.R) |
| 22 | 12/04 11-13 | Fib-C | Issues with linear regression. Logistic regression. rec22 (.mp4) | [R] Chpts. 12.1,13,16.1-16.2 slides22 (.pdf), script22 (.zip) |
| 23 | 16/04 16-18 | Fib-C | Statistical decision theory. rec23 (.mp4) | s4dsln.pdf Chpt. 4 slides23 (.pdf), script23 (.R) |
| 24 | 18/04 14-16 | Fib-C | Statistical decision theory (continued). rec24 (.mp4) | |
| 25 | 19/04 11-13 | Fib-C | Statistical decision theory (continued). Project presentation. | |
| 26 | 23/04 16-18 | Fib-C | Confidence intervals: mean, proportion, linear regression. rec26 (.mp4) | [T] Chpts. 23.1,23.2,23.4,24.3,24.4 s4dsln.pdf Chpt. 3 slides26 (.pdf), script26 (.R) |
| - | 26/04 11-13 | - | No lesson on this date | |
| 27 | 30/04 16-18 | Fib-C | Confidence intervals (continued). Bootstrap and resampling methods. rec27 (.mp4) | [T] Chpts. 18.1-18.3,23.3 slides27 (.pdf), script27 (.R) |
| 28 | 02/05 14-16 | Fib-C | Bootstrap and resampling methods (continued). rec28 (.mp4) | |
| 29 | 03/05 11-13 | Fib-C | Hypothesis testing. One-sample tests of the mean and application to linear regression. rec29 (.mp4) | [T] Chpts. 25,26,27, [R] Chpts. 5.1,5.2 s4dsln.pdf Chpt. 3.3 slides29 (.pdf), script29 (.R) |
| s03 | 07/05 16-18 | Fib-C | Mandatory seminar: Introduction to causal modeling and reasoning. Speakers: I. Beretta and M. Cinquini. rec_s03 (.mp4) | slides_s03 (.pdf) |
| 30 | 09/05 14-16 | Fib-C | One-sample tests of the mean and application to linear regression (continued). Classifier performance metrics in R. rec30 (.mp4) | slides30 (.pdf), script30 (.R) |
| 31 | 10/05 11-13 | Fib-C | Two-sample tests of the mean and applications to classifier comparison. rec31 (.mp4) | [T] Chpt. 28, [R] Chpts. 5.3-5.7 slides31 (.pdf), script31 (.R) |
| 32 | 14/05 16-18 | Fib-C | Multiple-sample tests of the mean and applications to classifier comparison. rec33 (.mp4) | [R] Chpt. 7 slides32 (.pdf), script33 (.R) |
| 33 | 16/05 14-16 | Fib-C | Fitting distributions. Testing independence/association. rec34 (.mp4) | [R] Chpt. 8 K-S, slides33 (.pdf), script33 (.R) |
| 34 | 17/05 11-13 | Fib-C | Fitting distributions. Testing independence/association (continued). Project Q&A. | |
| 35 | 21/05 16-18 | Fib-C | Project Q&A. | |

=====Past years=====

  * Statistics for Data Science A.Y. 2022/23

This 9 ECTS course replaces an older 6 ECTS version: Statistical Methods for Data Science A.Y. 2020/21 (500PP). The 6 ECTS version is discontinued. Students who have the 6 ECTS version in their study plan can still take the 6 ECTS version exam for the A.Y. 2021/22, 2022/23 and 2023/24. However, there will be no specific project for the 6 ECTS version.

mds/sds/start.1715340631.txt.gz · Last modified: 10/05/2024 at 11:30 (5 months ago) by Salvatore Ruggieri
