
This is an old revision of the document!


====== LABORATORY OF DATA SCIENCE (2021/2022) ======

Instructors:

  * Anna Monreale
    * KDD Laboratory, Università di Pisa
    * http://pages.di.unipi.it/amonreale/
    * anna [dot] monreale [at] unipi [dot] it
    * Office hours: Wednesday 11:00-13:00, online using Teams (appointment by email).
    * Telephone +39-050-2213119
  * Roberto Pellungrini
    * KDD Laboratory, Università di Pisa
    * roberto [dot] pellungrini [at] di [dot] unipi [dot] it
    * Office hours: Thursday 14:00-16:00, online using Teams (appointment by email).
    * Telephone +39-050-2212728

====== News ======

  * [15/10/2021] Instructions for installing Data Tools for Visual Studio 2019 are in the software section of the wiki. Please follow them closely, step by step.
  * [15/10/2021] IMPORTANT: The first part of the project is available. Checkpoint: 15 November.
  * [02/10/2021] The lecture of Monday 4 October is canceled.
  * [08/09/2021] The first lecture will be on 16 Sept.
  * [08/09/2021] You can join the class by using this link: https://teams.microsoft.com/l/team/19%3amm3HFMqMSvpUrGY2sMYlpzxQ-atdxhfXreRUHhvrODs1%40thread.tacv2/conversations?groupId=c196ac40-93a2-4436-adfe-a81af3d06eef&tenantId=c7456b31-a220-47f5-be52-473828670aa1
  * [16/09/2021] IMPORTANT: Please fill in the document at the following link with your information, so that we can give you access to the teaching database and the mailing list: https://docs.google.com/spreadsheets/d/1yYzHXmykhbfwy7G9uB_Z1fGcW_Vtvjugy4Yvlj-aM2Y/edit?usp=sharing

====== Hours and Rooms ======

Classes

Lessons will be held online on the Teams platform.

^ Day of Week ^ Hour ^ Room ^
| Monday | 11:00 - 12:45 | Teams |
| Thursday | 09:00 - 10:45 | Teams |

Link to Teams module: https://teams.microsoft.com/l/team/19%3amm3HFMqMSvpUrGY2sMYlpzxQ-atdxhfXreRUHhvrODs1%40thread.tacv2/conversations?groupId=c196ac40-93a2-4436-adfe-a81af3d06eef&tenantId=c7456b31-a220-47f5-be52-473828670aa1

====== Learning Material ======

===== Slides & Recordings of the Classes =====

  * The slides used in the course will be added to the calendar after each class.
  * A recording of each lecture will be available on Teams.

===== Past Exams =====

  * 2016/17 text, 2015/16 text and 2015/16 solution, 2014/15 text and 2014/15 solution, 2013/14 text, 2012/13 text and 2012/13 solution.

===== Software =====

  * Anaconda with Python 3.7 (please avoid Python 3.8)
  * SQL Server 2019 Developer Edition; SQL Server 2019 Management Studio
  * Data Tools for Visual Studio 2019: instructions here
    * Italian: Data Tools Visual Studio 2019 IT
    * English: Data Tools Visual Studio 2019 EN
  * Microsoft Excel
  * Power BI Desktop

===== F.A.Q. =====

  * Connection to wi-fi
  * F.A.Q.s about the labs
  * Unipi VPN
  * Unipi authentication: to access the VPN, make sure that network access services are enabled on your profile. Follow this link to access your Unipi profile.

====== Class calendar - (2021-2022) ======

^ ^ Day ^ Topic ^ Slides ^ Data/Software ^ References ^ Video Lectures ^ Teacher ^
| | 13.09 11:00-12:45 | Lecture canceled | | | | | |
| 1. | 16.09 09:00-10:45 | Introduction. File data access. | 2021-lds.01.introduction.pdf, 2020-lds.02.bi_architectures.pptx.pdf, 2020-lds.03.file_data_access.pptx.pdf | | - BI technology: An Overview of Business Intelligence Technology - File access: File System Interface | Video1, Video2 | Monreale |
| 2. | 20.09 11:00-12:45 | Representation formats: CSV, FLV, ARFF, XML. Python Recap | 2020-lds.04.python.pptx.pdf | | - File Formats: Introduction to data technologies (Chps. 5, 6), Weka ARFF Format, XRFF Format - Python reference: Free python book with exercises | Video1, Video2 | Pellungrini |
| 3. | 23.09 11:00-12:45 | File Access in Python | lds.05.fileaccess-python2021.pdf | census.csv.zip, Collection of files, Partial Solutions to Python Exercises | | Video1, Video2 | Pellungrini |
| 4. | 27.09 9:00-10:45 | File Access in Python Practice | lds.05.fileaccess-python2021.pdf | census.csv.zip, Collection of files, Partial Solutions to Python Exercises, csv to Arff conversion solution | | Video | Pellungrini |
| 5. | 30.09 9:00-10:45 | Python Exercises | ex-customers.pdf | ex-customers_solution.zip, data-customers.zip, lds.file.format.zip | | Video1, Video2, Video3 | Pellungrini |
| | 04.10 11:00-12:45 | Lecture canceled | | | | | |
| 6. | 07.10 9:00-10:45 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lds.06.relational_data_access-2021.pdf | | | | Monreale |
| 7. | 11.10 11:00-12:45 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lds.06.relational_data_access-2021.pdf | 2021-code-db-samples.zip | | | Monreale |
| 8. | 14.10 9:00-10:45 | Stratified sampling | lds.07.sqlserver.pdf | stratifiedsampling.zip | | Video | Pellungrini |
| 9. | 18.10 12:00-12:45 | ETL Introduction | lds.08.etlandssis.pdf | | | Video | Monreale |
| 10. | 21.10 9:00-10:45 | SSIS: toCSV, FromCSV | | 2021-lds-etl-project.zip | | Video | Monreale |
| 11. | 25.10 11:00-12:45 | SSIS exercises: Pipeline, Update | exercisefact_table.pdf | | | Video | Monreale |
| 12. | 28.10 9:00-10:45 | SSIS exercises: Stratified Subsampling | ex-midterm.pdf | | | | Monreale |
| 13. | 04.11 9:00-10:45 | Project Support & Discussion | | | | | Monreale |
| 14. | 08.11 11:00-12:45 | SSIS: Surrogate keys + Slowly changing dimensions | | | | | Monreale |
| 15. | 11.11 9:00-10:45 | SSIS: Slowly changing dimensions | | 2021-lds-etl-project_full.zip | | | Monreale |

====== Exams ======

PROJECT

A project consists of a set of assignments corresponding to a BI process: data integration, construction of an OLAP cube, querying of an OLAP cube, and reporting. The project has to be carried out by a team of 2 students (at most 3, subject to prior authorization from the teachers).

Project to be delivered by 31 December 2021

  * The first part of the project consists of the assignments described here: lds_project_2021_part_1.pdf
    * A note about the first part of the project: for the 'language' attribute in the geography table, you should search for the necessary information elsewhere. Some examples are: http://download.geonames.org/export/dump/countryInfo.txt and http://www.fullstacks.io/2016/07/countries-and-their-spoken-languages.html
  * The second part of the project consists of the assignments described here: lds_project_2021_part_2.pdf
  * The third part of the project consists of the assignments described here:
  * Remember to re-submit all three parts of the project with your third part, as specified in the document above.
  * Dataset: data2021.zip
  * Deadline: First deadline - 15 Nov 2021 22 Nov 2021

Project to be delivered during the exam sessions

Students who did not deliver the above project by 31 December 2021 must email the teachers to request a new project. The assigned project will require about 2 weeks of work and, after delivery, will be discussed during the oral exam. For those students, the oral exam will also cover some practical parts that could not be included in the project.

===== Exam sessions =====

^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^

===== Past Editions =====

  * LABORATORY OF DATA SCIENCE (2020/2021)
  * LABORATORY OF DATA SCIENCE (2019/2020)
  * LABORATORY OF DATA SCIENCE (2018/2019)
  * BUSINESS INTELLIGENCE LAB (2017/2018)

mds/lbi/start.1636680555.txt.gz · Last modified: 12/11/2021 at 01:29 (3 years ago) by Anna Monreale
