This is an old revision of the document!


====== LABORATORY OF DATA SCIENCE (2021/2022) ======

Instructors:

  * Anna Monreale
    * KDD Laboratory, Università di Pisa
    * http://pages.di.unipi.it/amonreale/
    * anna [dot] monreale [at] unipi [dot] it
    * Office hours: Wednesday 11:00-13:00, online using Teams (appointment by email).
    * Telephone: +39-050-2213119
  * Roberto Pellungrini
    * KDD Laboratory, Università di Pisa
    * roberto [dot] pellungrini [at] di [dot] unipi [dot] it
    * Office hours: Thursday 14:00-16:00, online using Teams (appointment by email).
    * Telephone: +39-050-2212728

====== News ======

  * [16/11/2021] Instructions for the SSAS project from today's lecture. To avoid conflicts during deployment/processing, follow these steps once the solution is opened: (1) rename the project as <your account>_foodmart; (2) from the project properties select 'Deployment', then rename the database as <your account>_foodmart; (3) click the "Show all files" button just above "Solution Explorer", right-click the .database file that appears and choose "View code", then change the ID from ruggieri_foodmart to <your account>_foodmart, and finally save the file; (4) change the credentials of the connection to the database on SQL Server. As an alternative, you may import the project from the SSAS server and rename it as <your account>_foodmart (step 4 is still necessary).
  * [15/10/2021] Instructions for installing Data Tools for Visual Studio 2019 are in the software section of the wiki. Please follow them closely, step by step.
  * [15/10/2021] IMPORTANT: The first part of the project is available. Checkpoint: 15 November.
  * [02/10/2021] The lecture of Monday 4 October is canceled.
  * [08/09/2021] The first lecture will be on 16 September.
  * [08/09/2021] You can join the class by using this link: https://teams.microsoft.com/l/team/19%3amm3HFMqMSvpUrGY2sMYlpzxQ-atdxhfXreRUHhvrODs1%40thread.tacv2/conversations?groupId=c196ac40-93a2-4436-adfe-a81af3d06eef&tenantId=c7456b31-a220-47f5-be52-473828670aa1
  * [16/09/2021] IMPORTANT: Please fill in the document at the following link with your information, so that we can give you access to the teaching database and the mailing list: https://docs.google.com/spreadsheets/d/1yYzHXmykhbfwy7G9uB_Z1fGcW_Vtvjugy4Yvlj-aM2Y/edit?usp=sharing

====== Hours and Rooms ======

Classes: lessons will be held online on the Teams platform.

^ Day of Week ^ Hour ^ Room ^
| Monday | 11:00 - 12:45 | Teams |
| Thursday | 09:00 - 10:45 | Teams |

Link to Teams module: https://teams.microsoft.com/l/team/19%3amm3HFMqMSvpUrGY2sMYlpzxQ-atdxhfXreRUHhvrODs1%40thread.tacv2/conversations?groupId=c196ac40-93a2-4436-adfe-a81af3d06eef&tenantId=c7456b31-a220-47f5-be52-473828670aa1

====== Learning Material ======

===== Slides & Recordings of the Classes =====

  * The slides used in the course will be added to the calendar after each class.
  * The recording of each lecture will be available on Teams.

===== Past Exams =====

  * 2016/17 text, 2015/16 text and 2015/16 solution, 2014/15 text and 2014/15 solution, 2013/14 text, 2012/13 text and 2012/13 solution.

===== Software =====

  * Anaconda with Python 3.7 (please avoid Python 3.8)
  * SQL Server 2019 Developer Edition: SQL Server 2019 Management Studio.
  * Data Tools for Visual Studio 2019: instructions here
    * Italian: Data Tools Visual Studio 2019 IT
    * English: Data Tools Visual Studio 2019 EN
  * Microsoft Excel
  * Power BI Desktop

===== F.A.Q. =====

  * Connection to wi-fi
  * F.A.Q.s about the labs
  * Unipi VPN
  * Unipi Authentication: to access the VPN, make sure that network access services are enabled on your profile. Follow this link to access your Unipi profile.
====== Class calendar - (2021-2022) ======

^ ^ Day ^ Topic ^ Slides ^ Data/Software ^ References ^ Video Lectures ^ Teacher ^
| | 13.09 11:00-12:45 | Lecture canceled | | | | | |
| 1. | 16.09 09:00-10:45 | Introduction. File data access. | 2021-lds.01.introduction.pdf, 2020-lds.02.bi_architectures.pptx.pdf, 2020-lds.03.file_data_access.pptx.pdf | | BI technology: An Overview of Business Intelligence Technology; File access: File System Interface | Video1 Video2 | Monreale |
| 2. | 20.09 11:00-12:45 | Representation formats: CSV, FLV, ARFF, XML. Python recap | 2020-lds.04.python.pptx.pdf | | File formats: Introduction to data technologies (Chps. 5, 6), Weka ARFF Format, XRFF Format; Python reference: Free python book with exercises | Video1 Video2 | Pellungrini |
| 3. | 23.09 11:00-12:45 | File Access in Python | lds.05.fileaccess-python2021.pdf | census.csv.zip, Collection of files, Partial Solutions to Python Exercises | | Video1 Video2 | Pellungrini |
| 4. | 27.09 9:00-10:45 | File Access in Python Practice | lds.05.fileaccess-python2021.pdf | census.csv.zip, Collection of files, Partial Solutions to Python Exercises, csv to Arff conversion solution | | Video | Pellungrini |
| 5. | 30.09 9:00-10:45 | Python Exercises | ex-customers.pdf | ex-customers_solution.zip, data-customers.zip, lds.file.format.zip | | Video1 Video2 Video3 | Pellungrini |
| | 04.10 11:00-12:45 | Lecture canceled | | | | | |
| 6. | 07.10 9:00-10:45 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lds.06.relational_data_access-2021.pdf | | | | Monreale |
| 7. | 11.10 11:00-12:45 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lds.06.relational_data_access-2021.pdf | 2021-code-db-samples.zip | | | Monreale |
| 8. | 14.10 9:00-10:45 | Stratified sampling | lds.07.sqlserver.pdf | stratifiedsampling.zip | | Video | Pellungrini |
| 9. | 18.10 12:00-12:45 | ETL Introduction | lds.08.etlandssis.pdf | | | Video | Monreale |
| 10. | 21.10 9:00-10:45 | SSIS: toCSV, FromCSV | | 2021-lds-etl-project.zip | | Video | Monreale |
| 11. | 25.10 11:00-12:45 | SSIS exercises: Pipeline, Update | exercisefact_table.pdf | | | Video | Monreale |
| 12. | 28.10 9:00-10:45 | SSIS exercises: Stratified Subsampling | ex-midterm.pdf | | | | Monreale |
| 13. | 04.11 9:00-10:45 | Project Support & Discussion | | | | | Monreale |
| 14. | 08.11 11:00-12:45 | SSIS: Surrogate keys + Slowly changing dimensions | | | | | Monreale |
| 15. | 11.11 9:00-10:45 | SSIS: Slowly changing dimensions + Data warehousing and OLAP recap. | lds.09.dwandolap.pdf | 2021-lds-etl-project_full.zip | | | Monreale |
| 16. | 15.11 11:00-12:45 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, hierarchies. Data cubes, parent-child hierarchies. | lds.09.ssas-21.pdf | foodmart_monreale_full-cube.zip | | | Monreale |
| 17. | 18.11 11:00-12:45 | Cube deployment. Measure setup, Calculated Members, Excel Power Pivot integration. ROLAP, MOLAP, HOLAP definition and setup. Cache management. | | foodmartexplorative.xlsx, foodmart_monreale_complete.zip | | | Monreale |
| 18. | 22.11 11:00-12:45 | Introduction to MDX | | 2021-mdxquery-demo-partial.mdx.zip | | Since the video of this lecture has some issues, I am linking the videos from last year. They are not exactly the same but very similar. There are two videos because this year's lectures could not be completely aligned with last year's. Video1 Video2 | Monreale |
| 19. | 25.11 09:00-11:00 | Practice on MDX | | 2021-mdxquery-demo.mdx.zip | | | Monreale |

====== Exams ======

PROJECT

A project consists of a set of assignments corresponding to a BI process: data integration, construction of an OLAP cube, querying of an OLAP cube, and reporting. The project has to be carried out by a team of 2 students (at most 3, after asking the teachers for authorization).

Project to be delivered by 31 December 2021

  * The first part of the project consists of the assignments described here: lds_project_2021_part_1.pdf
  * A note about the first part of the project: for the 'language' attribute in the geography table, you should search for the necessary information elsewhere. Some examples are: http://download.geonames.org/export/dump/countryInfo.txt and http://www.fullstacks.io/2016/07/countries-and-their-spoken-languages.html
  * The second part of the project consists of the assignments described here: lds_project_2021_part_2.pdf
  * The third part of the project consists of the assignments described here:
  * Remember to re-submit all three parts of the project with your third part, as specified in the document above.
  * Dataset: data2021.zip
  * Deadline: First deadline - 15 Nov 2021 22 Nov 2021

Project to be delivered during the exam sessions

Students who did not deliver the above project by 31 December 2021 need to ask the teachers for a new project by email. The assigned project will require about 2 weeks of work and, after delivery, will be discussed during the oral exam. For those students, the oral exam will also cover some practical parts that could not be included in the project.

===== Exam sessions =====

^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^

===== Past Editions =====

  * LABORATORY OF DATA SCIENCE (2020/2021)
  * LABORATORY OF DATA SCIENCE (2019/2020)
  * LABORATORY OF DATA SCIENCE (2018/2019)
  * BUSINESS INTELLIGENCE LAB (2017/2018)
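As a rough sketch for the project note on the 'language' attribute of the geography table: the GeoNames countryInfo.txt file mentioned there can be read with Python's standard csv module. The sketch assumes that the file is tab-separated, that comment lines start with '#', and that the ISO code and the comma-separated language list sit in columns 0 and 15 (check the header comment of the downloaded file). The function names are hypothetical, and taking the first listed language as the country's main one is only a heuristic, not a requirement of the project.

```python
import csv

# Assumed 0-based column positions in countryInfo.txt; verify them against
# the header comment line of the file you actually downloaded.
ISO_COL = 0
LANGUAGES_COL = 15


def parse_country_languages(lines):
    """Map each ISO country code to its list of language codes.

    `lines` is any iterable of text lines (for example an open file).
    """
    # Skip blank lines and '#' comment lines before handing rows to csv.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    result = {}
    for row in reader:
        if len(row) <= LANGUAGES_COL:
            continue  # malformed or truncated row: ignore it
        codes = [code for code in row[LANGUAGES_COL].split(",") if code]
        result[row[ISO_COL]] = codes
    return result


def primary_language(codes):
    """Return the base code of the first listed language ('it-IT' -> 'it')."""
    return codes[0].split("-")[0] if codes else None
```

With the file saved locally, `parse_country_languages(open("countryInfo.txt", encoding="utf-8"))` returns the mapping, which can then be joined to the geography table on the country code.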

mds/lbi/start.1637915590.txt.gz · Last modified: 26/11/2021 at 08:33 (3 years ago) by Anna Monreale
