====== LABORATORY OF DATA SCIENCE (2020/2021) ====== **Instructors**: * **Anna Monreale** * KDD Laboratory, Università di Pisa * [[http://pages.di.unipi.it/amonreale/]] * [[anna.monreale@unipi.it]] * Office hours: Wednesday: 11:00-13:00 online using Teams (Appointment by email). * Telephone +39-050-2213119 * **Roberto Pellungrini** * KDD Laboratory, Università di Pisa * [[roberto.pellungrini@di.unipi.it]] * Office hours: Thursday 14:00-16:00, Online using Teams (Appointment by email). * Telephone +39-050-2212728 ====== News ===== * [10/11/2020] Instructions for the SSAS project in the Lecture of today: to avoid conflicts in deployment/process follow this steps once the solution is opened: (1) rename the project as _foodmart (2) from project properties select 'Deployment', then rename the database as _foodmart; (3) click on the button “show all files” just above “Solution explorer” right click on “view code” on the .database file that is visualized, and then change the ID from ruggieri_foodmart into _foodmart, and finally save the file; (4) change the credentials of connection to database on SQL Server. As an alternative solution you mayimport the project from the SSAS server and rename it as _foodmart (step 4 is still necessary). * [13/09/2020]: The lecture will be online. You can join the class by using this link: https://teams.microsoft.com/l/team/19%3ad3bfc8ae4ed04ea99ccd7857a43e101e%40thread.tacv2/conversations?groupId=eb724cb5-3ac7-4b8e-9280-05e191cf8477&tenantId=c7456b31-a220-47f5-be52-473828670aa1 ====== Hours and Rooms ====== **Classes ** Lessons will be held onilne by Teams Platform ^ Day of Week ^ Hour ^ Room ^ | Tuesday | 11:00 - 12:45 | Teams | | Thursday | 11:00 - 12:45 | Teams | **Link to Teams module:** https://teams.microsoft.com/l/team/19%3ad3bfc8ae4ed04ea99ccd7857a43e101e%40thread.tacv2/conversations?groupId=eb724cb5-3ac7-4b8e-9280-05e191cf8477&tenantId=c7456b31-a220-47f5-be52-473828670aa1 ====== Learning Material ====== ===== Slides & Registration of the classes ===== * The slides used in the course will be inserted in the calendar after each class. * Registration of each lecture will be available on Teams ===== Past Exams ===== * {{ :mds:lbi:2016midterm1text.pdf |2016/17 text}}, {{ :mds:lbi:2015fallmidterm1text.pdf | 2015/16 text}} and {{ :mds:lbi:2015wintermidterm1.zip | 2015/16 solution}}, {{:mds:lbi:2015midterm1text.pdf | 2014/15 text}} and {{ :mds:lbi:2015midterm1.zip |2014/2015 solution}}, {{ :mds:lbi:2014midterm1text.pdf | 2013/14 text}},{{ :mds:lbi:2013midterm1.pdf | 2012/13 text }} and {{ :mds:lbi:2013midterm1.zip |2012/13 solution}}. ===== Software===== * Anaconda with Python 3.7 (Please, avoid Python 3.8) * SQL Server 2019 Developer Edition:[[https://docs.microsoft.com/it-it/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver15|SQL Server 2019 Management Studio]]. * For Data Tools we will publish instruction soon. * Microsoft Excel * [[https://powerbi.microsoft.com/it-it/desktop/| Power BI Desktop]] ===== F.A.Q. ===== * [[http://www.sid.unipi.it/polo2/2015/03/26/connessione-alle-reti-wifi/ | Connection to wi-fi]] * [[http://www.sid.unipi.it/polo2/studenti/ | F.A.Q.s about the labs]] ====== Class calendar - (2020-2021) ====== ^ ^ Day ^ Topic ^ Slides ^ Data/Software ^ References ^ Teacher | |1. | 15.09 11:00-12:45| Introduction. File data access. Representation formats: CSV, FLV, ARFF, XML| {{ :mds:lbi:2020-lds.01.introduction.pptx.pdf |}} {{ :mds:lbi:2020-lds.02.bi_architectures.pptx.pdf |}} {{ :mds:lbi:2020-lds.03.file_data_access.pptx.pdf |}}| | -** BI technology:** [[https://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext | An Overview of Business Intelligence Technology]] - **File access:** {{ :mds:lbi:filesystem.pdf | File System Interface}} - **File Formats:** [[http://www.stat.auckland.ac.nz/~paul/ItDT | Introduction to data technologies(Chps. 5, 6)]], [[http://weka.wikispaces.com/ARFF+(stable+version)|Weka ARFF Format]], [[http://weka.wikispaces.com/XRFF|XRFF Format]] | Monreale | |2. | 17.09 11:00-13:00 | Python Recap | {{ :mds:lbi:2020-lds.04.python.pptx.pdf |}} | |Free Python book: http://www.spronck.net/pythonbook/ |Pellungrini| |3. | 22.09 11:00-13:00 | File Access in Python | {{ :mds:lbi:2020-lds.05.fileaccess-python.pptx.pdf |}} | {{ :mds:lbi:census.csv.zip |}} {{ :mds:lbi:csv2arrf.py.zip |}} {{ :mds:lbi:data1.zip |Collection of files}}| |Pellungrini| |4. | 24.09 11:00-13:00 | Lab practice: XML2CSV/CSV2JSON file format conversion | |{{ :mds:lbi:lds.file.format.zip |}} | |Pellungrini| |5. | 29.09 11:00-13:00 | Python Exercises | {{ :mds:lbi:ex-customers.pdf |}} | {{ :mds:lbi:data-customers.zip |}} {{ :mds:lbi:ex-customers_solution.zip |}}| |Pellungrini| |6. | 01.10 11:00-13:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | {{ :mds:lbi:lbi.06.relational_data_access-complete.pdf |}} | | | Monreale| |7. | 06.10 11:00-13:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | {{ :mds:lbi:lbi.06.relational_data_access-complete.pdf |}} | {{ :mds:lbi:code-db-samples.zip |}} | | Monreale| |8. | 08.10 11:00-13:00 | Stratified Sampling Ex, SQL Management Studio Demo, Project Explaination | {{ :mds:lbi:lds.07.sqlserver.pdf |}}|{{ :mds:lbi:stratifiedsampling.zip |}} | | Monreale, Pellungrini| |9. | 13.10 11:00-13:00 | ETL, Sequel Server Data Tools Demo | {{ :mds:lbi:lds.08.etlandssis.pdf |}} |{{ :mds:lbi:lds_first_project.zip |}} | |Pellungrini| |10. | 15.10 11:00-13:00 | SSIS exercises| | {{ :mds:lbi:ex-midterm.pdf |}}| |Pellungrini| |11. | 20.10 11:00-13:00 | stratified sampling with SSIS + SSIS practice | |{{ :mds:lbi:2015midterm1text.pdf |}}{{ :mds:lbi:lds_first_project.zip |}} | | Monreale/Pellungrini| |12. | 22.10 11:00-13:00 | SSIS practice + Project support | |{{:mds:lbi:2015midterm1text.pdf |}} | | Monreale/Pellungrini| |13. | 27.10 11:00-13:00 | SSIS: Surrogate Keys | | | | Monreale/Pellungrini| |14. | 29.10 11:00-13:00 | SSIS: slowly changing dimensions | |{{ :mds:lbi:2016ssis.zip |}} | | Monreale/Pellungrini| |15. | 03.11 11:00-13:00 | Datawarehousing and OLAP recap. Data cubes, analytic SQL, and materialized views in SQL Server. | {{ :mds:lbi:lds.09.dwandolap.pdf |}} | {{ :mds:lbi:lbi.08.afdemo.sql.zip |}} | | Monreale/Pellungrini| |16. | 05.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, | {{ :mds:lbi:lds.09.dwandolap.pdf |}} {{ :mds:lbi:lds.10.ssas.pdf |}} | |**1) SSAS (olap):** [[http://msdn.microsoft.com/en-us/library/bb522607.aspx|documentation]]; 2) S. Harinath et al. {{ :mds:lbi:ssas2012ch456.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chps. 4-6}}. | Monreale/Pellungrini| |17. | 10.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): dimensions, hierarchies. Data cubes, Parent-child hierarchies. OLAP explorative data analysis with Pivot Tables in Excel.| {{ :mds:lbi:lds.10.ssas.pdf |}} | {{ :mds:lbi:monreale_foodmart.zip |}} **Notice:** Please read the instructions in the Section NEWS! | **Pivot Tables in Excel:** G. Harvey. {{ :mds:lbi:pivottable2013bookviichpt2.pdf |Excel 2013 All-in-One For Dummies, 2013. Chp. VII-2}}. | Monreale/Pellungrini| |18. | 12.11 11:00-13:00 | OLAP explorative data analysis with Pivot Tables in Excel.| | {{ :mds:lbi:foodmartexplorative.xlsx |}} | | Monreale/Pellungrini| |19. | 17.11 11:00-13:00 | Introduction to MDX | | | **MDX:** 1) [[http://msdn.microsoft.com/en-us/library/bb500184.aspx|documentation]] and a [[https://www.mssqltips.com/sqlservertip/3129/order-and-sort-with-mdx-in-sql-server-analysis-services/|useful guide on ordering]]; 2) S. Harinath ed al. {{ :mds:lbi:ssas2012ch3.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chp. 3.}} | Monreale/Pellungrini| |20. | 19.11 11:00-13:00 | Introduction to MDX | | {{ :mds:lbi:mdx-practice-2020.txt.zip |}} {{ :mds:lbi:mdx-ex.pdf |}}| | Monreale | |21. | 24.11 11:00-13:00 | Practice on MDX | | {{ :mds:lbi:lbi.09.mdxsample.mdx.zip |}} {{ :mds:lbi:lbi.09.mdxpractice.mdx.zip |}} | | Pellungrini| |22. | 25.11 11:00-13:00 | project check | |Please do exercize on MDX. Here: {{ :mds:lbi:ex-mdx.pdf |}} you can find other queries that we will solve during the next lectures | | Monreale/Pellungrini | |23. | 01.12 11:00-13:00 | Microstrategy presentation | | | | Monreale/Pellungrini| |24. | 03.12 11:00-13:00 | PowerBI Desktop + Correction Ex. MDX | {{ :mds:lbi:lds.12.powerbi.pdf |}} | {{ :mds:lbi:mdx-corrections.mdx.zip |}}| | Monreale/Pellungrini | |25. | 10.12 11:00-13:00 | Microstrategy Viz| | {{ :mds:lbi:mircostrategy-material.zip |}} If you need the password please check the common chat in Teams, or wirte an email to the teachear.| | Monreale/Pellungrini| ====== Exams ====== **PROJECT ** A project consists in a set of assignements corresponding to a BI process: data integration, construction of an OLAP cube, qurying of a OPLAP cube and reporting. The project has to be performed by a team of 2 students (at most 3 after asking authorization for that to the teachers). **Project to be delivered within 31 December 2020 ** * First part of the project consists in the **assignments** described here: {{ :mds:lbi:lds_project_2020.pdf | Project Description Part 1}} * Second part of the project cosist in the **assignments** described here: {{ :mds:lbi:lds_project_2020_part_2.pdf | Project Description Part 2}} * Third part of the project cosist in the **assignments** described here: {{ :mds:lbi:lds_project_2020_part_3.pdf | Project Description Part 3}} * Remember to re-submit all three parts of the project with your third part, as specified in the document above. * **Dataset:** {{ :mds:lbi:project_data.zip |}} * **Deadline**: the fist part has to be delivered within ** November, 18th 2020.** * **Deadline**: the second part has to be delivered within ** December, 4th 2020.** * **Deadline**: the third part has to be delivered within ** December, 31st 2020.** **Project to be delivered during the exam sessions ** Students who did not deliver the above project within 31 December 2020 need to ask by email a new project to the teachers. The project that will be assigned will require about 2 weeks of work and after the delivery it will be discussed during the oral exam. For those students, the oral exams will also cover some practical parts that could not be included in the project. ===== Exam sessions ===== ^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^ =====Extra sessions A.A. 2019/20===== ^ Date ^ Time ^ Room ^ Notes ^ Results ^ | | | | | | =====Past Editions ===== * [[LDS 2019-2020]] * [[LDS 2018-2019]] * [[LBI 2017-2018]]