====== Decision Support Systems - Module II (6 ECTS): LABORATORY OF DATA SCIENCE (2022/2023) ======

This is the second module of [[mds:dss:start|Decision Support Systems]] (801AA, 12 ECTS), previously called Laboratory of Data Science (664AA, 6 ECTS).

**Instructors**:
  * **Anna Monreale**
    * KDD Laboratory, Università di Pisa
    * [[http://pages.di.unipi.it/amonreale/]]
    * [[anna.monreale@unipi.it]]
    * Office hours: Tuesday 11:00-13:00, online on Teams or in person at the Department of Computer Science, room 374/E (please request an appointment by email).
    * Telephone +39-050-2213119
  * **Roberto Pellungrini**
    * KDD Laboratory, Università di Pisa
    * [[roberto.pellungrini@di.unipi.it]]
    * Office hours: Thursday 14:00-16:00, online on Teams (by appointment via email).
    * Telephone +39-050-2212728

====== News ======
  * [08.11.2022] Instructions for the SSAS project used in today's lecture: to avoid conflicts during deployment/processing, follow these steps once the solution is open: (1) rename the project as _foodmart; (2) from the project properties select 'Deployment', then rename the database as _foodmart; (3) click the "Show All Files" button just above "Solution Explorer", right-click the .database file that appears and choose "View Code", then change the ID from the current name to _foodmart and save the file; (4) change the credentials of the connection to the database on SQL Server. Alternatively, you may [[http://technet.microsoft.com/en-us/library/ms175630.aspx#bkmk_newusingwizard|import the project]] from the SSAS server and rename it as _foodmart (step 4 is still necessary).
  * [09.09.2022] Lectures are held in person only and will **NOT** be live-streamed, but recordings of this year's lectures or of previous years will be made available here for non-attending students.

====== Hours and Rooms ======

**Classes**

^ Day of Week ^ Hour ^ Room ^
| Tuesday | 14:00 - 16:00 | Room Lab. H |
| Thursday | 09:00 - 11:00 | Room Lab. M |

A **[[https://teams.microsoft.com/l/channel/19%3a1ade445a235343fcb93aa6a19a174a5c%40thread.tacv2/Module%2520II%2520-%2520Laboratory%2520of%2520Data%2520Science?groupId=279299f3-aa07-48b1-8ec6-b2fd1a6d125d&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]]** will be used ONLY to post news, Q&A, and other material related to the course.

====== Learning Material ======

===== Slides & Recordings of the classes =====
  * The slides used in the course will be added to the calendar after each class.
  * Recordings of each lecture will be made available for non-attending students.

===== Past Exams =====
  * {{ :mds:lbi:2016midterm1text.pdf |2016/17 text}}, {{ :mds:lbi:2015fallmidterm1text.pdf | 2015/16 text}} and {{ :mds:lbi:2015wintermidterm1.zip | 2015/16 solution}}, {{:mds:lbi:2015midterm1text.pdf | 2014/15 text}} and {{ :mds:lbi:2015midterm1.zip |2014/15 solution}}, {{ :mds:lbi:2014midterm1text.pdf | 2013/14 text}}, {{ :mds:lbi:2013midterm1.pdf | 2012/13 text }} and {{ :mds:lbi:2013midterm1.zip |2012/13 solution}}.

===== Software =====
  * Anaconda with Python 3.7 (please avoid Python 3.8)
  * SQL Server 2019 Developer Edition: [[https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver16|SQL Server 2019 Management Studio]]
  * Data Tools for Visual Studio 2019, instructions in Italian: [[https://docs.microsoft.com/it-it/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-ver15#ssdt-for-visual-studio-2019|Data Tools Visual Studio 2019 IT]]; in English: [[https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-ver15#ssdt-for-visual-studio-2019|Data Tools Visual Studio 2019 EN]]
  * Microsoft Excel
  * [[https://powerbi.microsoft.com/it-it/desktop/| Power BI Desktop]]

===== F.A.Q. =====
  * [[http://www.sid.unipi.it/polo2/2015/03/26/connessione-alle-reti-wifi/ | Connection to wi-fi]]
  * [[http://www.sid.unipi.it/polo2/studenti/ | F.A.Q.s about the labs]]
  * [[https://start.unipi.it/help-ict/vpn/ | Unipi VPN ]]
  * [[https://autenticazione.unipi.it/auth/auth.signin | Unipi Authentication]]: to access the VPN, make sure that network access services are enabled on your profile. Follow this link to access your Unipi profile.

====== Class calendar - (2022-2023) ======

^ ^ Day ^ Topic ^ Slides ^ Data/Software ^ References ^ Video Lectures ^ Teacher ^
|1. | 15.09 09:00-11:00 | Introduction to the Course. BI Architecture. File data access. | {{ :mds:lbi:2022-lds.01.introduction.pdf | Course Introduction}} {{ :mds:lbi:2022-lds.02.bi_architectures.pdf | BI Architectures}} {{ :mds:lbi:2022-lds.03.file_data_access.pdf | File Data Access}} | | **BI technology:** [[https://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext | An Overview of Business Intelligence Technology]] - **File access:** {{ :mds:lbi:filesystem.pdf | File System Interface}} | [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/ETP3AXvvKmNAs6wO2gdSwBQBnnIdm9e7CIiW1ao6sv68xA?e=XmsAGP|Video 1: BI Architecture]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/EQGOCKbyor5Jv6BtfFbRX6ABA9Mit-4ARPhB_M6cD-SCZg?e=WAC02j|Video 2: File Data Access - Part 1]] | Monreale |
|2. | 20.09 14:00-16:00 | Representation formats: CSV, FLV, ARFF, XML. Python Recap. | {{ :mds:lbi:2020-lds.04.python.pptx.pdf | Python Recap}} | | **File Formats:** [[http://www.stat.auckland.ac.nz/~paul/ItDT | Introduction to Data Technologies (Chps. 5, 6)]], [[http://weka.wikispaces.com/ARFF+(stable+version)|Weka ARFF Format]], [[http://weka.wikispaces.com/XRFF|XRFF Format]] - **Python reference:** [[https://www.spronck.net/pythonbook/ | Free Python book with exercises]] | [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/EVudIciqB_JCpbeL0bxLbjIBMzWvDJllrSYQUqe2QdeCSA?e=4QgNVC|Video 1: File Data Access and Python Recap]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/EaMXyY3JHGNIg7vX4FW9hkQBhOvezPLTdmp3NyO87m46wg?e=ba9YdF|Video 2: Python Recap]] | Monreale, Pellungrini |
|3. | 22.09 09:00-11:00 | File Access in Python | {{ :mds:lbi:lds.05.fileaccess-python2021.pdf |}} | {{ :mds:lbi:data1.zip |}} {{ :mds:lbi:python_solutions.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/EbciASaZjItFqNtjSCpAohcBBeqLaMbSWE3iZ0XP9RM_UA?e=YB8xfj|Video: Python Lab: Exercises & File Access]] | Pellungrini |
|4. | 26.09 14:00-16:00 | File Access in Python, lab practice | {{ :mds:lbi:lds.05.fileaccess-python2021.pdf |}} | {{ :mds:lbi:data1.zip |}} {{ :mds:lbi:solutions26092022.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EeFtvS3FifpHpMBcLqTR838BLcXdy53knxEQS2aNYo5v3w?e=qpZSix|Video Lecture]] | Pellungrini |
|5. | 29.09 09:00-11:00 | Python Exercises | {{ :mds:lbi:ex-customers.pdf |}} | {{ :mds:lbi:data-customers.zip |}} {{ :mds:lbi:ex-customers_solution.zip |}} {{ :mds:lbi:lds.file.format.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EY2gMgaN5phBq-CFpyJIQ9UBrdHTrFY5FR7H7s3RDMU1Tg?e=owdZAX|Video Lecture]] | Pellungrini |
|6. | 04.10 14:00-16:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | {{ :mds:lbi:lds.06.relational_data_access-2021.pdf |}} | | | [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/ETwH7giy9a9Mi7Xtfu9OuX0BBxaBF-esaYndY_Ah2ZdXXA?e=kD1zTn|Video 1]]; [[https://unipiit.sharepoint.com/:u:/s/a__td_57058/EZqPbMAVuXdFn6_1GWVG8dABalyJM5kEPNAoSD2dzz7Ogw?e=rFcn6n|Video 2]] | Monreale |
|7. | 06.10 09:00-11:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | {{ :mds:lbi:lds.06.relational_data_access-2021.pdf |}} | {{ :mds:lbi:2021-code-db-samples.zip |}} | | | Monreale |
|8. | 11.10 14:00-16:00 | Stratified Sampling, SQL Server management demo | {{ :mds:lbi:lds.07.sqlserver.pdf |}} | {{ :mds:lbi:stratifiedsampling.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EYsfdMxeMdJLq7aKa6ChyJQBP1X6-HWRKgI8eNyj7ECycw?e=3f8fI5|Video Lecture]] | Pellungrini |
|9. | 13.10 09:00-11:00 | ETL tools: SQL Server Integration Services (SSIS). | {{ :mds:lbi:lds.08.etlandssis.pdf |}} | | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ETXouaErlVtPunby2a03fOMBoZFdxbzdNdTgqsE1G1dAWQ?e=DRDruP|Video 1]]; [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EYp2NAEPOPpEhLrwLXXGMWEBmw11n5jHVxk3W_cq9hKkRw |Video 2]]; [[https://unipiit.sharepoint.com/:v:/s/a__td_524292/Eag-ig3Gsr9IltX0UiIx2OMBzzt8Jls0IBdVrNRxjPQnHw?e=jHZYf4|Video 1 - 2021]] | Monreale |
|10. | 18.10 14:00-16:00 | ETL tools: explanation + practice | {{ :mds:lbi:lds.08.etlandssis.pdf |}} | {{ :mds:lbi:2021-lds-etl-project.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ETqRob0alitAv7t0HbEXPvoB7pqUhKWTPtPySF0y3cTyOA?e=kXWdPQ|Video]] | Pellungrini |
|11. | 20.10 09:00-11:00 | SSIS exercises: Stratified Subsampling | {{ :mds:lbi:ex-midterm.pdf |}} | | | [[https://unipiit.sharepoint.com/:v:/s/a__td_524292/EStEjTSwWmpPs8OUhgn87eoBYqfq18tJURoIazSn1XBsIA?e=J9os3H|Video from last year]] | Monreale |
|12. | 25.10 14:00-16:00 | SSIS exercises: Dissimilarity Index + project support | {{ :mds:lbi:ex-midterm.pdf |}} | {{ :mds:lbi:exercises25102022.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ETsCALQT3RpFroh_ejSiDlABtV8erMLv1nci_Bv4QmCDiw?e=8jzYli |Video]] | Pellungrini |
|13. | 27.10 09:00-11:00 | Slowly Changing Dimensions + project support | | {{ :mds:lbi:lbiexamplescomplete.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ERDloA16n-lOtev3zK9nNRwBxnGwpoy2qg-8GVhU-nP2sw?e=PDKeay |Video]] | Pellungrini |
|14. | 03.11 09:00-11:00 | SSIS: Surrogate keys | | {{ :mds:lbi:lbiexamplescomplete.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/Ebld1Nylwo1Jtfwlovu0nSoBsECotn3m6cpwdJrkS61r5A?e=CeaE6Q |Video]] | Monreale |
|15. | 08.11 14:00-16:00 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, hierarchies. | {{ :mds:lbi:lds.09.ssas-21.pdf |}} | {{ :mds:lbi:foodmart_monreale_full_2022_nov08.zip |}} | 1) **SSAS (OLAP):** [[http://msdn.microsoft.com/en-us/library/bb522607.aspx|documentation]]; 2) S. Harinath et al., {{ :mds:lbi:ssas2012ch456.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox, 2012, Chps. 4-6}} | [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/EUX2ObQqufBDrkx7yFcHqDEB4Zr_LUdRREIN6vxuAYpDFA?e=ykDFgl|Video]] (last year's recording, published because this year's video has audio issues) | Pellungrini |
|16. | 10.11 09:00-11:00 | OLAP with SQL Server Analysis Services (SSAS): Data cubes, Parent-child hierarchies. | Same slides as the previous lecture | {{ :mds:lbi:10nov-foodmart_monreale_full.zip |}} (this version of the project includes its dependencies) | | [[https://unipiit.sharepoint.com/:v:/s/a__td_57058/ESA7pAfGhdtPnwgoD4i9p4QB3D-PRfOfDS7sTKurHKYQ_Q?e=vhTpYx|Video]] | Monreale |
|17. | 15.11 14:00-16:00 | OLAP Cube, Measure setup, Calculated Members, Excel Power Pivot integration. | Same slides as the previous lecture | {{ :mds:lbi:foodmartexplorative.xlsx |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EUsRIkohIKdMqt4jbGTdxtwBbVabSQPbW9tpjHmBruSyCQ?e=y1UehI |Video 1]]; [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EesjYirvg2JKsUD1pAYHBxwBSqQCI1LGJXDtBxxLi5Dc6w?e=4W1uOr|Video 2]] | Monreale |
|18. | 17.11 09:00-11:00 | Advanced Visual Studio features and first MDX examples | Same slides as the previous lecture | {{ :mds:lbi:foodmart_monreale_full.zip |}} | **MDX:** 1) [[http://msdn.microsoft.com/en-us/library/bb500184.aspx|documentation]] and a [[https://www.mssqltips.com/sqlservertip/3129/order-and-sort-with-mdx-in-sql-server-analysis-services/|useful guide on ordering]]; 2) S. Harinath et al., {{ :mds:lbi:ssas2012ch3.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox, 2012, Chp. 3}} | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EQ3lRlKIyatJuMi2tlOH6LUBbrk8uslzqZylLFFr6QzjCQ?e=mlUdsa|Video]] | Pellungrini |
|19. | 22.11 14:00-16:00 | MDX Practice | Same slides as the previous lecture | | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EVWsnC85_edHorLhH9Gz9DsBcXF1nvxaexdGPmV2SqHYTg?e=MgoBeB|Video]] | Monreale |
|20. | 24.11 09:00-11:00 | MDX Practice | Same slides as the previous lecture | {{ :mds:lbi:mdx-practice.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EVWsnC85_edHorLhH9Gz9DsBcXF1nvxaexdGPmV2SqHYTg?e=MgoBeB|Video]] | Monreale |
|21. | 29.11 14:00-16:00 | MDX Practice | Same slides as the previous lecture | {{ :mds:lbi:practice2.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EfJYhhK_DsRBuCnhbmR5ZiMBDNmlZ_UehyXWuk8wFJRsnQ?e=eLYHPJ|Video]] | Monreale |
|22. | 01.12 09:00-11:00 | Practice on MDX + Power BI | {{ :mds:lbi:lds.12.powerbi.pdf |}} | {{ :mds:lbi:mdxqueryies.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/Ec0UO5Oy7gVEs0eHCaALkTEBwZReeBuejKEHJDyydR-pWw?e=3JMrCh|Video]] | Monreale |
|23. | 13.12 14:00-16:00 | Lecture by Noovle SPA, TIM Group (lecture in Italian only) | | | | [[|Video ]] | Monreale |
|24. | 15.12 09:00-11:00 | Lecture by Noovle SPA, TIM Group (lecture in Italian only) | | | | [[|Video ]] | Monreale |

====== Exams ======

//There are no mid-terms//.
The exam of Decision Support Systems (801AA, 12 ECTS) consists of a written part and an oral part on the topics of the first module (50% of the final grade), and a lab project with a discussion on the topics of the second module (50% of the final grade). For the rules of the first module, visit [[http://didawiki.di.unipi.it/doku.php/mds/dsd/start|Module I: Decision Support Databases]]. For details on the lab project, read the next section carefully.

**PROJECT**

A project consists of a set of assignments corresponding to a BI process: data integration, construction of an OLAP cube, querying of an OLAP cube, and reporting. The project must be carried out by a team of 2 students (at most 3, after asking the teachers for authorization). Each part of the project **must be documented** with a brief PDF report (no more than 2-3 pages) describing your solution.

**Project to be delivered by 31 December 2022**
  * The first part of the project consists of the **assignments** described here: {{ :mds:lbi:lds_project_2022_part_1.pdf |}}
  * The second part of the project consists of the **assignments** described here: {{ :mds:lbi:lds_project_2022_part_2.pdf |}}
  * The third part of the project consists of the **assignments** described here: {{ :mds:lbi:lds_project_2022_part_3.pdf |}}
  * Remember to re-submit all three parts of the project together with your third part, as specified in the document above.
  * **Dataset:** {{ :mds:lbi:answerdatasetnew.zip |}}
  * **Deadline**: first deadline, 9 Nov 2022
  * **Deadline**: second deadline, 10 Dec 2022
  * **Deadline**: third deadline, 31 Dec 2022

**Project to be delivered during the exam sessions**

Students who did not deliver the above project by 31 December 2022 must request a new project from the teachers by email. The assigned project will require about 2 weeks of work and, after delivery, it will be discussed during the oral exam.
For those students, the oral exam will also cover some practical topics that could not be included in the project. **Please write to both teachers!**

===== Exam sessions =====

^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^

===== Past Editions =====
  * [[LDS 2021-2022]]
  * [[LDS 2020-2021]]
  * [[LDS 2019-2020]]
  * [[LDS 2018-2019]]
  * [[LBI 2017-2018]]