
Big Data Analytics, Academic Year 2021/22

All lectures will also be available remotely through the Microsoft Teams team named “599AA 21/22 - BIG DATA ANALYTICS [WDS-LM]”.

Instructors:

Tutor:

Timetable

  • Wednesday 09:00 - 10:45 Aula Fib M1
  • Friday 09:00 - 10:45 Aula Fib C1

Dataset assignment: datasets have been assigned to teams; find your team's dataset here: https://bit.ly/2YalEtI

Instructions for MidTerm 1: the first mid-term presentation (Data Understanding and Project Proposal) will take place on October 20th (half of the teams) and October 22nd (the remaining teams).

  • presentation: prepare a presentation describing the data understanding and a proposal of the problem you aim to solve. Motivate your decisions and choices (e.g., which variables you deleted, how you dealt with missing values and noise, which new variables you created, whether you integrated your data with external datasets, etc.). The presentation should last at most 20 minutes (+ 10 minutes of questions) and must be given by running a Colab notebook “live”;
  • code: provide the link to the notebook on Jovian containing the code you used for all computations and plots. Document your notebooks adequately using Markdown. The notebook must run without errors on Google Colab, so include cells with instructions to install any additional libraries and a description of the format the datasets must have for the code to run correctly;
  • upload the material by Tuesday, October 19th, using the following form: https://forms.gle/BV2Drh9zJKSu1fFC8
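Since notebooks must run without errors on Colab, it can help to open them with a cell that installs extra libraries and fails early if the input data does not have the expected format. The following is only a minimal sketch, assuming the dataset is a CSV file; the column names and the helper `check_dataset_format` are hypothetical and not part of the course material.

```python
# In a Colab cell, extra libraries can be installed with: !pip install <package>
import csv
import io

# Hypothetical columns the notebook expects; adapt them to your own dataset.
REQUIRED_COLUMNS = ["user_id", "timestamp", "lat", "lng"]


def check_dataset_format(csv_file, required_columns=REQUIRED_COLUMNS):
    """Return the required columns missing from the CSV header (empty list if none)."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # first row of the file, or [] if the file is empty
    present = {name.strip() for name in header}
    return [col for col in required_columns if col not in present]


# Usage with an in-memory CSV; in Colab you would pass open("your_dataset.csv") instead.
sample = io.StringIO("user_id,timestamp,lat\n1,2021-10-01 10:00,43.72\n")
missing = check_dataset_format(sample)
if missing:
    print("Dataset is missing columns:", missing)
```

Checking the header before any heavy computation turns a confusing error deep in the notebook into an explicit message at the top.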

Learning goals

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer logs, and social media records are all examples of “Big Data”, a novel and powerful “social microscope” for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:

  • introducing students to the emerging field of big data analytics and social mining;
  • introducing students to the technological landscape of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  • guiding students through the development of an open-source and reproducible big data analytics project, based on the analysis of real-world datasets.

Module 1: Big Data Analytics and Social Mining

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:

  • The Big Data Scenario and the new questions to be answered
  • Sports Analytics:
    1. Soccer data landscape and injury prediction
    2. Analysis and evolution of sports performance
  • Mobility Analytics
    1. Mobility data landscape and mobility data mining methods
    2. Understanding Human Mobility with vehicular sensors (GPS)
    3. Mobility Analytics: Novel Demography with mobile-phone data
  • Social Media Mining
    1. The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    2. Sentiment analysis: an example from human migration studies
    3. Discussion on ethical issues of Big Data Analytics
  • Well-being & Nowcasting
    1. Nowcasting influenza with retail market data
    2. Predicting well-being from human mobility patterns
  • Paper presentations by students

Module 2: Big Data Analytics Technologies

This module will provide students with the technologies to collect, manipulate and process big data. In particular, the following tools will be presented:

  • Python for Data Science
  • The Jupyter Notebook: developing open-source and reproducible data science
  • MongoDB: fast querying and aggregation in NoSQL databases
  • GeoPandas: analyze geo-spatial data with Python
  • Scikit-learn: machine learning in Python
  • Keras: deep learning in Python

Module 3: Laboratory for Interactive Project Development

During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. In-class discussions and presentations will take place at different stages of the project.

  • 1st Mid Term: Data Understanding and Project Formulation
  • 2nd Mid Term: Model(s) construction and evaluation
  • 3rd Mid Term: Model interpretation/explanation
  • Exam: Final Project results

Calendar

15/09 (Mod. 1) Introduction to the course, The Big Data scenario lesson1_introduction_to_the_course_2021.pdf

17/09 (Mod. 2) Python for Data Science and the Jupyter Notebook: developing open-source and reproducible data science

22/09 (Mod. 2) Data Exploration and Understanding practice in Python

24/09 (Mod. 3) Presentation of datasets for the project bda21_22_datasets_1_.pdf

29/09 (Mod. 2) Scikit-learn: programming tools for data mining (part 1) https://jovian.ai/jonpappalord/classification
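The classification notebook relies on scikit-learn's metrics; the definitions behind the most common ones fit in a few lines. This is a minimal pure-Python sketch (the function name is ours, not scikit-learn's) that also shows why accuracy alone is misleading when one class is rare, a recurring issue in the course's case studies.

```python
def binary_classification_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    # Undefined ratios (division by zero) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 8 negatives, 2 positives (toy labels)
always_negative = [0] * len(y_true)       # a "classifier" that never predicts the rare class
report = binary_classification_report(y_true, always_negative)
print(report)  # accuracy 0.8, but precision, recall and F1 are all 0.0
```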

01/10 (Mod. 2) Scikit-learn: programming tools for data mining (part 2) https://jovian.ai/jonpappalord/clustering
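For intuition on what happens inside a clustering estimator, here is a minimal pure-Python sketch of Lloyd's k-means algorithm, the procedure behind `sklearn.cluster.KMeans` (which additionally uses k-means++ initialization and multiple restarts). The function name and parameters are illustrative, not scikit-learn's API.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on points given as coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change: converged
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The result depends on the initial centroids, which is why scikit-learn runs several initializations and keeps the best one.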

06/10 (Mod. 2) GeoPandas and scikit-mobility: managing geographic data in Python (part 1)

08/10 (Mod. 2) GeoPandas and scikit-mobility: managing geographic data in Python (part 2)
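Much of the geographic analysis done with GeoPandas and scikit-mobility builds on distances between latitude/longitude pairs. A minimal stdlib sketch, assuming a spherical Earth of radius 6371 km: the haversine distance and, on top of it, the radius of gyration, a standard mobility measure of how far an individual typically moves from the center of their visited locations. Libraries such as scikit-mobility provide their own implementations; the function names here are illustrative, not a library API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Radius of gyration of a list of (lat, lng) points, in km."""
    # Center of mass approximated by the arithmetic mean of latitudes and longitudes.
    lat_cm = sum(lat for lat, _ in points) / len(points)
    lng_cm = sum(lng for _, lng in points) / len(points)
    squared = [haversine_km(lat, lng, lat_cm, lng_cm) ** 2 for lat, lng in points]
    return math.sqrt(sum(squared) / len(points))


print(round(haversine_km(0, 0, 0, 1), 2))  # 111.19 km: one degree of longitude at the equator
```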

13/10 (Mod. 1) Case study 1: Injury prediction, handling imbalanced datasets, and feature selection: bda_2122_injury_forecasting.pdf
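One common way to handle an imbalanced dataset, such as injuries being far rarer than non-injuries, is to weight each class inversely to its frequency. The sketch below reproduces the rule scikit-learn applies for `class_weight="balanced"`, i.e. n_samples / (n_classes * count(class)); it is one option among others (e.g. resampling), not necessarily the one used in the case study, and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * count(class))."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels: 8 non-injuries (0) and 2 injuries (1).
y = [0] * 8 + [1] * 2
weights = balanced_class_weights(y)
print(weights)  # {0: 0.625, 1: 2.5}
```

A dictionary like this can be passed as `class_weight` to scikit-learn classifiers that accept it, such as `LogisticRegression` or `DecisionTreeClassifier`, so that errors on the rare class cost more during training.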

15/10 (Mod. 2) Feature selection in Python
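Filter methods are the simplest family of feature selection techniques. A minimal sketch of one of them: keep only the features whose population variance exceeds a threshold, mirroring the rule used by scikit-learn's `VarianceThreshold`. The function name and the toy matrix are illustrative.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.0):
    """Return the indices of feature columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


X = [
    [1.0, 0.0, 3.0],
    [2.0, 0.0, 3.0],
    [3.0, 0.0, 3.1],
]
print(variance_threshold(X))  # [0, 2]: the constant feature 1 is dropped
```

Raising the threshold also discards near-constant features (here, `threshold=0.01` keeps only feature 0); since variance depends on scale, features are usually made comparable before applying a non-zero threshold.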

20/10 (Mod. 3) MidTerm1

  • BigData-Islanders
  • WeMine
  • cpu_in_flames

22/10 (Mod. 3) MidTerm1

  • How I Met Your Big Data
  • SLM
  • The Missing Values

Exam sessions

TDA

Previous Big Data Analytics websites

Last modified: 15/10/2021 at 15:14 by Luca Pappalardo