t.BA.DS.PM4.20HS (Big Data Project) 
Module: Big Data Project
This information was generated on: 26 September 2021
No.
t.BA.DS.PM4.20HS
Title
Big Data Project
Organised by
T InIT
Credits
4

Description

Version: 2.0 (valid from 01 February 2021)
 

Short description

Students gain practical experience of working with Big Data problems. Based on the theoretical foundations of “Data Engineering 1” and “Data Engineering 2”, students analyse selected topics from these foundation courses and implement scalable applications using the latest Big Data technologies.

Module coordinator

Kurt Stockinger (stog)

Learning objectives (competencies)

Objectives: Being able to execute Big Data projects
Competences: Programming and evaluation
Taxonomy levels: K4, K6

Module contents

Implementing a typical Big Data project could require the following steps:

• Select a problem to solve, e.g. analyze the popularity of movies over the last ten years and compare the differences between Brazil, France and the USA.
• Select the datasets, e.g. use the content of the Internet Movie Database (IMDB), stored in a relational database. Enrich the information about movies with documents found on the internet.
• Select a baseline system using traditional technology, e.g. use PostgreSQL to analyze the information stored in IMDB or use your favorite information retrieval system to analyze the text documents about the movies.
• Select a state-of-the-art Big Data system to compare against the baselines.
• Implement the application using the baseline system as well as the Big Data system.
• Analyze the performance difference of both systems using small amounts of data.
• Significantly increase the size of the data and study the performance impact.
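The last two steps, measuring the baseline on small data and then growing the data, can be illustrated with a minimal scaling harness. This is a sketch under stated assumptions: Python's built-in `sqlite3` module stands in for the PostgreSQL baseline, and the `ratings` table, its columns and the synthetic records are hypothetical, not IMDB content. The query (average rating per country and year) mirrors the example problem above.

```python
import random
import sqlite3
import time

# Countries from the example problem; the schema below is hypothetical.
COUNTRIES = ["Brazil", "France", "USA"]


def make_ratings(n_rows, seed=42):
    """Generate n_rows synthetic (movie_id, country, year, rating) tuples.

    Illustrative data only; a real project would load an actual dataset.
    """
    rng = random.Random(seed)
    return [
        (rng.randrange(10_000), rng.choice(COUNTRIES),
         rng.randrange(2011, 2021), rng.uniform(1.0, 10.0))
        for _ in range(n_rows)
    ]


def run_baseline(rows):
    """Load rows into an in-memory SQLite database and time one aggregate query.

    Returns (elapsed_seconds, result_rows); loading time is not included.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ratings "
            "(movie_id INTEGER, country TEXT, year INTEGER, rating REAL)"
        )
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", rows)
        start = time.perf_counter()
        result = conn.execute(
            "SELECT country, year, AVG(rating) FROM ratings "
            "GROUP BY country, year ORDER BY country, year"
        ).fetchall()
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return elapsed, result


def scaling_study(base_rows=1_000, factors=(1, 10, 100)):
    """Time the baseline query as the data grows by the given factors."""
    timings = {}
    for factor in factors:
        n = base_rows * factor
        elapsed, _ = run_baseline(make_ratings(n))
        timings[n] = elapsed
    return timings


if __name__ == "__main__":
    for n, secs in scaling_study().items():
        print(f"{n:>9,} rows: {secs * 1000:.2f} ms")
```

Only the baseline half is shown. The same query could then be expressed in the chosen Big Data system (for example as a Spark SQL query) and timed at the same data sizes, so that both curves can be compared.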

Students are free to choose any topic of interest, any dataset and any existing code base. For instance, the students could choose a Python program that runs on a single computer and uses a small dataset with thousands of records. By re-implementing the program using Big Data technology, the students should demonstrate how to build a scalable application that processes large datasets on tens of computers.

By implementing the Big Data project, the students learn about the following aspects:

• Functionality of Big Data systems:
  - What kind of problems can I solve with Big Data systems?
  - Which problems are not suited for Big Data systems?
  - What typical data science algorithms are supported by Big Data systems?
• Performance aspects of Big Data systems:
  - How must I rewrite my application when the size of the dataset increases by a factor of 10, 100, 1000, etc.?
  - What is the impact on performance when the number of users increases by a factor of 10, 100, 1000, etc.?
  - How can I keep the response time constant?
  - What kind of optimization steps are required to implement an enterprise-scale solution?
• Usability of Big Data systems:
  - What is the learning curve of Big Data technology compared to traditional technology?
  - Given a specific use case that the students have implemented, does it pay off for a small, medium or large company to invest in Big Data technology?
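For the question of how an application has to be rewritten as the data grows, one pattern that MapReduce-style systems such as Hadoop or Spark commonly use is to compute partial aggregates per data partition and merge them afterwards. The following minimal sketch shows that pattern in plain Python on a single machine, with hypothetical (country, rating) records; it illustrates the idea and does not reproduce any specific system's API. Each partition carries (sum, count) pairs rather than averages, because averaging per-partition averages gives a wrong result when partitions differ in size.

```python
from collections import defaultdict
from functools import reduce


def map_partition(rows):
    """Map phase: partial [sum, count] of ratings per country for one partition."""
    partial = defaultdict(lambda: [0.0, 0])
    for country, rating in rows:
        partial[country][0] += rating
        partial[country][1] += 1
    return dict(partial)


def merge(a, b):
    """Reduce phase: combine two partial aggregates without mutating the inputs."""
    out = {k: list(v) for k, v in a.items()}
    for country, (s, c) in b.items():
        acc = out.setdefault(country, [0.0, 0])
        acc[0] += s
        acc[1] += c
    return out


def average_by_country(partitions):
    """Average rating per country over all partitions (hypothetical records)."""
    merged = reduce(merge, (map_partition(p) for p in partitions), {})
    return {country: s / c for country, (s, c) in merged.items()}


if __name__ == "__main__":
    partitions = [
        [("USA", 8.0), ("France", 6.0)],
        [("USA", 4.0), ("USA", 6.0)],
        [("France", 7.0), ("Brazil", 9.0)],
    ]
    print(average_by_country(partitions))
    # prints {'USA': 6.0, 'France': 6.5, 'Brazil': 9.0}
```

In this example, averaging the two per-partition USA averages (8.0 and 5.0) would give 6.5 instead of the correct 6.0. Because each partition is processed independently, the map phase could run on different machines, which is what allows the same logic to scale from one computer to many.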

Teaching materials

Lecture slides and notes of Data Engineering 1 and 2

Supplementary literature

Most recent literature and papers about Big Data

Prerequisites

Data Engineering 1 and 2

Teaching language

(X) German (X) English

Part of International Profile

(X) Yes () No

Module structure

Type 4
  For more details, see: T_CL_Modulauspraegungen_SM2025

Exams

Graded assignments during teaching semester
  Type: Project
  Form: Programming and final report
  Weighting: 100%
End-of-semester exam: none listed

Remarks

 

Legal basis

The module description is part of the legal basis in addition to the general academic regulations. It is binding. During the first week of the semester, a supplement communicated in writing may specify the module description in more detail.
Course: Big Data Project - Praktikum
No.
t.BA.DS.PM4.20HS.P
Title
Big Data Project - Praktikum

Note

  • No module description is available in the system for the cut-off date of 02 August 2099.