t.BA.DS.PM4.20HS (Big Data Project) 
Module: Big Data Project
This information was generated on: 07 October 2024
No.: t.BA.DS.PM4.20HS
Title: Big Data Project
Organised by: T InIT
Credits: 4

Description

Version: 3.0, start: 01 August 2023

Short description

Students gain practical experience of working with Big Data problems. Based on the theoretical foundations of “Data Engineering 1” and “Data Engineering 2”, students analyse selected topics from these foundation courses and implement scalable applications using the latest Big Data technologies.

Module coordinator

Jonathan Fürst (fues)

Learning objectives (competencies)

• Students deepen the methods and tools learned in other courses (e.g., Data Engineering 1 and Data Engineering 2) by applying them in a larger course project. (Competences: F, M; taxonomy levels: K3, K4)
• Students learn about the practical applicability of Big Data systems (e.g., Spark), including their advantages and disadvantages. (Competences: F, M; taxonomy levels: K3, K4)
• Students are able to use Python and its data science ecosystem (e.g., pandas, numpy, scikit-learn) and apply it independently in their course project. (Competences: F, M; taxonomy level: K3)
• Students learn to perform an experimental evaluation of the prototypes they build, including a comparison with a chosen baseline. (Competences: F, M; taxonomy levels: K3, K6)
• Students are able to go through a complete project lifecycle in a team, from project proposal to project execution and presentation. (Competences: M, SO; taxonomy level: K3)
• Students are able to go beyond their prior knowledge and choose appropriate technologies for the problems that arise in their course project. (Competences: SE; taxonomy level: K3)

Module contents

Implementing a typical Big Data project may involve the following steps:

• Select a problem to solve, e.g. analyze the popularity of movies over the last ten years and compare the differences between Brazil, France and the USA.
• Select the datasets, e.g. use the content of the Internet Movie Database (IMDb) stored in a relational database, and enrich the movie information with documents found on the internet.
• Select a baseline system based on traditional technology, e.g. use PostgreSQL to analyze the IMDb data, or an information retrieval system of your choice to analyze the text documents about the movies.
• Select a state-of-the-art Big Data system to compare against the baseline.
• Implement the application on both the baseline system and the Big Data system.
• Compare the performance of both systems on small amounts of data.
• Significantly increase the size of the data and study the impact on performance.
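The comparison steps above can be sketched as a small experiment harness. This is a minimal, hypothetical example, not a prescribed setup: it assumes the standard-library `sqlite3` module as a stand-in for a relational baseline such as PostgreSQL, generates synthetic (country, year, rating) records instead of real movie data, and uses a plain-Python aggregation as a placeholder for the Big Data system (e.g. Spark). All function names are illustrative. The harness first checks that both systems return the same answer, then records the runtime at each data size.

```python
import math
import random
import sqlite3
import time
from collections import defaultdict

# Hypothetical example data: the countries from the example problem.
COUNTRIES = ["Brazil", "France", "USA"]


def make_ratings(n, seed=0):
    """Generate n synthetic (country, year, rating) records; stands in for real data."""
    rng = random.Random(seed)
    return [
        (rng.choice(COUNTRIES), rng.randint(2014, 2023), rng.uniform(1.0, 10.0))
        for _ in range(n)
    ]


def baseline_sqlite(rows):
    """Baseline: in-memory SQLite computes average rating and count per (country, year).

    The measured time includes loading the rows into the table.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE ratings (country TEXT, year INTEGER, rating REAL)")
        con.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
        query = (
            "SELECT country, year, AVG(rating), COUNT(*) "
            "FROM ratings GROUP BY country, year"
        )
        return {(c, y): (avg, cnt) for c, y, avg, cnt in con.execute(query)}
    finally:
        con.close()


def candidate_python(rows):
    """Candidate: the same aggregation in plain Python (placeholder for e.g. Spark)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for country, year, rating in rows:
        sums[(country, year)] += rating
        counts[(country, year)] += 1
    return {k: (sums[k] / counts[k], counts[k]) for k in sums}


def same_result(a, b):
    """Counts must match exactly; averages may differ only by floating-point rounding."""
    return a.keys() == b.keys() and all(
        a[k][1] == b[k][1] and math.isclose(a[k][0], b[k][0], rel_tol=1e-9) for k in a
    )


def time_it(fn, rows, repeats=3):
    """Return the best wall-clock time over several runs, plus the last result."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(rows)
        best = min(best, time.perf_counter() - start)
    return best, result


def run_experiment(sizes, systems):
    """Time every system at every data size; the first system's result is the reference."""
    report = []
    for n in sizes:
        rows = make_ratings(n)
        results = {}
        for name, fn in systems.items():
            seconds, results[name] = time_it(fn, rows)
            report.append((n, name, seconds))
        reference = next(iter(results.values()))
        for name, res in results.items():
            if not same_result(reference, res):
                raise ValueError(f"{name} disagrees with the reference at n={n}")
    return report


if __name__ == "__main__":
    systems = {"sqlite baseline": baseline_sqlite, "python candidate": candidate_python}
    for n, name, seconds in run_experiment([1_000, 10_000, 100_000], systems):
        print(f"{n:>8} rows  {name:<17} {seconds:.4f} s")
```

Checking correctness before comparing timings keeps the evaluation honest. Whether loading time belongs in the measurement, as it does for the baseline here, is a design decision each project has to state explicitly.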

Students are free to choose any topic, dataset or existing code base that interests them. For instance, they could start from a Python program that runs on a single computer and processes a small dataset of a few thousand records. By re-implementing this program with Big Data technology, they should demonstrate how to build a scalable application that runs on tens of computers and processes large datasets.
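Re-implementing a single-machine program for a Big Data system usually means restructuring it so that the input is split into independent partitions whose partial results are merged afterwards. The sketch below illustrates that restructuring on a hypothetical word-count task over movie-related text documents. It is plain Python and runs the partitions sequentially, so it only mimics the map/reduce programming model rather than distributing any work. The function names and the default partition count are assumptions for illustration.

```python
from collections import Counter
from functools import reduce


def count_words_single(docs):
    """Original single-machine program: one loop over all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def partition(items, num_partitions):
    """Split a sequence into independent partitions (round-robin assignment)."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def map_partition(docs):
    """Map step: each partition is processed without looking at the others."""
    return count_words_single(docs)


def merge_counts(a, b):
    """Reduce step: combine two partial results; associative and commutative."""
    merged = Counter(a)
    merged.update(b)
    return merged


def count_words_partitioned(docs, num_partitions=4):
    """Restructured program: map over partitions, then merge the partial results.

    The partitions are processed sequentially here; a distributed system would
    run map_partition on different worker nodes and then combine the results.
    """
    partials = [map_partition(p) for p in partition(docs, num_partitions)]
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["The Matrix", "the matrix reloaded", "Cidade de Deus", "Amélie"]
    print(count_words_partitioned(docs).most_common(2))
```

In PySpark, the same structure is commonly written as `rdd.flatMap(...).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. `reduceByKey` expects an associative and commutative function, which is why the merge step above is written that way.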

By implementing the Big Data project, the students learn about the following aspects:

• Functionality of Big Data systems:
  - What kind of problems can I solve with Big Data systems?
  - Which problems are not suited for Big Data systems?
  - What typical data science algorithms are supported by Big Data systems?
• Performance aspects of Big Data systems:
  - How must I rewrite my application when the size of the dataset increases by a factor of 10, 100, 1000, etc.?
  - What is the impact on performance when the number of users increases by a factor of 10, 100, 1000, etc.?
  - How can I keep the response time constant?
  - What kind of optimization steps are required to implement an enterprise-scale solution?
• Usability of Big Data systems:
  - What is the learning curve of Big Data technology compared to traditional technology?
  - Given a specific use case that the students have implemented, does it pay off for a small, medium-sized or large company to invest in Big Data technology?

Teaching materials

Lecture slides and notes of Data Engineering 1 and 2

Supplementary literature

Recent literature and papers on Big Data

Prerequisites

Data Engineering 1 and 2

Teaching language

(X) German (X) English

Part of International Profile

(X) Yes () No

Module structure

Type 4
  For more details, see: T_CL_Modulauspraegungen_SM2025

Exams

• Graded assignments during the teaching semester: Type: Project; Form: Programming, presentations and final report; Weighting: 100%
• End-of-semester exam: None

Remarks

 

Legal basis

In addition to the general academic regulations, the module description forms part of the legal basis and is binding. During the first week of the semester, a written supplement communicated to the students may specify the module description in more detail.

Note

Course: Big Data Project - Praktikum
No.: t.BA.DS.PM4.20HS.P
Title: Big Data Project - Praktikum
