StavrosLzp/NTUA-Advanced-DB-Project-2024

This is the code repository for the "Advanced Topics in Database Systems" Semester Project

This semester project focuses on analyzing large datasets using data science processing techniques with Apache Hadoop (version >= 3.0) and Apache Spark (version >= 3.5).

Project Goals

  • Become familiar with installing and managing distributed Apache Spark and Apache Hadoop systems.
  • Apply modern techniques through Spark APIs for big data analysis.
  • Understand the capabilities and limitations of these tools in relation to available resources and configurations.

Resources

The project was hosted in a specially configured environment in the AWS cloud. The code was developed and tested in Amazon SageMaker AI notebooks, using S3 buckets for storage.

Additional Notes

The assignment presentation can be seen here: project_eng_2024.pdf

The final report can be seen here: Report.pdf

For more details and code examples, please refer to the Jupyter Notebook: advanced_db_project_2024.ipynb
