This is the code repository for the "Advanced Topics in Database Systems" Semester Project

This semester project focuses on analyzing large datasets using data science processing techniques with Apache Hadoop (version >= 3.0) and Apache Spark (version >= 3.5).

Project Goals

Become familiar with installing and managing distributed Apache Spark and Apache Hadoop systems.
Apply modern techniques through Spark APIs for big data analysis.
Understand the capabilities and limitations of these tools in relation to available resources and configurations.

Resources

The project was hosted in a specially configured environment in the AWS cloud. The code was developed and tested in Amazon SageMaker AI notebooks, using S3 buckets for storage.

Additional Notes

The assignment presentation can be seen here: project_eng_2024.pdf

The final report can be seen here: Report.pdf

For more details and code examples, please refer to the Jupyter Notebook: advanced_db_project_2024.ipynb

StavrosLzp/NTUA-Advanced-DB-Project-2024
