Yelp Database Benchmarking

This repository contains the source code and setup instructions for our final project in UC Berkeley’s Data Engineering course.

Note: Due to course policies, the full dataset and final results (report) cannot be made publicly accessible. If you are a recruiter or collaborator interested in reviewing the complete findings, please reach out to me directly:

Email: nawodakw@berkeley.edu
LinkedIn: linkedin.com/in/nawodakw

Project Overview

We benchmarked two data systems, PostgreSQL and PySpark, on the Yelp Open Dataset. Our goal was to compare performance, storage cost, and usability across a set of representative queries. Key tasks included:

Schema Design: Created a relational schema in PostgreSQL and a corresponding plan for PySpark.
Data Loading: Ingested multi-gigabyte Yelp data locally.
Benchmark Queries: Measured query time and memory use, and assessed user ergonomics and data-modeling flexibility.
Analysis & Report: Summarized findings to guide system selection for various business use cases.
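
As an illustration of the Benchmark Queries step, here is a minimal timing sketch. It is not the harness used in the project: the `benchmark` function, its `repeats` parameter, and the stand-in workload are all hypothetical. It assumes each query can be wrapped in a zero-argument Python callable (for example, a function that runs a SQL statement through a database driver, or one that triggers a PySpark action). Note that `tracemalloc` only sees Python allocations in the client process, so it does not capture memory used by the PostgreSQL server or the Spark JVM.

```python
import statistics
import time
import tracemalloc
from typing import Any, Callable, Dict


def benchmark(run_query: Callable[[], Any], repeats: int = 5) -> Dict[str, float]:
    """Run a zero-argument callable several times and summarize the runs.

    Hypothetical helper: reports wall-clock time and the peak Python heap
    usage of the client process only.
    """
    timings = []
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        timings.append(elapsed)
        peak_bytes = max(peak_bytes, peak)
    return {
        "runs": repeats,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "peak_bytes": peak_bytes,
    }


if __name__ == "__main__":
    # Stand-in workload; in practice this would execute one benchmark query.
    print(benchmark(lambda: sorted(range(100_000), reverse=True)))
```

Reporting the median alongside the minimum makes it easier to spot runs skewed by caching or background load; the same callable-based wrapper can be pointed at either system so that both are measured the same way.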

Repository Structure

postgresql/: Scripts and instructions for setting up the PostgreSQL database, schemas, and sample queries.
pyspark/: PySpark notebooks and scripts to illustrate our benchmarking approach on the same Yelp data.
docs/: Additional documentation, diagrams, and notes on data modeling.
README.md: This overview document.

Confidential Report

Our full write-up, including detailed performance metrics and analysis, is private. Please contact me if you would like to access it.

Contact

Email: nawodakw@berkeley.edu
LinkedIn: linkedin.com/in/nawodakw

Thank you for your interest!
