divakaivan/transaction-stream-data-pipeline

Project overview

(Project architecture diagram: transactions_stream_project_diagram)

  • Stripe: Stripe's API is used as the source of realistic, generated transaction data (a producer sketch follows this list).

  • Apache Kafka: Stripe transaction data is streamed into Apache Kafka, which handles the data streams and ensures they are processed in real time.

  • Apache ZooKeeper: ZooKeeper runs alongside Kafka to manage and coordinate the Kafka brokers, maintaining configuration information and providing naming, synchronization, and group services.

  • PySpark: The data from Kafka is processed with PySpark Structured Streaming, transforming individual transaction events into rows fit for a database (see the streaming sketch after this list).

  • Forex API: GBP/x exchange rates are fetched from an online API and refreshed every 24 hours.

  • PostgreSQL: After processing, the data is stored in PostgreSQL.

  • dbt (Data Build Tool): dbt manages and transforms the data within PostgreSQL, splitting it into dimension and fact tables.

  • Grafana: Finally, the data stored in PostgreSQL is visualized using Grafana.
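
As a rough illustration of the ingestion side, the sketch below creates test-mode Stripe PaymentIntents and publishes a few of their fields to a Kafka topic. The topic name (transactions), the selected fields, the broker address, and the send interval are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch of the Stripe -> Kafka producer side; names are illustrative.
import json
import os
import time

import stripe
from kafka import KafkaProducer

stripe.api_key = os.environ["STRIPE_API_KEY"]  # test-mode key from .env

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Create a test-mode PaymentIntent so Stripe generates a realistic event.
    intent = stripe.PaymentIntent.create(
        amount=1000, currency="gbp", payment_method_types=["card"]
    )
    producer.send("transactions", {
        "id": intent.id,
        "amount": intent.amount,
        "currency": intent.currency,
        "created": intent.created,
    })
    time.sleep(1)  # stay well under Stripe's test-mode rate limit
```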
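On the processing side, a minimal PySpark Structured Streaming sketch could read the same topic, flatten the JSON payload into typed columns, and append each micro-batch to PostgreSQL. The schema, table name (raw_transactions), hosts, and credentials below are again assumptions for illustration, not this repo's actual code.

```python
# Minimal sketch of the Kafka -> PySpark -> PostgreSQL step.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("transaction-stream").getOrCreate()

# Assumed shape of one transaction event (hypothetical fields).
schema = StructType([
    StructField("id", StringType()),
    StructField("amount", LongType()),
    StructField("currency", StringType()),
    StructField("created", LongType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers bytes; decode and flatten the JSON payload into columns.
rows = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*")
    .withColumn("created_at", F.to_timestamp(F.from_unixtime("created")))
)

def write_batch(df, epoch_id):
    # Each micro-batch is appended to Postgres with the batch JDBC writer.
    (df.write
       .format("jdbc")
       .option("url", "jdbc:postgresql://postgres:5432/transactions")
       .option("dbtable", "raw_transactions")
       .option("user", "postgres")       # placeholder credentials
       .option("password", "postgres")
       .mode("append")
       .save())

query = rows.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```

foreachBatch is used here because Spark's streaming writer has no native JDBC sink; wrapping the batch JDBC writer per micro-batch is the usual workaround.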

Data model

(Data model diagram: transactions_stream_data_model)

dbt documentation

Link to the docs

dbt lineage

(dbt lineage graph image)

Visualisation

(Grafana dashboard image)

Considerations for improvements

  • add PySpark tests (a minimal test sketch is shown below)
  • use an orchestrator
  • use more data
    • Spark might be overkill given the current data volume, but I wanted to learn how to set up Spark Streaming for cases where the data is much larger
    • more data would also allow for a better Grafana visualisation
    • maybe find an alternative source of transaction data, because the Stripe API is rate-limited to 25 requests per second
    • also, many of the generated values in a transaction from the Stripe API are null
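
As an illustration of the first point, a PySpark transformation that is factored into a plain function can be unit-tested against a local SparkSession. The parse_transactions function and its schema below are hypothetical stand-ins for this repo's actual transformation, shown only to sketch the test pattern.

```python
# Sketch of a pytest-based PySpark test; parse_transactions is hypothetical.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

SCHEMA = StructType([
    StructField("id", StringType()),
    StructField("amount", LongType()),
    StructField("currency", StringType()),
])

def parse_transactions(df):
    """Decode the Kafka value column into typed transaction columns."""
    return (
        df.select(F.from_json(F.col("value").cast("string"), SCHEMA).alias("t"))
          .select("t.*")
    )

@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for transformation tests.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_parse_transactions(spark):
    df = spark.createDataFrame(
        [('{"id": "txn_1", "amount": 100, "currency": "gbp"}',)], ["value"]
    )
    out = parse_transactions(df).collect()
    assert out[0]["id"] == "txn_1"
    assert out[0]["amount"] == 100
    assert out[0]["currency"] == "gbp"
```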

Setup

  1. git clone https://github.com/divakaivan/transaction-stream-data-pipeline.git
  2. Rename sample.env to .env and fill in the necessary environment variables
  3. Type make in the terminal to see the setup options:
Usage: make [option]

Options:
  help                 Show this help message
  build                Build docker services
  start                Start docker services (detached mode)
  stop                 Stop docker services
  dbt-test             Run dbt with LIMIT 100
  dbt-full             Run dbt with full data

About

Transaction processing & visualisation pipeline using PySpark Streaming
