- Extract random user data from the https://randomuser.me/ API, using Apache Airflow as the data orchestration tool.
- Trigger a Kafka producer and a Kafka consumer on a schedule to start streaming data.
- Kafka monitoring and management using Apache ZooKeeper, Confluent Control Center, and Confluent Schema Registry.
- Run a Spark master and Spark workers to process the streaming data.
- Apache Cassandra as a Distributed Database.
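The extraction step above can be sketched as a small helper the Airflow task would call: flatten one result from the randomuser.me API response into the record the Kafka producer publishes. The field mapping follows the API's documented response shape; the topic name and producer call in the trailing comment are assumptions, not confirmed project details.

```python
# Hypothetical sketch: flatten one `results` entry from the
# randomuser.me API into the flat record streamed to Kafka.
import json


def format_user(res: dict) -> dict:
    """Pick the fields the pipeline keeps from one API result."""
    location = res["location"]
    return {
        "first_name": res["name"]["first"],
        "last_name": res["name"]["last"],
        "gender": res["gender"],
        "address": f"{location['street']['number']} {location['street']['name']}, "
                   f"{location['city']}, {location['country']}",
        "email": res["email"],
        "username": res["login"]["username"],
    }


# Inside the DAG's PythonOperator, each formatted record would then be
# serialized and sent to a Kafka topic (topic name is an assumption):
# producer.send("users_created", json.dumps(format_user(res)).encode("utf-8"))
```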
- Have Docker installed on your computer, then verify the installation with `docker --version` and `docker compose version`.
- In your terminal, run `docker compose up -d`.
- Trigger the `user_automation` DAG in the Airflow web UI.
- Check that the Kafka topic has been created in Control Center.
- In the terminal, submit the Spark job: `spark-submit --master spark://localhost:7077 spark_stream.py`.
- Connect to the Cassandra cluster and run some CQL queries (e.g. `SELECT`) to verify the ingested data.
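The consumer side of these steps can be sketched the same way: deserialize a Kafka message value and build the parameterized CQL insert that `spark_stream.py` would execute against Cassandra. The keyspace and table names (`spark_streams.created_users`) are assumptions for illustration.

```python
# Hypothetical sketch of the sink step: turn a Kafka message value
# (JSON bytes) into a row tuple for a parameterized CQL insert.
import json

# Keyspace/table/column names are assumptions, not confirmed schema.
INSERT_CQL = (
    "INSERT INTO spark_streams.created_users "
    "(first_name, last_name, gender, address, email, username) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def to_row(message_value: bytes) -> tuple:
    """Deserialize one Kafka message into the column order of INSERT_CQL."""
    user = json.loads(message_value)
    return (
        user["first_name"], user["last_name"], user["gender"],
        user["address"], user["email"], user["username"],
    )


# Against a live cluster, the row would be written via the Cassandra driver:
#   session.execute(INSERT_CQL, to_row(msg))
# and verified from cqlsh with something like:
#   SELECT * FROM spark_streams.created_users LIMIT 10;
```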
A lot can still be done:
- Managed cloud services for Airflow, Kafka, and Spark (e.g. Cloud Composer or Amazon MWAA for Airflow, Amazon MSK for Kafka, and EMR for Spark).
- Data quality tests.
- OLAP operations with a data warehouse for analytical purposes.
