Stock Data ETL Pipeline

This is an ETL pipeline implemented using GCP, Prefect, PySpark and Python, it contains

User defined functions.
extraction of data from an API.
transformation of the data to desired format.
loading the data to GCS.
loading the data to Google Bigquery.
using the data to generate a looker report

Prerequisite

A GCP account, pretty easy to make and a little bit knowlege of cloud, linux, sql and python.

Libraries required

All python libraries required are present in the requirements.txt, to install just run

pip install -r requirements.txt

About the project

The project has 5 python files

main.py
just a placeholder for extract.py and transform.py to execute them one by one.
extract.py
this does the extraction of the date from API for the desired stocks.
transform.py
this file uses PySpark to run some transformation on the data to make it into the desirable format.
load.py
this file loads the data to Google Cloud Storage using prefect.
load_to_bq.py
this file loads the data to Google BigQuery using prefect.

How to run the project

To run the project create a VM insatnce, generate the SSH keys for it GCP SSH keys, update the SSH key in the metadata section of the VM

then spin up a VM, create a Service account that will be used later to load data to GCS and BQ provide the service account with approprite access, Storage Admin, Storage Object Admin and Big Query Admin should be sufficient. Generate keys for the service account(this wil be used later in prefect blocks).

create a storage bucket. GCP Storage Bucket.
Also Create a Bigquery Dataset and Table for table provide the schema(schema_for_bq.json) thats in the files. GCP BigQuery Datasets.
After cloud setup, setup a prefect cloud account prefect Cloud, or you can run the prefect server locally.
Login to your prefect cloud, go to blocks, search GCP Credentials and create a prefect block. just add the required details
create a GCS Bucket block using the same steps above and the bucket we created using while setting up GCP cloud.
Then login to your prefect Cloud account using Console using the command

$ prefect cloud login

It will ask for API key, you can get that generated from your profile section on Prefect cloud.

Then run the main.py file using

python main.py

or if you want to run specific files, you can also do that, just replace the main.py file name with the script you want to run.

After the data is succesully in the BQ, you can test that out by querying the table using BQ query tool.
then click on Explore Data Option

That should route you to looker, create an account and then you can crate a report using the Data in your BQ table. Looker Report.

Some Example Screenshots.

Tips

You can also create it like a cron job that runs on a specific day of every month using either prefect deployment or putting it into cloud run function.

GCP Cloud Function Job

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Data		Data
__pycache__		__pycache__
images		images
.prefectignore		.prefectignore
Untitled.ipynb		Untitled.ipynb
Untitled.py		Untitled.py
Untitled1.ipynb		Untitled1.ipynb
Untitled2.ipynb		Untitled2.ipynb
etl_gcs_to_bq-deployment.yaml		etl_gcs_to_bq-deployment.yaml
extract.py		extract.py
load-deployment.yaml		load-deployment.yaml
load.py		load.py
load_to_bq.py		load_to_bq.py
main.py		main.py
readme.md		readme.md
requirements.txt		requirements.txt
schema_for_bq.json		schema_for_bq.json
transform.py		transform.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Stock Data ETL Pipeline

Prerequisite

Libraries required

About the project

How to run the project

Tips

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Stock Data ETL Pipeline

Prerequisite

Libraries required

About the project

How to run the project

Tips

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages