Home
Insert project intro here
-
Project Setup
-
Create GitHub Repo
-
Clone repo in the local directory
Add wiki page here with the following details:
Command line scripts to clone the repo from the GitHub website
Create and activate a virtual environment in the project repo
Add wiki page here with the following details:
Command line scripts to create a virtual environment
Command line script to add a PYTHONPATH in the virtual environment to easily manage and import user-defined modules
- Add this at the end of the Scripts/activate file:
export PYTHONPATH="C:/Users/GRACE ESTRADA/OneDrive/Desktop/CarValuePro/CarValueProRepo/src"
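The virtual environment and PYTHONPATH steps above can also be scripted. The following is a minimal sketch rather than the project's actual setup script: the function names `create_venv`, `append_export`, and `add_pythonpath` are hypothetical, and it assumes the bash-style `activate` script is the one being edited.

```python
from pathlib import Path
import venv


def create_venv(env_dir: Path) -> Path:
    """Create a virtual environment and return the path of its bash activate script."""
    venv.create(env_dir, with_pip=True)
    # Windows places the scripts in "Scripts"; POSIX systems use "bin".
    scripts_dir = env_dir / "Scripts"
    if not scripts_dir.exists():
        scripts_dir = env_dir / "bin"
    return scripts_dir / "activate"


def append_export(script_text: str, src_dir: str) -> str:
    """Return the script text with the PYTHONPATH export line added exactly once."""
    line = f'export PYTHONPATH="{src_dir}"'
    if line in script_text:
        return script_text  # already configured, leave the script untouched
    if script_text and not script_text.endswith("\n"):
        script_text += "\n"
    return script_text + line + "\n"


def add_pythonpath(activate_script: Path, src_dir: str) -> None:
    """Apply append_export to the activate script on disk."""
    # Bytes are read and written directly so the script keeps its original line endings.
    original = activate_script.read_bytes().decode("utf-8")
    updated = append_export(original, src_dir)
    if updated != original:
        activate_script.write_bytes(updated.encode("utf-8"))
```

Calling `add_pythonpath(create_venv(Path(".venv")), "<path to src>")` would create the environment and register the `src` folder. Running it a second time changes nothing, because the export line is only appended when it is missing.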
-
Setup PostgreSQL Database in Render
Add wiki page here with the following details:
Step-by-step guide to create a PostgreSQL Database instance in Render
Step-by-step guide to connect the deployed database instance to a local installation of pgAdmin4
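When connecting pgAdmin4 to the deployed instance, the connection details have to be entered field by field. The sketch below assumes the database exposes an external connection URL of the usual `postgres://user:password@host:port/dbname` form and splits it into those fields. The function name `pgadmin_fields` and the example URL are hypothetical.

```python
from typing import Dict, Union
from urllib.parse import unquote, urlsplit


def pgadmin_fields(database_url: str) -> Dict[str, Union[str, int, None]]:
    """Split a PostgreSQL connection URL into the values pgAdmin4 asks for."""
    parts = urlsplit(database_url)
    return {
        "host": parts.hostname,
        # 5432 is the PostgreSQL default when the URL carries no port.
        "port": parts.port or 5432,
        "maintenance_db": parts.path.lstrip("/"),
        # Credentials may be percent-encoded inside a URL, so decode them.
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


if __name__ == "__main__":
    # Placeholder URL for illustration only, not a real database.
    example = "postgres://app_user:s3cr%40t@db.example.com/carvalue"
    for field, value in pgadmin_fields(example).items():
        print(f"{field}: {value}")
```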
-
Create the ETL Pipeline
- Extraction: Web Scraping using Selenium
Add wiki page here with the following details:
Data extracted
Entry and exit point for full and incremental extraction
- Transformation: Transform and cleanse the extracted data into an analysis-ready format
Add wiki page here with the following details:
Data cleansing and transformation applied
- Load: Load the data into the staging table (after extraction) and the production table (after transformation)
Add wiki page here with the following details:
Loading logic for full and incremental processing
- Orchestrator
Add wiki page here with the following details:
Logic for Python Scheduler
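The difference between full and incremental processing can be illustrated with a watermark: a full run loads every row, while an incremental run loads only rows newer than the last successful load. This is one common approach and not necessarily the logic this pipeline uses. The `scraped_at` field and both function names are assumptions made for the sketch.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def select_rows_to_load(rows: List[Row], last_loaded_at: Optional[datetime]) -> List[Row]:
    """Return all rows for a full load, or only rows newer than the watermark."""
    if last_loaded_at is None:
        # No previous load recorded: treat this run as a full load.
        return list(rows)
    return [row for row in rows if row["scraped_at"] > last_loaded_at]


def next_watermark(rows: List[Row], last_loaded_at: Optional[datetime]) -> Optional[datetime]:
    """Return the watermark to store after the given rows have been loaded."""
    newest = max((row["scraped_at"] for row in rows), default=None)
    if newest is None:
        return last_loaded_at  # nothing was loaded, keep the old watermark
    if last_loaded_at is None:
        return newest
    return max(newest, last_loaded_at)
```

Storing the value from `next_watermark` after each successful run lets the following incremental run skip rows that were already loaded.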
-
Dockerize the Pipeline
- Create the Dockerfile
- Build a Docker image from the Dockerfile
- Run a container from the built image to execute the pipeline
- Push the image to Docker Hub
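The build, run, and push steps above map onto three Docker CLI commands. The sketch below assembles them and prints them by default instead of executing them. The helper names are hypothetical and the image name passed in is a placeholder. Pushing to Docker Hub additionally assumes a prior `docker login` and an image name of the form `<username>/<repository>:<tag>`.

```python
import subprocess
from typing import List


def docker_commands(image: str, context: str = ".") -> List[List[str]]:
    """Return the build, run, and push commands for the given image name."""
    return [
        ["docker", "build", "-t", image, context],  # build from the Dockerfile in `context`
        ["docker", "run", "--rm", image],           # run once, remove the container afterwards
        ["docker", "push", image],                  # upload the image to the registry
    ]


def run_all(image: str, dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for command in docker_commands(image):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(command, check=True)
```

Keeping `dry_run=True` as the default makes it possible to review the commands before anything is built or pushed.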
-
Deploy the Pipeline
Add wiki page here with a step-by-step guide to deployment
- Pipeline is deployed as a Background Worker in Render
- It automatically runs the pipeline on the schedule specified directly in the code
- To run the full pipeline, connect over SSH and run
python scripts/run_full_pipeline.py
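A schedule that lives directly in the code can be as simple as a loop that sleeps until the next run time. The sketch below uses only the standard library and is not the project's actual scheduler. The daily schedule, the function names, and the choice to invoke `scripts/run_full_pipeline.py` from the loop are assumptions.

```python
import subprocess
import sys
import time
from datetime import datetime, timedelta


def next_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return candidate


def run_forever(hour: int, minute: int = 0) -> None:
    """Sleep until each scheduled time, then run the pipeline script."""
    while True:
        target = next_run(datetime.now(), hour, minute)
        time.sleep(max((target - datetime.now()).total_seconds(), 0))
        # Use the same interpreter that is running this scheduler.
        subprocess.run([sys.executable, "scripts/run_full_pipeline.py"], check=True)
```

A long-running loop like this suits a background worker, which stays alive between runs instead of being triggered from outside.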