L. Grace Estrada edited this page Mar 7, 2025 · 10 revisions

Welcome to the CarValuePro Wiki Page!

Insert project intro here

Part 1: Data Pipeline

  1. Project Setup

    • Create GitHub Repo

    • Clone the repo into a local directory

      Add wiki page here with the following details:
      Command-line commands to clone the repo from the GitHub website

    • Create and activate a virtual environment in the project repo

      Add wiki page here with the following details:
      Command-line commands to create a virtual environment
      Command-line command to add a PYTHONPATH in the virtual environment to easily manage and import user-defined modules

      • Add this at the end of the Scripts/activate file: export PYTHONPATH="C:/Users/GRACE ESTRADA/OneDrive/Desktop/CarValuePro/CarValueProRepo/src"
    • Set up a PostgreSQL database on Render

      Add wiki page here with the following details:
      Step-by-step guide to create a PostgreSQL database instance on Render
      Step-by-step guide to connect the deployed database instance to a local installation of pgAdmin4
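
For the pgAdmin4 connection step above, the sketch below shows one way the connection fields could be read out of a database URL. It assumes the database instance exposes a standard postgresql:// connection URL. The helper name `pgadmin_fields` and the example URL, host, and credentials are placeholders made up for illustration and are not part of the project.

```python
from urllib.parse import urlsplit, unquote


def pgadmin_fields(database_url: str) -> dict:
    """Split a postgresql:// URL into the fields pgAdmin4 asks for.

    Hypothetical helper for illustration; not part of the CarValuePro code.
    """
    parts = urlsplit(database_url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"not a PostgreSQL URL: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # fall back to PostgreSQL's default port
        "database": parts.path.lstrip("/"),
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),  # undo percent-encoding
    }


if __name__ == "__main__":
    # Placeholder URL for illustration only; not a real instance.
    example = "postgresql://demo_user:demo_pass@db.example.com:5432/carvaluepro"
    for key, value in pgadmin_fields(example).items():
        print(f"{key}: {value}")
```

The returned values correspond to the host name, port, maintenance database, username, and password fields when registering a server in pgAdmin4.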

  2. Create the ETL Pipeline

    • Extraction: Web Scraping using Selenium

      Add wiki page here with the following details:
      Data extracted
      Entry and exit points for full and incremental extraction

    • Transformation: Transform and cleanse the extracted data into an analysis-ready format

      Add wiki page here with the following details:
      Data cleansing and transformation applied

    • Load: Load the data into the staging table (after extraction) and the production table (after transformation)

      Add wiki page here with the following details:
      Loading logic for full and incremental processing

    • Orchestrator

      Add wiki page here with the following details:
      Logic for Python Scheduler
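
The difference between full and incremental loading in the Load step above could be sketched as follows. The project's actual loading logic is not documented here yet, so this is only an illustration under assumptions: each extracted row is a dict, `listing_id` is a hypothetical unique key, and `rows_to_load` is a made-up helper name.

```python
from typing import Iterable


def rows_to_load(extracted: Iterable[dict], existing_ids: set,
                 mode: str = "incremental") -> list:
    """Pick which extracted rows should be written to the target table.

    full:        every extracted row is loaded (the target table is
                 assumed to be emptied beforehand).
    incremental: only rows whose listing_id is not already stored are loaded.

    `listing_id` is a hypothetical key chosen for this sketch.
    """
    if mode == "full":
        return list(extracted)
    if mode == "incremental":
        return [row for row in extracted
                if row["listing_id"] not in existing_ids]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    scraped = [{"listing_id": 1}, {"listing_id": 2}, {"listing_id": 3}]
    already_stored = {1, 2}
    print(rows_to_load(scraped, already_stored))          # only the unseen row
    print(rows_to_load(scraped, already_stored, "full"))  # all rows
```

In a real run, `existing_ids` would come from a query against the staging or production table before the insert.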

  3. Dockerize the Pipeline

    • Create the Dockerfile
    • Build a Docker image from the Dockerfile
    • Run a container from the built image to execute the pipeline
    • Push the image to Docker Hub
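
The build, run, and push steps above map onto three standard Docker CLI commands. The sketch below only assembles and prints them by default. The image name is a hypothetical placeholder, the Dockerfile is assumed to sit in the current directory, and pushing assumes `docker login` has already been run.

```python
import subprocess


def docker_commands(image: str, context: str = ".") -> list:
    """Return the build, run, and push commands as argument lists."""
    return [
        # Build an image from the Dockerfile found in `context`.
        ["docker", "build", "-t", image, context],
        # Run the pipeline in a container that is removed when it exits.
        ["docker", "run", "--rm", image],
        # Publish the image to the registry named in the tag.
        ["docker", "push", image],
    ]


def run_all(image: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in docker_commands(image):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Hypothetical image name; replace with your own Docker Hub repository.
    run_all("your-dockerhub-user/carvaluepro-pipeline:latest")
```

Calling `run_all(..., dry_run=False)` would execute the commands and requires Docker to be installed locally.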
  4. Deploy the Pipeline

    Add wiki page here with a step-by-step guide to deployment

    • Pipeline is deployed as a Background Worker in Render
    • The worker automatically runs the pipeline on a schedule specified directly in the code
    • To run the full pipeline, open an SSH session and run python scripts/run_full_pipeline.py
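
A schedule written directly in code, as described above, could look roughly like this standard-library sketch. The scheduler library the project actually uses is not stated here, so this is an assumption-laden illustration: a once-a-day schedule is assumed, and `next_run` and `run_daily` are hypothetical names.

```python
import time
from datetime import datetime, timedelta


def next_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next daily occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow
    return candidate


def run_daily(job, hour: int, minute: int = 0) -> None:
    """Sleep until the next scheduled time, run `job`, and repeat forever."""
    while True:
        wait = (next_run(datetime.now(), hour, minute) - datetime.now()).total_seconds()
        time.sleep(max(wait, 0))
        job()


# Example usage (hypothetical job; blocks forever, so it is left commented out):
# run_daily(lambda: print("running incremental pipeline"), hour=2)
```

Keeping the time calculation in a separate pure function makes the schedule easy to check without waiting for the clock.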

Part 2: Model Development
