L. Grace Estrada edited this page Oct 17, 2024 · 10 revisions

Welcome to the CarValuePro Wiki Page!

Please refer to the outline below and the corresponding links for more details:

  1. Project Setup

    • Create GitHub Repo
    • Clone the repo to a local directory

      Add wiki page here with the following details:
      Command-line commands to clone the repo from GitHub

    • Create and activate a virtual environment in the project repo

      Add wiki page here with the following details:
      Command-line commands to create and activate a virtual environment
      Command-line command to set PYTHONPATH in the virtual environment so that user-defined modules are easy to manage and import

    • Set up a PostgreSQL database on Render

      Add wiki page here with the following details:
      Step-by-step guide to create a PostgreSQL database instance on Render
      Step-by-step guide to connect the deployed database instance to a local installation of pgAdmin4
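
As a rough illustration of the virtual-environment step above, the sketch below creates an environment with Python's standard `venv` module and makes the project root importable from inside it. Note the swap: instead of exporting PYTHONPATH from the command line, it writes a `.pth` file into the environment's `site-packages`, which has a similar effect. The function name `create_env_with_project_path` and the file name `project_root.pth` are hypothetical and not taken from the project.

```python
import venv
from pathlib import Path


def create_env_with_project_path(env_dir: Path, project_root: Path) -> Path:
    """Create a virtual environment and register project_root on its import path."""
    # with_pip=False keeps this sketch fast; a real project would likely want pip.
    venv.EnvBuilder(with_pip=False).create(env_dir)

    # The layout differs by platform (lib/pythonX.Y/site-packages on POSIX,
    # Lib/site-packages on Windows), so locate the directory instead of hard-coding it.
    site_packages = next(p for p in env_dir.rglob("site-packages") if p.is_dir())

    # Each existing directory listed in a .pth file inside site-packages is
    # added to sys.path when the environment's interpreter starts.
    pth_file = site_packages / "project_root.pth"
    pth_file.write_text(str(project_root.resolve()) + "\n", encoding="utf-8")
    return pth_file
```

Calling `create_env_with_project_path(Path(".venv"), Path("."))` from the repo root would create `.venv` there; the environment still has to be activated from the shell afterwards.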

  2. Create the ETL Pipeline

    • Extraction: Web Scraping using Selenium

      Add wiki page here with the following details:
      Data extracted
      Entry and exit points for full and incremental extraction

    • Transformation: Transform and cleanse the extracted data into an analysis-ready format

      Add wiki page here with the following details:
      Data cleansing and transformation applied

    • Load: Load the data into the staging table (after extraction) and the production table (after transformation)

      Add wiki page here with the following details:
      Loading logic for full and incremental processing

    • Orchestrator

      Add wiki page here with the following details:
      Logic for Python Scheduler
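
To make the "full and incremental" loading item and the scheduler item above more concrete, here is a minimal sketch under stated assumptions. It uses the standard-library `sqlite3` module as a stand-in for the PostgreSQL database, a hypothetical `staging` table with `listing_id`, `price`, and `scraped_at` columns, and a watermark on `scraped_at` to decide which rows count as new. None of these names or choices come from the project itself; `run_every` is likewise only a placeholder for whatever scheduler the pipeline actually uses.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical row shape: (listing_id, price, scraped_at as an ISO 8601 string).
Row = tuple[str, int, str]


def load_staging(conn: sqlite3.Connection, rows: list[Row], mode: str) -> int:
    """Load rows into the staging table; return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "listing_id TEXT PRIMARY KEY, price INTEGER, scraped_at TEXT)"
    )
    if mode == "full":
        # Full load: discard everything and reload from scratch.
        conn.execute("DELETE FROM staging")
        to_write = rows
    elif mode == "incremental":
        # Incremental load: keep only rows newer than the latest stored timestamp.
        # ISO 8601 strings in one format compare correctly as plain text.
        (watermark,) = conn.execute("SELECT MAX(scraped_at) FROM staging").fetchone()
        to_write = [r for r in rows if watermark is None or r[2] > watermark]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Upsert so that a re-scraped listing replaces its earlier version.
    conn.executemany(
        "INSERT INTO staging (listing_id, price, scraped_at) VALUES (?, ?, ?) "
        "ON CONFLICT(listing_id) DO UPDATE SET "
        "price = excluded.price, scraped_at = excluded.scraped_at",
        to_write,
    )
    conn.commit()
    return len(to_write)


def run_every(interval_s: float, job: Callable[[], None], max_runs: int) -> None:
    """Run job max_runs times, sleeping interval_s seconds between runs."""
    for i in range(max_runs):
        job()
        if i < max_runs - 1:
            time.sleep(interval_s)
```

A first run would typically call `load_staging(conn, rows, "full")`, with later scheduled runs switching to `"incremental"`. Bounding `run_every` with `max_runs` keeps the sketch easy to test; a long-running scheduler would loop until stopped.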

  3. Dockerize the Pipeline

    • Create the Dockerfile
    • Build a Docker image from the Dockerfile
    • Run a container from the built image to execute the pipeline
    • Push the image to Docker Hub

  4. Deploy the Pipeline

    Add wiki page here with a step-by-step guide to deployment
