Investigate populating Database #137

@ipc103

Right now, fetching our data and building our site is essentially a four-step process. Each step requires the previous one to have been completed in the same flow.

  1. Fetch data from the GitHub API.
  2. Load that data directly into a data.json file.
  3. Build the static site, including the data.json file.
  4. Deploy the site to GitHub Pages.

Because the data.json file isn't persisted beyond the build, we have to re-fetch the data on every run, and we're also limited in the types of data we can store.

An alternative approach would be:

  1. Fetch data from the GitHub API.
  2. Load that data into a database somewhere (CosmosDB, for example).
  3. Before build, load the data from the database into a data.json file.
  4. Build the static site, including the data.json file.
  5. Deploy the site to GitHub Pages.
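
To make steps 2 and 3 concrete, here is a minimal sketch of the store-then-export idea. It uses a local SQLite file purely as a stand-in for whatever external database we end up choosing; the table name, schema, and function names are all hypothetical, not something that exists in the repo today.

```python
import json
import sqlite3
from pathlib import Path


def open_store(db_path=":memory:"):
    """Open the stand-in key-value store (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def save_metrics(conn, metrics):
    """Step 2: write each top-level key, replacing any earlier value."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO metrics (key, value) VALUES (?, ?)",
            [(key, json.dumps(value)) for key, value in metrics.items()],
        )


def export_data_json(conn, out_path):
    """Step 3: rebuild data.json from the store, with no GitHub API call."""
    rows = conn.execute("SELECT key, value FROM metrics ORDER BY key").fetchall()
    data = {key: json.loads(value) for key, value in rows}
    Path(out_path).write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

With something like this, the deploy side would only need `export_data_json(open_store("metrics.db"), "data.json")` before the build, and the fetch side could run on its own schedule.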

This would have two big advantages:

  1. Deploying the site would no longer be dependent on fetching data from the GitHub API. This should make the overall build/deploy time faster.
  2. Storing in a database gives us much more flexibility on the type of data we store. For example, this could potentially make time-series calculations more feasible.
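
As a rough illustration of the time-series point, the sketch below appends dated snapshots instead of overwriting and then computes the change between consecutive snapshots. This goes beyond the overwrite-only first pass proposed further down; the `snapshots` table, the function names, and the sample values are made up for the example, and SQLite is again only a stand-in.

```python
import sqlite3


def record_snapshot(conn, metric, captured_on, value):
    """Append one dated value instead of overwriting the previous one."""
    with conn:
        conn.execute(
            "INSERT INTO snapshots (metric, captured_on, value) VALUES (?, ?, ?)",
            (metric, captured_on, value),
        )


def deltas(conn, metric):
    """Return (date, change since the previous snapshot) pairs for one metric."""
    rows = conn.execute(
        "SELECT captured_on, value FROM snapshots WHERE metric = ? ORDER BY captured_on",
        (metric,),
    ).fetchall()
    # Pair each snapshot with the one before it and subtract.
    return [(day, value - prev) for (_, prev), (day, value) in zip(rows, rows[1:])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (metric TEXT, captured_on TEXT, value INTEGER)")
# Made-up sample values, for illustration only.
for day, stars in [("2024-01-01", 100), ("2024-01-02", 104), ("2024-01-03", 103)]:
    record_snapshot(conn, "stars", day, stars)

print(deltas(conn, "stars"))  # [('2024-01-02', 4), ('2024-01-03', -1)]
```

None of this is possible with a data.json file that is thrown away after each build, since there is no previous value to compare against.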

As a first pass, I'd propose the following course of action:

  • Create an external database store. To start, a simple key-value store (mimicking our current JSON format) might work best.
  • Write a new script to fetch the existing metrics from the GitHub API and save the results to the new database. For now it's okay if the results overwrite the previous results each time.
  • Add a new workflow that runs on both workflow_dispatch and a cron schedule (daily?) and updates the new database.
  • Once we have confidence that the data is being updated, update our deploy workflow to pull data from the database and generate a data.json file instead of running the fetch script directly.
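
The new fetch script from the second bullet could look roughly like the sketch below. It is only an outline under assumptions: the three counters are placeholders for whatever metrics we actually collect, `store` is any dict-like object standing in for the real database client, and the function names are hypothetical. The only real API used is the GitHub REST endpoint `GET /repos/{owner}/{repo}`.

```python
import json
import urllib.request


def fetch_repo_metrics(full_name, token=None):
    """Fetch a few counters for one "owner/repo" from the GitHub REST API."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        repo = json.load(response)
    # Placeholder selection; the real script would keep our existing metrics.
    return {
        "stars": repo["stargazers_count"],
        "forks": repo["forks_count"],
        "open_issues": repo["open_issues_count"],
    }


def update_store(store, repos, fetch=fetch_repo_metrics):
    """Overwrite each repository's entry with freshly fetched metrics."""
    for full_name in repos:
        store[full_name] = fetch(full_name)  # last write wins, as proposed
    return store
```

Passing `fetch` as a parameter keeps the overwrite logic testable without hitting the API. The scheduled workflow would call `update_store` with the real store and repository list, and the deploy workflow would never call it at all.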

Once that's set up, we can look at adding additional metric stores in a follow-up.
