Divide analysis_data table, create new commits table #34

@brianwarner

Per discussion with @sgoggins, this issue proposes some major structural changes to the way commit data is stored. It was triggered by the discussion around #33.

For a while I have wanted to optimize analysis_data. Each row in the table describes one file that changed in a commit, and each row also carries its own copy of the author and committer info. When a commit changes a single file, that's not really a big deal. But when a commit changes a lot of files, the same metadata is duplicated once per file.

There is some benefit in breaking this info out into a separate table called commits. It would reduce the overall size of analysis_data (I haven't run into issues with this yet, but I'm not using it at the same scale as Sean; see #31). It would also yield a graceful solution for #33 by letting us start over and store dates as a native DATETIME rather than as ISO 8601 strings in a VARCHAR.
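As a rough illustration of the date change, here is a minimal sketch of turning an ISO 8601 string into a value suitable for a DATETIME column. The function name `iso8601_to_utc` is hypothetical, and normalizing to UTC is an assumption for the example, not something decided in this issue.

```python
from datetime import datetime, timezone


def iso8601_to_utc(value: str) -> datetime:
    """Parse an ISO 8601 string and return a naive datetime in UTC.

    Assumption: values without a UTC offset are treated as already being UTC.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        return parsed
    # Shift to UTC, then drop the offset so the value fits a plain DATETIME.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)
```

Storing a single normalized timezone keeps date comparisons in SQL simple, at the cost of losing the author's original offset unless it is kept in a separate column.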

It also gives us a central place to store the commit message, which may be useful info.

The main changes required are:

  • Alter setup.py to move the author and committer columns out of analysis_data and into a new commits table
  • Add a clause to the function update_db in facade-worker.py that adds the new commits table, copies over the commit and author/committer info, removes the old columns from analysis_data and optimizes it, and then does a cursory walk through the git log of each repo to get full datetime info for authors/committers plus commit messages
  • Update the caching functions with the new join between analysis_data and commits
  • Add the ability to view commit messages to various UIs
  • Cut a new major release, because this is a significant database change
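
The table split and the new join could be sketched roughly as follows. This is only an illustration: it uses an in-memory SQLite database so it runs on its own, while the real change would go through the project's existing database code. The column names (`commit_hash`, `author_email`, `committer_email`, `filename`, `added`, `removed`) and the function names are assumptions made up for the example.

```python
import sqlite3


def demo_database() -> sqlite3.Connection:
    """Build a toy analysis_data table with duplicated commit metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analysis_data (commit_hash TEXT, author_email TEXT, "
        "committer_email TEXT, filename TEXT, added INTEGER, removed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO analysis_data VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("abc123", "a@example.com", "a@example.com", "one.py", 10, 2),
            ("abc123", "a@example.com", "a@example.com", "two.py", 5, 0),
            ("def456", "b@example.com", "a@example.com", "one.py", 3, 1),
        ],
    )
    conn.commit()
    return conn


def split_commit_metadata(conn: sqlite3.Connection) -> None:
    """Move per-commit metadata out of analysis_data into a commits table."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE commits (commit_hash TEXT PRIMARY KEY, "
        "author_email TEXT, committer_email TEXT)"
    )
    # One row per commit, no matter how many files the commit touched.
    cur.execute(
        "INSERT INTO commits (commit_hash, author_email, committer_email) "
        "SELECT DISTINCT commit_hash, author_email, committer_email "
        "FROM analysis_data"
    )
    # Rebuild analysis_data with only the per-file columns.
    cur.execute(
        "CREATE TABLE analysis_data_new AS "
        "SELECT commit_hash, filename, added, removed FROM analysis_data"
    )
    cur.execute("DROP TABLE analysis_data")
    cur.execute("ALTER TABLE analysis_data_new RENAME TO analysis_data")
    conn.commit()


def lines_added_by_author(conn: sqlite3.Connection) -> list:
    """Example of a cache-style query using the new join."""
    return conn.execute(
        "SELECT c.author_email, SUM(a.added) FROM analysis_data a "
        "JOIN commits c ON c.commit_hash = a.commit_hash "
        "GROUP BY c.author_email ORDER BY c.author_email"
    ).fetchall()
```

After `split_commit_metadata(conn)`, any query that used to read author or committer columns directly from analysis_data has to go through the join on the commit hash, which is what the caching functions would need to pick up.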

While this is a big change, in theory it should be possible to make all of the changes transparently for a user with an existing database. The first facade-worker.py run after pulling this code will take longer than usual, but that's likely the only impact.
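
Most of that extra time would likely come from the walk through each repo's git log. A minimal sketch of that walk, assuming a git version that supports the strict ISO 8601 placeholders `%aI` and `%cI`, might look like the following. The function names and the choice of separator bytes are hypothetical, not existing facade-worker.py code.

```python
import subprocess
from typing import Dict, List

FIELD_SEP = "\x1f"   # separates the fields of one commit
RECORD_SEP = "\x1e"  # separates one commit from the next
# Hash, author date, committer date, raw message body.
LOG_FORMAT = "%H%x1f%aI%x1f%cI%x1f%B%x1e"


def read_git_log(repo_path: str) -> str:
    """Run git log in repo_path and return its raw output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--format={LOG_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_git_log(raw: str) -> List[Dict[str, str]]:
    """Split raw git log output into one dict per commit."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if not record:
            continue
        # maxsplit=3 keeps any separator byte inside the message intact.
        commit_hash, author_date, committer_date, message = record.split(
            FIELD_SEP, 3
        )
        commits.append(
            {
                "commit_hash": commit_hash,
                "author_date": author_date,
                "committer_date": committer_date,
                "message": message.strip(),
            }
        )
    return commits
```

Using control characters as separators avoids having to guess where a multi-line commit message ends, since those bytes are very unlikely to appear in real messages.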
