Description
Per discussion with @sgoggins, this issue proposes some major structural changes to the way commit data is stored. It was triggered by the discussion around #33.
For a while I have wanted to optimize analysis_data. Each row in the table describes one file that changed in a commit, and each row also carries its own copy of the author and committer info. When a commit changes a single file, that's not really a big deal. But when a commit changes a lot of files, the metadata is duplicated once per file.
There is some benefit in breaking this info out into a separate table, called commits. It would reduce the overall size of analysis_data (I haven't run into issues with this yet, but I'm not using it at the same scale as Sean; see #31). It would also yield a graceful solution for #33 by giving us the chance to start over, storing dates as a native DATETIME rather than as ISO 8601 strings in a VARCHAR.
It also gives us a new central place to store the commit message, which may be useful info.
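To make the date change concrete, here is a minimal sketch of turning an ISO 8601 string into a value suitable for a DATETIME column. It assumes the timestamps come from something like `git log --format=%aI` (strict ISO 8601 with a UTC offset) and that we would want to normalize to UTC before storing. The helper name `iso8601_to_utc` is hypothetical and not part of the current code.

```python
from datetime import datetime, timezone


def iso8601_to_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp and return a naive datetime in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        return parsed
    # Shift to UTC, then drop tzinfo so the value maps onto a plain DATETIME.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


# An author date five hours behind UTC
print(iso8601_to_utc("2017-03-01T12:34:56-05:00"))  # 2017-03-01 17:34:56
```

Whether to store UTC or keep the original offset somewhere is a design choice this issue doesn't settle; the sketch just shows that the conversion itself is cheap.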
The main changes required are:
- Alter setup.py to move the author and committer columns out of `analysis_data` and into a new `commits` table
- Add a clause to the function update_db in `facade-worker.py` to add the new `commits` table, copy over commit and author/committer info, remove old columns from `analysis_data` and optimize it, and then do a cursory walk through the git log of each repo to get full datetime info for authors/committers plus commit messages
- Update the caching functions with the new join between `analysis_data` and `commits`
- Add the ability to view commit messages to various UIs
- Cut a new major release, because this is a significant database change
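A rough sketch of the database side of these steps follows. It uses an in-memory SQLite database purely so that it runs standalone, and every column name (`commit_hash`, `author_email`, and so on) is an assumption for illustration, not the real schema. On MySQL the column removal would more likely be `ALTER TABLE ... DROP COLUMN` followed by `OPTIMIZE TABLE` rather than the table rebuild shown here.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Split per-commit metadata out of analysis_data into commits."""
    cur = conn.cursor()
    # 1. New table: one row per commit. The date and message columns start
    #    empty and would be backfilled later by walking each repo's git log.
    cur.execute("""
        CREATE TABLE commits (
            commit_hash     TEXT PRIMARY KEY,
            author_email    TEXT,
            committer_email TEXT,
            author_date     TIMESTAMP,
            committer_date  TIMESTAMP,
            message         TEXT
        )""")
    # 2. Copy the metadata over once per commit instead of once per file.
    cur.execute("""
        INSERT INTO commits (commit_hash, author_email, committer_email)
        SELECT commit_hash, MIN(author_email), MIN(committer_email)
        FROM analysis_data
        GROUP BY commit_hash""")
    # 3. Rebuild analysis_data without the duplicated columns.
    cur.execute("""
        CREATE TABLE analysis_data_slim AS
        SELECT commit_hash, filename, added, removed FROM analysis_data""")
    cur.execute("DROP TABLE analysis_data")
    cur.execute("ALTER TABLE analysis_data_slim RENAME TO analysis_data")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE analysis_data (
        commit_hash TEXT, filename TEXT, added INTEGER, removed INTEGER,
        author_email TEXT, committer_email TEXT
    )""")
# Made-up sample data: one commit touching three files, one touching one.
conn.executemany(
    "INSERT INTO analysis_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("abc123", "a.py", 10, 2, "dev@example.com", "dev@example.com"),
        ("abc123", "b.py", 4, 0, "dev@example.com", "dev@example.com"),
        ("abc123", "c.py", 1, 1, "dev@example.com", "dev@example.com"),
        ("def456", "a.py", 3, 3, "other@example.com", "dev@example.com"),
    ],
)
migrate(conn)

# The caching functions would then join the two tables, roughly like this:
rows = conn.execute("""
    SELECT a.filename, a.added, a.removed, c.author_email
    FROM analysis_data a
    JOIN commits c ON c.commit_hash = a.commit_hash
    ORDER BY a.commit_hash, a.filename""").fetchall()
for row in rows:
    print(row)
```

In this toy data the four file rows collapse to two rows of commit metadata, and the join still returns one row per changed file with the author attached.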
While this is a big change, in theory it should be possible to do all of the changes transparently to a user with an existing database. The first facade-worker.py run after pulling this code will take longer than usual, but that's likely the only impact.