Description
I wrote an SQL query to better understand the data quality of the Hunks extension. It sums up the added and removed lines of all hunks per commit and compares the result with the output of the CommitsLOC extension (which parses git log --shortstat).
This is the query:
SELECT cl.commit_id AS commit_id, s.rev AS rev,
       cl.added AS added, h.added AS calc_added,
       cl.removed AS removed, h.removed AS calc_removed,
       s.message
FROM (
    SELECT commit_id,
           SUM(old_end_line - old_start_line + 1) AS removed,
           SUM(new_end_line - new_start_line + 1) AS added
    FROM hunks
    GROUP BY commit_id
) AS h
RIGHT JOIN commits_lines cl ON h.commit_id = cl.commit_id
JOIN scmlog s ON s.id = cl.commit_id
WHERE h.added != cl.added OR h.removed != cl.removed
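To make the arithmetic of the query easier to follow, here is a minimal sketch that runs the same comparison against an in-memory SQLite database. The table layout is an assumption reduced to the columns the query touches, the rows are made-up sample data, and the line ranges are assumed to be inclusive. It uses a plain JOIN instead of the RIGHT JOIN, because the WHERE clause discards rows without hunks anyway (a comparison with NULL is never true).

```python
import sqlite3

# Hypothetical miniature schema: only the columns used by the query above.
SCHEMA = """
CREATE TABLE hunks (
    commit_id INTEGER,
    old_start_line INTEGER, old_end_line INTEGER,
    new_start_line INTEGER, new_end_line INTEGER
);
CREATE TABLE commits_lines (commit_id INTEGER, added INTEGER, removed INTEGER);
"""

MISMATCH_QUERY = """
SELECT cl.commit_id, cl.added, h.added, cl.removed, h.removed
FROM commits_lines cl
JOIN (
    SELECT commit_id,
           SUM(old_end_line - old_start_line + 1) AS removed,
           SUM(new_end_line - new_start_line + 1) AS added
    FROM hunks
    GROUP BY commit_id
) AS h ON h.commit_id = cl.commit_id
WHERE h.added != cl.added OR h.removed != cl.removed
ORDER BY cl.commit_id
"""


def find_mismatches(conn):
    """Return commits whose summed hunk sizes disagree with commits_lines."""
    return conn.execute(MISMATCH_QUERY).fetchall()


def demo():
    """Build the sample database and return the mismatching commits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hunks VALUES (?, ?, ?, ?, ?)", [
        (1, 10, 12, 10, 13),  # commit 1: removes 3 lines, adds 4
        (1, 40, 40, 41, 42),  # commit 1: removes 1 line, adds 2
        (2, 5, 9, 5, 9),      # commit 2: removes 5 lines, adds 5
    ])
    conn.executemany("INSERT INTO commits_lines VALUES (?, ?, ?)", [
        (1, 6, 4),  # agrees with the hunk sums (6 added, 4 removed)
        (2, 3, 3),  # disagrees with the hunk sums (5 added, 5 removed)
    ])
    return find_mismatches(conn)
```

In this sample, only commit 2 is reported, because its hunks sum up to 5 added and 5 removed lines while commits_lines records 3 and 3.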
While investigating why some commits don't add up, I already published some patches to increase the data quality:
- Issue [Hunks] Deleted files should also be tracked. #114: Deleted files weren't counted correctly.
- Issue [Hunks] Bugfix for Files with Spaces in there name. #119: File names containing spaces confused Hunks.
- Issue Better Merge parsing #116: It is now also possible to track two-way merges and file type changes (e.g. changing a file to a symlink).
One thing that is really annoying is that CommitsLOC sometimes miscounts by up to 5 lines. I investigated the issue and found that this is a bug in git itself. I already sent a bug report to the git mailing list, but so far there has been no answer.
Here is what I observed with the repo https://github.com/voldemort/voldemort.git :
The command git log --numstat c21ad764 reports 25 lines added and 22 lines removed for the file .../readonly/mr/HadoopStoreBuilderReducer.java in commit c21ad764.
But the patch of HadoopStoreBuilderReducer.java that I get with git show c21ad764 -- contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilderReducer.java adds 30 lines and removes 27.
So 5 added and 5 removed lines are missing from the git log --numstat output, and thus from git log --shortstat as well!
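The comparison above can be automated. The following is a rough sketch, not part of the extensions: it counts the added and removed lines in a unified patch and parses git log --numstat output, so both numbers can be checked for any commit and file. The function names are hypothetical. The helper compare_commit_file assumes a local clone and a git binary on the PATH, and it ignores renames and the combined diffs of merge commits.

```python
import subprocess


def count_patch_lines(patch_text):
    """Count added and removed lines inside the hunks of a unified diff."""
    added = removed = 0
    in_hunk = False
    for line in patch_text.split("\n"):
        if line.startswith("diff --git"):
            in_hunk = False          # file header: '---'/'+++' lines follow
        elif line.startswith("@@"):
            in_hunk = True           # hunk header: body lines follow
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return added, removed


def parse_numstat(output):
    """Map path -> (added, removed) from 'git log --numstat' output."""
    stats = {}
    for line in output.split("\n"):
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue
        added, removed, path = parts
        if not (added.isdigit() and removed.isdigit()):
            continue                 # binary files are reported as '-'
        stats[path] = (int(added), int(removed))
    return stats


def compare_commit_file(repo, commit, path):
    """Return (numstat counts, patch counts) for one file of one commit."""
    def git(*args):
        result = subprocess.run(
            ["git", "-C", repo, *args],
            check=True, capture_output=True,
            encoding="utf-8", errors="replace",
        )
        return result.stdout

    numstat = parse_numstat(
        git("log", "-1", "--numstat", "--format=", commit, "--", path))
    patch = count_patch_lines(git("show", "--format=", commit, "--", path))
    return numstat.get(path), patch
```

Calling compare_commit_file with the path of a local voldemort clone, the commit c21ad764 and the full path of HadoopStoreBuilderReducer.java should return both pairs of counts, which makes it easy to scan a whole repository for commits where they differ.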
More commits where I observed this problem on the same repository:
- 7e00fb6d2cf131dfed59c180f2171952808cc336 src/java/voldemort/client/rebalance/MigratePartitions.java
- 78ad6f2a6ea327dbae2110f4530a5bd07e5deaac src/java/voldemort/client/rebalance/MigratePartitions.java (same commit on another branch)
- 7871933f0f0f056e2eeac03a01db1e9cf81f8bda src/java/voldemort/client/protocol/admin/AdminClient.java
- 2d6f68b09c3bdc23dcf3ae1f91c9285fbd668820 src/java/voldemort/store/readonly/ExternalSorter.java
- 6fcacee866307ec34eb32b268e2c2b885a949319 build.xml
Maybe someone has an idea, or the C skills to build a working patch for git.