Extension CommitsLOC sometimes counts wrong. #120

@apepper

I wrote an SQL query to better understand the data quality of the extension Hunks. It sums the added and removed lines of all hunks per commit and compares the totals with the output of the extension CommitsLOC (which parses git log --shortstat).

This is the query:

SELECT cl.commit_id AS commit_id, s.rev AS rev,
       cl.added AS added, h.added AS calc_added,
       cl.removed AS removed, h.removed AS calc_removed,
       s.message
FROM (
    -- per-commit totals calculated from the hunk line ranges
    SELECT commit_id,
           SUM(old_end_line - old_start_line + 1) AS removed,
           SUM(new_end_line - new_start_line + 1) AS added
    FROM hunks
    GROUP BY commit_id
  ) AS h
RIGHT JOIN commits_lines cl ON h.commit_id = cl.commit_id
JOIN scmlog s ON s.id = cl.commit_id
WHERE h.added != cl.added OR h.removed != cl.removed
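The arithmetic behind the query can be sketched in Python. This is only an illustration: the field names are taken from the query above, the rows are made up, and it assumes that the start and end line of a hunk are both inclusive (hence the + 1). Commits without any hunks are skipped, which matches the query, because a comparison against NULL in the WHERE clause filters those rows out.

```python
from collections import defaultdict


def hunk_totals(hunks):
    """Sum the inclusive line ranges per commit, like the subquery on hunks."""
    totals = defaultdict(lambda: [0, 0])  # commit_id -> [added, removed]
    for h in hunks:
        totals[h["commit_id"]][0] += h["new_end_line"] - h["new_start_line"] + 1
        totals[h["commit_id"]][1] += h["old_end_line"] - h["old_start_line"] + 1
    return {cid: tuple(t) for cid, t in totals.items()}


def mismatches(hunks, commits_lines):
    """Return the commit ids whose hunk totals differ from the recorded totals."""
    totals = hunk_totals(hunks)
    return sorted(
        cid for cid, recorded in commits_lines.items()
        if cid in totals and totals[cid] != recorded
    )


# Made-up rows for illustration only (not taken from a real database).
hunks = [
    {"commit_id": 1, "old_start_line": 10, "old_end_line": 12,
     "new_start_line": 10, "new_end_line": 14},
    {"commit_id": 1, "old_start_line": 40, "old_end_line": 40,
     "new_start_line": 42, "new_end_line": 43},
    {"commit_id": 2, "old_start_line": 5, "old_end_line": 6,
     "new_start_line": 5, "new_end_line": 5},
]
commits_lines = {1: (7, 4), 2: (1, 1)}  # commit_id -> (added, removed)

print(mismatches(hunks, commits_lines))
```

With these rows, commit 1 adds up (7 added, 4 removed), while commit 2 is reported because the hunks give 2 removed lines but only 1 is recorded.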

While investigating why some commits don't add up, I have already published some patches to improve the data quality.


One really annoying thing is that CommitsLOC sometimes miscounts by up to 5 lines. I investigated the issue and found that this appears to be a bug in git itself. I have already sent a bug report to the git mailing list, but have not received an answer so far.

Here is what I observed with the repository https://github.com/voldemort/voldemort.git :
For commit c21ad764 and the file .../readonly/mr/HadoopStoreBuilderReducer.java, the command git log --numstat c21ad764 reports 25 lines added and 22 lines removed.
But the patch for HadoopStoreBuilderReducer.java that I get with git show c21ad764 -- contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilderReducer.java adds 30 lines and removes 27.

So 5 added and 5 removed lines are missing from the counts that git log reports (--numstat here, and --shortstat as used by CommitsLOC)!
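To repeat this comparison for other commits, the added and removed lines of a patch can be counted with a small script. The following is a minimal sketch, not part of any extension: the function name count_patch_lines is made up, and it assumes the input is a plain unified diff such as the output of git show <commit> -- <path>. It uses the line counts in each hunk header to decide which lines belong to a hunk, so that file headers (---, +++) and changed lines whose content itself starts with + or - are not miscounted.

```python
import re

# Matches a unified-diff hunk header such as "@@ -12,7 +12,9 @@".
# A missing count means 1, as defined by the unified diff format.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_patch_lines(patch_text):
    """Return (added, removed) line counts for a unified diff."""
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left == 0 and new_left == 0:
            # Outside a hunk: only a hunk header is of interest here.
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
            continue  # skips "diff --git", "---", "+++" and similar lines
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            removed += 1
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file"
        else:
            # Context line: present in both the old and the new version.
            old_left -= 1
            new_left -= 1
    return added, removed


# Small hand-written patch for illustration only.
example_patch = "\n".join([
    "diff --git a/f.c b/f.c",
    "--- a/f.c",
    "+++ b/f.c",
    "@@ -1,3 +1,4 @@",
    " int i = 0;",
    "-i++;",
    "+++i;",
    "+--i;",
    " return i;",
])

print(count_patch_lines(example_patch))
```

For the hand-written patch this gives 2 added lines and 1 removed line. The resulting pair can then be compared with the two numbers that git log --numstat prints for the same commit and file.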

More commits where I observed this problem on the same repository:

  • 7e00fb6d2cf131dfed59c180f2171952808cc336 src/java/voldemort/client/rebalance/MigratePartitions.java
  • 78ad6f2a6ea327dbae2110f4530a5bd07e5deaac src/java/voldemort/client/rebalance/MigratePartitions.java (same commit on another branch)
  • 7871933f0f0f056e2eeac03a01db1e9cf81f8bda src/java/voldemort/client/protocol/admin/AdminClient.java
  • 2d6f68b09c3bdc23dcf3ae1f91c9285fbd668820 src/java/voldemort/store/readonly/ExternalSorter.java
  • 6fcacee866307ec34eb32b268e2c2b885a949319 build.xml

Maybe someone has an idea, or the C skills, to build a working patch for git.
