git filter-repo --analyze fails on non-existent commit that no Git tool can find #711

@maxlrainer

Summary

git filter-repo --analyze fails with "fatal: unable to read ea5ee0839a0c30442f568e9672dd614a988a39c6" after processing ~1.5M commits in a large mono-repo. The critical issue: an object with this SHA does not exist in the repository and no Git command I tried finds any reference to it, yet filter-repo consistently fails while trying to read it.

Error Message

Processed 7971661 blob sizes
Processed 1596704 commitsfatal: unable to read ea5ee0839a0c30442f568e9672dd614a988a39c6
Processed 1596737 commits
Error: rev-list|diff-tree pipeline failed; see above.
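
The last line of the output names a rev-list|diff-tree pipeline. A minimal sketch for isolating the commit that triggers the failure is to run diff-tree on one commit at a time and stop at the first non-zero exit. The flag list below is an assumption about what filter-repo passes to diff-tree (check the filter-repo source for the exact invocation), and the helper names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List, Optional

# ASSUMPTION: approximately the diff-tree flags used by `--analyze`;
# verify against the filter-repo source before relying on them.
DIFF_TREE_ARGS = ["--always", "--root", "-M", "-t", "-c", "--raw",
                  "--combined-all-paths"]


def list_commits(repo: str = ".") -> List[str]:
    """Return all commits reachable from any ref, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--topo-order", "--reverse", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def diff_tree_ok(commit: str, repo: str = ".") -> bool:
    """Run diff-tree on a single commit; True if git exits with status 0."""
    proc = subprocess.run(
        ["git", "-C", repo, "diff-tree", *DIFF_TREE_ARGS, commit],
        capture_output=True,
    )
    return proc.returncode == 0


def first_failing_commit(
    commits: Iterable[str],
    check: Callable[[str], bool] = diff_tree_ok,
) -> Optional[str]:
    """Return the first commit for which `check` fails, or None."""
    for commit in commits:
        if not check(commit):
            return commit
    return None
```

Spawning one process per commit is slow on ~1.5M commits, so it is probably worth checking only a slice of `list_commits()` around the position reported in the "Processed ... commits" line. That position is only approximate, since the progress counter and the error come from different streams.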

Key Findings

Reproducible: Error occurs consistently at the same commit count with the same SHA
Object doesn't exist: git cat-file -t ea5ee0839a0c30442f568e9672dd614a988a39c6 fails
Git reports healthy: git fsck --full completes with no errors or warnings
No Git command finds it: each of the following returned nothing:

  • git rev-list --all --objects | grep ea5ee... (empty)
  • git log --all --format="%H %P" | grep ea5ee... (empty)
  • git verify-pack -v .git/objects/pack/*.idx | grep ea5ee... (empty)
  • git reflog --all | grep ea5ee... (empty)
  • grep -r ea5ee... .git/ (no matches)

Other tools work: BFG Repo-Cleaner runs successfully on the same repository
Not a clone issue: Fresh clone, not shallow (git rev-parse --is-shallow-repository = false)
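
One check not covered above is whether the SHA appears as an entry inside a tree. Tree objects are stored compressed and hold binary object IDs, so a textual grep of .git/ would not match them, and a submodule (gitlink) entry points at a commit that lives in another repository, which git rev-list --objects does not list. This is only one hypothesis, not a confirmed cause. The sketch below parses `git ls-tree -r -t` output for a given commit and reports entries whose object ID equals the SHA; the function names are hypothetical.

```python
import subprocess
from typing import List, Tuple

GITLINK_MODE = "160000"  # mode Git uses for submodule (gitlink) tree entries


def ls_tree(commit: str, repo: str = ".") -> str:
    """Return recursive ls-tree output (including subtrees) for a commit."""
    return subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-t", commit],
        check=True, capture_output=True, text=True,
    ).stdout


def matching_entries(ls_tree_output: str, sha: str) -> List[Tuple[str, str, str]]:
    """Return (mode, type, path) for every entry whose object ID equals sha.

    Each ls-tree line has the form: "<mode> <type> <object>\t<path>".
    """
    hits = []
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        parts = meta.split()
        if len(parts) != 3:
            continue  # skip anything that is not a regular entry line
        mode, objtype, obj = parts
        if obj == sha:
            hits.append((mode, objtype, path))
    return hits
```

Running `matching_entries(ls_tree(commit), "ea5ee0839a0c30442f568e9672dd614a988a39c6")` on a commit near the failure point would show whether the SHA is referenced from a tree. A hit with mode `GITLINK_MODE` and type `commit` would suggest a submodule pointer rather than an object this repository is expected to contain.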

Environment

  • Repository: Large mono-repo (~1.5M commits, ~8M blobs)
  • Command: git filter-repo --analyze
  • Repository state: Fresh clone

Questions for Maintainers

  1. How does filter-repo find this commit? What mechanism is it using that differs from standard Git commands like git rev-list, git fsck, and git log?

  2. Could this be a filter-repo internal state issue? Since no Git tool can locate this SHA, could filter-repo be generating or caching it internally?

  3. Why does BFG succeed? What does BFG do differently in traversing the repository that allows it to handle this scenario?

  4. Workaround? Are there any flags or options to make filter-repo skip missing objects or be more lenient with repository inconsistencies?


This appears to be an edge case where filter-repo's repository traversal encounters a commit reference that Git's own integrity checking doesn't flag as problematic. Any guidance would be greatly appreciated!
