Skip to content

fix(diff): detect renamed and deleted files correctly in diff parsing#105

Open
adlternative wants to merge 1 commit into
alibaba:mainfrom
adlternative:adl/dev/fix-rename-warning-issue-99
Open

fix(diff): detect renamed and deleted files correctly in diff parsing#105
adlternative wants to merge 1 commit into
alibaba:mainfrom
adlternative:adl/dev/fix-rename-warning-issue-99

Conversation

@adlternative

Copy link
Copy Markdown

Summary

Fixes #99ocr review --from master --to BRANCH --format json emitted [ocr] WARNING: cannot read file <old path> at ref BRANCH: exit status 128 when a file was renamed on the target branch.

Root Cause

Two compounding bugs:

  1. Broken /dev/null detection (internal/diff/parser.go): git emits --- /dev/null / +++ /dev/null without a//b/ prefixes, but the old regexes required them, so IsNew/IsDeleted were never set. Deleted files fell through to git show ref:<old path>exit status 128.
  2. Renames never handled: the parser ignored rename from / rename to extended headers, and the git diff/show call sites did not pass --find-renames, so with diff.renames=false a rename degraded to delete(old)+add(new); the delete half hit bug 1 and warned about the old path.

Changes

  • Parse rename from/rename to (authoritative paths, robust for paths with spaces), new file mode, deleted file mode, and unprefixed /dev/null markers in ParseDiffText
  • Pass --find-renames to all 4 git diff/show invocations so rename detection no longer depends on user git config
  • Add IsRenamed to model.Diff (json: is_renamed) and prefer it in diffStatus
  • New parser_test.go: rename / pure rename (100% similarity, no hunks) / deleted / new file unit tests
  • New TestRangeDiffDetectsRename integration test: diff.renames=false + branch rename + small edit → exactly 1 diff, IsRenamed=true, content read at the new path

Testing

  • go test -race -count=1 ./... — all packages pass
  • go vet ./... and go build ./... clean

When a file was renamed on the target branch, ocr review emitted
'[ocr] WARNING: cannot read file <old path> at ref <to>: exit status 128'.

Two compounding bugs:

1. The parser required 'a/'/'b/' prefixes when matching '--- /dev/null' /
   '+++ /dev/null', but git emits these lines without prefixes, so
   IsNew/IsDeleted were never set and deleted files fell through to a
   doomed 'git show ref:<old path>'.

2. 'rename from' / 'rename to' extended headers were never parsed, and
   git diff/show call sites did not force rename detection, so renames
   degraded to delete+add whenever the user had diff.renames=false.

Fixes:
- Parse 'rename from'/'rename to', 'new file mode', 'deleted file mode'
  and unprefixed /dev/null markers in ParseDiffText.
- Pass --find-renames to all git diff/show invocations so rename
  detection no longer depends on user config.
- Add IsRenamed to model.Diff (json: is_renamed) and prefer it in
  diffStatus.
- Add parser unit tests and a range-mode rename regression test.

Fixes alibaba#99
@adlternative

Copy link
Copy Markdown
Author

End-to-End Verification Report

To confirm the fix actually resolves issue #99 (beyond unit tests), I reproduced the reported scenario and compared the pre-fix and post-fix binaries side by side.

Repro setup (mirrors the issue exactly)

git init -b master
git config diff.renames false        # key: simulates an environment where git does NOT auto-detect renames
seq 1 50 | sed 's/^/code line /' > service.go
git add . && git commit -m 'init on master'

git checkout -b SRQ-12345            # the SRQ-XXXXX branch from the issue
git mv service.go service_renamed.go # rename
sed -i 's/code line 25/code line 25 EDITED/' service_renamed.go  # small edit, like the reporter did
git add -A && git commit -m 'rename service.go'

diff.renames=false is the trigger: it makes git diff emit the rename as delete(old) + add(new), which is exactly the path that produced the warning.

Method

  • Pre-fix binary: built from main @ 022800c via git worktree
  • Post-fix binary: built from this branch
  • Both run the same command with --preview, which skips the LLM but fully exercises the diff fetch/parse path where the warning originates (finalizeDiff in internal/diff/parser.go)
ocr review --from master --to SRQ-12345 --preview

Results

Before fix (reproduces the exact warning from the issue):

[ocr] WARNING: cannot read file service.go at ref SRQ-12345: exit status 128

Preview: 2 file(s) changed  |  +50  -50
  [M]  service.go           +0   -50     <- rename misclassified as delete + add
  [M]  service_renamed.go   +50  -0

After fix:

Preview: 1 file(s) changed  |  +1  -1
  [R]  service_renamed.go   +1   -1      <- correctly recognized as a single rename, no warning

Takeaways

  • The warning is gone, and behavior is also more correct:
    • File count drops from a bogus 2 (delete + add) to 1 rename ([R] status)
    • Line stats drop from an inflated +50/-50 to the real +1/-1, avoiding feeding the entire file to the LLM as 'new' content (saves tokens and improves review accuracy)
  • This manual repro is also codified as the TestRangeDiffDetectsRename integration test (same diff.renames=false setup) to guard against regressions in CI.
  • go test -race -count=1 ./..., go vet ./..., and go build ./... all pass.

@CLAassistant

CLAassistant commented Jun 12, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

exit status 128[当使用ocr review --from master --to SRQ-XXXXX --format json]

2 participants