fix: allow parsing empty context lines at the very end of unified diffs#690
fix: allow parsing empty context lines at the very end of unified diffs#690rtritto wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adjusts how parsePatch determines the hunk line “operation” for empty strings while iterating through a unified diff.
Changes:
- Simplifies operation detection by treating empty lines as context (
' ') unconditionally.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const operation = diffstr[i].length == 0 ? ' ' : diffstr[i][0]; | ||
| if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') { | ||
| hunk.lines.push(diffstr[i]); |
I can't repro this; if I use Do you have exact repro steps to cause an existing patch-generating tool outputting a patch that (It may be worth applying this change even if you are mistaken about yarn and pnpm, for consistency with non-trailing empty context lines and to handle the case where trimming is done by a text editor. I'll have a look and think about it. But would like to understand either way whether there really is an even stronger case for this based on yarn and pnpm emitting patches we currently consider malformed, as you suggest.) |
|
Hmm... doesn't this change actually have the effect of letting us imagine an empty context line AFTER the final newline of a patch? That is, it's not letting us parse a patch like this, which actually we can already parse fine (formatted here as a JS template string, for ease of seeing exactly what newlines are present)... but is actually letting us parse one like THIS: But... there literally isn't a line after I can see the argument for why this makes sense to do given that we already
This PR effectively modifies our handling of the scenario where a patch is malformed in BOTH of these ways at once (i.e. where there should've been a final Am I understanding correctly, do you think, @rtritto? I'd welcome your thoughts. |
|
@ExplodingCabbage thanks for reply, my use case (reproduction) is with --- a/dist/index.js
+++ b/dist/index.js
@@ -121,11 +121,16 @@ function auto(options) {
return [
catchAll(),
devServer(),
- ...node$1(options?.node).map((p) => {
- p[INSTANCE] = instance;
- return enablePluginIf((config) => noDeploymentTargetFound(p, config), p);
- }),
- ...netlifyGlue()
+ {
+ name: "ud:target:emit",
+ apply: "build",
+ config: {
+ order: "post",
+ handler() {
+ return { environments: { ssr: { build: { rolldownOptions: { input: { index: './src/server/entrypoint.ts' } } } } } }
+ }
+ }
+ }
];
}Output: file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:478
throw new Error(`Hunk at line ${chunkHeaderIndex + 1} contained invalid line ${diffstr[i]}`);
^
Error: Hunk at line 3 contained invalid line
at parseHunk (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:478:23)
at parseIndex (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:190:34)
at parsePatch (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:509:9)
at applyPatch (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/apply.js:30:19)
at applyPatchCore (file:///C:/<MY_PROJECT>/src/applyPatchCore.ts:30:25)
at file:///C:/<MY_PROJECT>/src/applyPatch.ts:6:11
at withPatchLifecycle (file:///C:/<MY_PROJECT>/src/utils.ts:39:9)
at applyPatch (file:///C:/<MY_PROJECT>/src/applyPatch.ts:5:9)
at file:///C:/<MY_PROJECT>/scripts/patcher.ts:3:7
at ModuleJob.run (node:internal/modules/esm/module_job:439:25)Workaround: change PS: tell me if you need some further help |
You are spot on – your understanding is exactly correct. The PR does indeed allow To give you some context on why I proposed this: it addresses a very frustrating and common real-world edge case when using ecosystem tools like When this slightly truncated patch is fed back into Standard GNU If you feel this level of leniency is too risky or outside the scope of WDYT? |
|
I'm still confused: do Yarn or pnpm ever directly output patches where the trailing empty context line is chopped off, or is the issue you've experienced limited to scenarios where someone opens and saves the patch in their editor and the editor truncates it? Your first post seemed to be stating that yarn/pnpm directly output such patches - note the "or" I bold below:
but in your post above it sounds like the problem is actually limited to editors trimming the patches and you mention Yarn and pnpm patches just because you figure they're particularly likely to be opened and saved in an editor?
Interesting - I'll check this. Consistency with
Yes, I think I'll do this at a minimum. The error message we currently emit is wrong and confusing. |
|
Yarn and pnpm do generate fully compliant, well-formed patch files out of the box. The issue arises exclusively from the second step of the modern workflow: when developers open those In short regarding the example:
|
|
Re the behaviour of the Unix --- test1.txt
+++ test2.txt
@@ -1,7 +1,7 @@
line1
-line3
+line3b
line4... but this one is not: --- test1.txt
+++ test2.txt
@@ -1,8 +1,8 @@
line1
-line3
+line3b
line4... and will provoke this error if you try to apply it: This seems bizarrely arbitrary to me? On most matters besides fuzzy matching, jsdiff aligns with |
|
I tentatively decide that I'll improve the error message but keep this as an error. I'm torn about what the better behaviour is in principle, and so I'm inclined to be conservative and avoid changing it. |
That is bizarre! I had no idea Unix What about a flag like a diff --git a/dist/diff.js b/dist/diff.js
index 28f5149a413ec160b0d0780e112d7d1236f75d9d..0f3952f9157a6ed0f6ccf3f8866054bbcc928467 100644
--- a/dist/diff.js
+++ b/dist/diff.js
@@ -1013,7 +1013,7 @@
* `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough
* information to determine them (e.g. a hunk-only patch with no file headers).
*/
- function parsePatch(uniDiff) {
+ function parsePatch(uniDiff, options = {}) {
const diffstr = uniDiff.split(/\n/), list = [];
let i = 0;
// These helper functions identify line types that can appear between files
@@ -1462,7 +1462,7 @@
}
let addCount = 0, removeCount = 0;
for (; i < diffstr.length && (removeCount < hunk.oldLines || addCount < hunk.newLines || ((_a = diffstr[i]) === null || _a === void 0 ? void 0 : _a.startsWith('\\'))); i++) {
- const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
+ const operation = (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1))) ? ' ' : diffstr[i][0];
if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
hunk.lines.push(diffstr[i]);
if (operation === '+') {
@@ -1576,7 +1576,7 @@
function applyPatch(source, patch, options = {}) {
let patches;
if (typeof patch === 'string') {
- patches = parsePatch(patch);
+ patches = parsePatch(patch, options);
}
else if (Array.isArray(patch)) {
patches = patch;
diff --git a/libcjs/patch/apply.js b/libcjs/patch/apply.js
index 610a072ca3e33ba48fdfc98e9302cbd6611c3ec0..bd24ac2a71095661c8d7a3d88c65a31881881494 100644
--- a/libcjs/patch/apply.js
+++ b/libcjs/patch/apply.js
@@ -34,7 +34,7 @@ const distance_iterator_js_1 = __importDefault(require("../util/distance-iterato
function applyPatch(source, patch, options = {}) {
let patches;
if (typeof patch === 'string') {
- patches = (0, parse_js_1.parsePatch)(patch);
+ patches = (0, parse_js_1.parsePatch)(patch, options);
}
else if (Array.isArray(patch)) {
patches = patch;
diff --git a/libcjs/patch/parse.js b/libcjs/patch/parse.js
index f5900303dcc4088c964e3dd088e159ca918f85e3..37adafab39495180f51f7e1f2dc7108c08607b2e 100644
--- a/libcjs/patch/parse.js
+++ b/libcjs/patch/parse.js
@@ -14,7 +14,7 @@ exports.parsePatch = parsePatch;
* `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough
* information to determine them (e.g. a hunk-only patch with no file headers).
*/
-function parsePatch(uniDiff) {
+function parsePatch(uniDiff, options = {}) {
const diffstr = uniDiff.split(/\n/), list = [];
let i = 0;
// These helper functions identify line types that can appear between files
@@ -463,7 +463,7 @@ function parsePatch(uniDiff) {
}
let addCount = 0, removeCount = 0;
for (; i < diffstr.length && (removeCount < hunk.oldLines || addCount < hunk.newLines || ((_a = diffstr[i]) === null || _a === void 0 ? void 0 : _a.startsWith('\\'))); i++) {
- const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
+ const operation = (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1))) ? ' ' : diffstr[i][0];
if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
hunk.lines.push(diffstr[i]);
if (operation === '+') {
diff --git a/libesm/patch/apply.js b/libesm/patch/apply.js
index fe2e8db5c465d27796c0a76d71e6bb847168cb6f..ecef4473772342caa882d1b24c96ee9d72158475 100644
--- a/libesm/patch/apply.js
+++ b/libesm/patch/apply.js
@@ -27,7 +27,7 @@ import distanceIterator from '../util/distance-iterator.js';
export function applyPatch(source, patch, options = {}) {
let patches;
if (typeof patch === 'string') {
- patches = parsePatch(patch);
+ patches = parsePatch(patch, options);
}
else if (Array.isArray(patch)) {
patches = patch;
diff --git a/libesm/patch/parse.js b/libesm/patch/parse.js
index deb98f26d6a01486932aef961672088772c9f695..412e61b819aab66210dbe0763da2cc079c357685 100644
--- a/libesm/patch/parse.js
+++ b/libesm/patch/parse.js
@@ -11,7 +11,7 @@
* `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough
* information to determine them (e.g. a hunk-only patch with no file headers).
*/
-export function parsePatch(uniDiff) {
+export function parsePatch(uniDiff, options = {}) {
const diffstr = uniDiff.split(/\n/), list = [];
let i = 0;
// These helper functions identify line types that can appear between files
@@ -460,7 +460,7 @@ export function parsePatch(uniDiff) {
}
let addCount = 0, removeCount = 0;
for (; i < diffstr.length && (removeCount < hunk.oldLines || addCount < hunk.newLines || ((_a = diffstr[i]) === null || _a === void 0 ? void 0 : _a.startsWith('\\'))); i++) {
- const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
+ const operation = (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1))) ? ' ' : diffstr[i][0];
if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
hunk.lines.push(diffstr[i]);
if (operation === '+') { |
When parsing unified diff files, empty context lines (which should technically consist of a single space
" ") are often inadvertently truncated to completely empty strings ("") due to code editors trimming trailing whitespace or formatting tools removing extraneous spaces.Currently, the
parsePatch(specificallyparseHunk) function successfully forgives and maps these empty strings back to a space token (' '), except when the empty string happens to be the last line in thediffstrarray:If a patch file ends with a trailing newline,
uniDiff.split(/\n/)yields""as the last element of the array. If a hunk still expects one last context line at that exact end-of-file position,ipoints to this final""string. Because of thei != (diffstr.length - 1)check, it falls back todiffstr[i][0], which evaluates toundefined.Since
undefinedis not a valid operation (+,-,,\),parseHunkthrows an opaque error:Error: Hunk at line X contained invalid lineChanges Made:
Removed the boundary exception
&& i != (diffstr.length - 1)condition. Empty strings are now globally and safely interpreted as empty context lines regardless of their index in the diff string.Why this fixes real-world use cases:
This minor tweak prevents runtime exceptions when using
applyPatch()on valid patch files generated by package managers (likeyarn patchorpnpm patch) or saved via code editors, where a hunk requires an empty contextual line extending to the very end of the file.