fix: allow parsing empty context lines at the very end of unified diffs #690

Open
rtritto wants to merge 1 commit into kpdecker:master from rtritto:empty-lines

Conversation

@rtritto

@rtritto rtritto commented May 23, 2026

When parsing unified diff files, empty context lines (which should technically consist of a single space " ") are often inadvertently truncated to completely empty strings ("") due to code editors trimming trailing whitespace or formatting tools removing extraneous spaces.

Currently, the parsePatch (specifically parseHunk) function successfully forgives and maps these empty strings back to a space token (' '), except when the empty string happens to be the last line in the diffstr array:

// Before
const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];

If a patch file ends with a trailing newline, uniDiff.split(/\n/) yields "" as the last element of the array. If a hunk still expects one last context line at that exact end-of-file position, i points to this final "" string. Because of the i != (diffstr.length - 1) check, it falls back to diffstr[i][0], which evaluates to undefined.

Since undefined is not a valid operation ('+', '-', ' ', or '\'), parseHunk throws an opaque error:
Error: Hunk at line X contained invalid line
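
The mechanics described above can be reproduced with built-in string methods alone, without jsdiff. The following is a minimal sketch; the sample patch text is made up for illustration.

```javascript
// A hunk whose header promises 2 old / 2 new lines, but whose final blank
// context line is missing. The text ends with a single trailing newline.
// (Sample text is illustrative only.)
const uniDiff = '@@ -1,2 +1,2 @@\n-old\n+new\n';

// split() keeps the empty string that follows the final separator.
const diffstr = uniDiff.split(/\n/);
console.log(diffstr.length); // 4: header, '-old', '+new', and a trailing ''

// The parser still expects one more line, so it looks at the last element.
const last = diffstr[diffstr.length - 1];

// Indexing an empty string does not throw; it evaluates to undefined,
// which is not one of the accepted operation characters.
const firstChar = last[0];
console.log(firstChar); // undefined
```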

Changes Made:

Removed the boundary exception, i.e. the && i != (diffstr.length - 1) condition. Empty strings are now globally and safely interpreted as empty context lines regardless of their index in the diff string.

// After
const operation = diffstr[i].length == 0 ? ' ' : diffstr[i][0];
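
To compare the two expressions side by side, here is a small sketch that wraps each one in a helper and runs both over the same array. The helper names operationBefore and operationAfter are hypothetical and exist only for this illustration; they are not part of jsdiff.

```javascript
// Stand-ins for the "Before" and "After" expressions quoted in this PR.
function operationBefore(diffstr, i) {
  return (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
}

function operationAfter(diffstr, i) {
  return diffstr[i].length == 0 ? ' ' : diffstr[i][0];
}

// Hunk body lines as produced by split(/\n/); the final '' comes from the
// trailing newline of the patch text.
const lines = [' line1', '', '-line3', '+line3b', ' line4', ''];

const before = lines.map((_, i) => operationBefore(lines, i));
const after = lines.map((_, i) => operationAfter(lines, i));

// The two results differ only at the last index.
console.log(before[before.length - 1]); // undefined
console.log(after[after.length - 1] === ' '); // true
```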

Why this fixes real-world use cases:

This minor tweak prevents runtime exceptions when using applyPatch() on valid patch files generated by package managers (like yarn patch or pnpm patch) or saved via code editors, where a hunk requires an empty contextual line extending to the very end of the file.

Copilot AI review requested due to automatic review settings May 23, 2026 08:31
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adjusts how parsePatch determines the hunk line “operation” for empty strings while iterating through a unified diff.

Changes:

  • Simplifies operation detection by treating empty lines as context (' ') unconditionally.

Comment thread src/patch/parse.ts
Comment on lines +470 to 472
const operation = diffstr[i].length == 0 ? ' ' : diffstr[i][0];
if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
hunk.lines.push(diffstr[i]);
@rtritto
Author

rtritto commented May 23, 2026

FYI @ExplodingCabbage

@ExplodingCabbage
Collaborator

This minor tweak prevents runtime exceptions when using applyPatch() on valid patch files generated by package managers (like yarn patch

I can't repro this; if I use yarn patch (with Yarn 4.12.0, which is what jsdiff is currently on) to generate a patch that ends with a blank context line, yarn generates a patch whose last three characters are newline-space-newline, as you'd expect.

Do you have exact repro steps that cause an existing patch-generating tool to output a patch that applyPatch can't handle without this fix? (Let me know what versions you're using since this might be relevant.)

(It may be worth applying this change even if you are mistaken about yarn and pnpm, for consistency with non-trailing empty context lines and to handle the case where trimming is done by a text editor. I'll have a look and think about it. But would like to understand either way whether there really is an even stronger case for this based on yarn and pnpm emitting patches we currently consider malformed, as you suggest.)

@ExplodingCabbage
Collaborator

Hmm... doesn't this change actually have the effect of letting us imagine an empty context line AFTER the final newline of a patch? That is, it's not letting us parse a patch like this, which actually we can already parse fine (formatted here as a JS template string, for ease of seeing exactly what newlines are present)...

`--- test1.txt
+++ test2.txt
@@ -1,5 +1,5 @@
 line1

-line3
+line3b
 line4

`

but is actually letting us parse one like THIS:

`--- test1.txt
+++ test2.txt
@@ -1,5 +1,5 @@
 line1

-line3
+line3b
 line4
`

But... there literally isn't a line after line4\n in the patch above. We're not forgiving a blank context line lacking the trailing space; we're imagining an extra blank context line where there was literally no line at all.

I can see the argument for why this makes sense to do given that we already

  1. forgive the absence of a leading space on a blank context line and also
  2. forgive the absence of a trailing newline on the last line of a patch

This PR effectively modifies our handling of the scenario where a patch is malformed in BOTH of these ways at once (i.e. where there should've been a final " \n" at the end of the patch, and both the space AND the newline have been truncated). Arguably this is a consistency improvement, I guess, since it's just allowing the two existing error-forgiveness behaviours to occur simultaneously. But I think adding an entire extra line to a hunk that was not in any sense actually included is a more radical kind of tolerance of malformed patches than jsdiff has done before now and am not sure I'm comfortable with it; it has the effect that by just incrementing the line counts in the final hunk header, you can cause jsdiff to hallucinate an extra context line at the end of the hunk (rather than complaining that your hunk is missing a line, like you might expect it to - admittedly the existing error message is NOT helpful in this scenario).

Am I understanding correctly, do you think, @rtritto? I'd welcome your thoughts.
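
The effect described in this comment can be shown with a toy model of the hunk-body loop. This is a sketch only: readHunkBody is a hypothetical helper that copies just the counting logic, not jsdiff's actual parseHunk, and the lenient parameter stands in for the change proposed in this PR.

```javascript
// Toy model: consume lines until the counts from the hunk header are met.
function readHunkBody(diffstr, start, oldLines, newLines, lenient) {
  const lines = [];
  let removeCount = 0, addCount = 0;
  for (let i = start; i < diffstr.length && (removeCount < oldLines || addCount < newLines); i++) {
    const isEmpty = diffstr[i].length === 0;
    const isLast = i === diffstr.length - 1;
    // lenient === false mirrors the current rule, lenient === true the proposed one.
    const operation = (isEmpty && (lenient || !isLast)) ? ' ' : diffstr[i][0];
    if (operation === '+') addCount++;
    else if (operation === '-') removeCount++;
    else if (operation === ' ') { addCount++; removeCount++; }
    else throw new Error('invalid line ' + JSON.stringify(diffstr[i]));
    lines.push(diffstr[i]);
  }
  return lines;
}

// Body of the second patch above: nothing follows "line4\n".
const body = ' line1\n\n-line3\n+line3b\n line4\n'.split(/\n/);

// The header claims 5 old / 5 new lines, but only 4 of each are present.
let strictError = null;
try { readHunkBody(body, 0, 5, 5, false); } catch (e) { strictError = e; }
console.log(strictError !== null); // true: the current rule rejects the hunk

// With the proposed rule, the '' left over from split() is counted as a
// context line, so the hunk gains a line that was never in the patch text.
const lenientLines = readHunkBody(body, 0, 5, 5, true);
console.log(lenientLines.length); // 6
```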

@rtritto
Author

rtritto commented May 26, 2026

@ExplodingCabbage thanks for the reply. My use case (reproduction) is with applyPatch (from jsdiff), which applies this patch (generated by yarn or pnpm) to the @universal-deploy/vite@0.1.9 package:

--- a/dist/index.js
+++ b/dist/index.js
@@ -121,11 +121,16 @@ function auto(options) {
     return [
         catchAll(),
         devServer(),
-		...node$1(options?.node).map((p) => {
-			p[INSTANCE] = instance;
-			return enablePluginIf((config) => noDeploymentTargetFound(p, config), p);
-		}),
-		...netlifyGlue()
+		{
+			name: "ud:target:emit",
+			apply: "build",
+			config: {
+				order: "post",
+				handler() {
+					return { environments: { ssr: { build: { rolldownOptions: { input: { index: './src/server/entrypoint.ts' } } } } } }
+				}
+			}
+		}
     ];
 }

Output:

file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:478
                throw new Error(`Hunk at line ${chunkHeaderIndex + 1} contained invalid line ${diffstr[i]}`);
                      ^

Error: Hunk at line 3 contained invalid line 
    at parseHunk (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:478:23)
    at parseIndex (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:190:34)
    at parsePatch (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/parse.js:509:9)
    at applyPatch (file:///C:/Users/<USER>/AppData/Local/Yarn/Berry/cache/diff-npm-9.0.0-7def25b473-10c0.zip/node_modules/diff/libesm/patch/apply.js:30:19)
    at applyPatchCore (file:///C:/<MY_PROJECT>/src/applyPatchCore.ts:30:25)
    at file:///C:/<MY_PROJECT>/src/applyPatch.ts:6:11
    at withPatchLifecycle (file:///C:/<MY_PROJECT>/src/utils.ts:39:9)
    at applyPatch (file:///C:/<MY_PROJECT>/src/applyPatch.ts:5:9)
    at file:///C:/<MY_PROJECT>/scripts/patcher.ts:3:7
    at ModuleJob.run (node:internal/modules/esm/module_job:439:25)

Workaround: change @@ -121,11 +121,16 @@ to @@ -121,10 +121,15 @@

PS: tell me if you need some further help

@rtritto
Author

rtritto commented May 26, 2026

Am I understanding correctly, do you think, @rtritto? I'd welcome your thoughts.

You are spot on – your understanding is exactly correct. The PR does indeed allow jsdiff to effectively "hallucinate" an extra context line at the end of the hunk, basically forgiving a patch that is missing both its trailing space and its trailing newline.

To give you some context on why I proposed this: it addresses a very frustrating and common real-world edge case when using ecosystem tools like yarn patch.
When a patch logically ends with a blank context line (" \n"), developers often open the .patch file in an editor (like VS Code) to make a quick manual tweak. Default editor configurations like "Trim Trailing Whitespace" and "Trim Final Newlines" will quietly strip both the space and potentially the entire final empty line upon saving.

When this slightly truncated patch is fed back into jsdiff, it hits the EOF and throws:
Error: Hunk at line X contained invalid line
Because it's evaluating the final "" string from the .split(/\n/), the error is extremely cryptic (yielding undefined). Developers could waste hours trying to find a syntax error, when in reality the hunk is just missing its final empty line.
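
As a rough illustration of the editor behaviour described here, the two save-time settings can be approximated with regular expressions. This is only a model under the assumption that trailing spaces and tabs are stripped per line and that extra newlines at the end of the file collapse into one; real editors may differ in detail, and both helper names are hypothetical.

```javascript
// Approximation of "Trim Trailing Whitespace": drop spaces and tabs at the
// end of every line.
function trimTrailingWhitespace(text) {
  return text.replace(/[ \t]+$/gm, '');
}

// Approximation of "Trim Final Newlines": collapse the run of newlines at
// the very end of the file into a single newline.
function trimFinalNewlines(text) {
  return text.replace(/\n+$/, '\n');
}

// A well-formed hunk that ends with a blank context line (" \n").
const original = '@@ -1,2 +1,2 @@\n-old\n+new\n \n';

const saved = trimFinalNewlines(trimTrailingWhitespace(original));

// Both the space and the now-empty final line are gone.
console.log(JSON.stringify(saved)); // "@@ -1,2 +1,2 @@\n-old\n+new\n"
```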

Standard GNU patch is famously lenient with truncated contexts at the end of files, and my goal was to mimic that resilience. That said, I completely respect your hesitation. Synthesizing a line purely to satisfy the hunk header's expected line count is a more radical kind of tolerance.

If you feel this level of leniency is too risky or outside the scope of jsdiff, I completely understand. In that case, would you be open to an alternative approach where we simply catch this EOF scenario and throw a much better error message?
Something like Hunk at line X ended prematurely (expected N more lines) instead of trying to evaluate diffstr[i][0]? Even just improving the error message would turn a huge headache into an easily actionable problem for users.
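
One way such a check could look is sketched below. This is a hypothetical helper, not jsdiff's actual code: assertHunkComplete and its parameters are assumptions made for illustration, with chunkHeaderIndex borrowed from the stack trace earlier in this thread.

```javascript
// Hypothetical check to run once the input is exhausted: if the counts from
// the hunk header have not been reached, report how many lines are missing.
function assertHunkComplete(chunkHeaderIndex, hunk, removeCount, addCount) {
  const missing = Math.max(hunk.oldLines - removeCount, hunk.newLines - addCount);
  if (missing > 0) {
    const noun = missing === 1 ? 'line' : 'lines';
    throw new Error(
      `Hunk at line ${chunkHeaderIndex + 1} ended prematurely (expected ${missing} more ${noun})`
    );
  }
}

// The reproduction above: header @@ -121,11 +121,16 @@ on line 3, with only
// 10 old-side and 15 new-side lines actually present.
let message = null;
try {
  assertHunkComplete(2, { oldLines: 11, newLines: 16 }, 10, 15);
} catch (e) {
  message = e.message;
}
console.log(message); // Hunk at line 3 ended prematurely (expected 1 more line)
```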

WDYT?

@ExplodingCabbage
Collaborator

I'm still confused: do Yarn or pnpm ever directly output patches where the trailing empty context line is chopped off, or is the issue you've experienced limited to scenarios where someone opens and saves the patch in their editor and the editor truncates it? Your first post seemed to be stating that yarn/pnpm directly output such patches - note the "or" I bold below:

This minor tweak prevents runtime exceptions when using applyPatch() on valid patch files generated by package managers (like yarn patch or pnpm patch) or saved via code editors

but in your post above it sounds like the problem is actually limited to editors trimming the patches and you mention Yarn and pnpm patches just because you figure they're particularly likely to be opened and saved in an editor?

Standard GNU patch is famously lenient with truncated contexts at the end of files, and my goal was to mimic that resilience.

Interesting - I'll check this. Consistency with patch might be a good argument for making jsdiff more lenient.

Even just improving the error message would turn a huge headache into an easily actionable fixable problem for users.

Yes, I think I'll do this at a minimum. The error message we currently emit is wrong and confusing.

@rtritto
Author

rtritto commented May 27, 2026

Yarn and pnpm do generate fully compliant, well-formed patch files out of the box.

The issue arises exclusively from the second step of the modern workflow: when developers open those .patch files in their IDE. Because tools like yarn patch and pnpm patch persist these files statically in the repository (e.g., in a patches/ folder), developers frequently open them manually to resolve Git merge conflicts, tweak a hardcoded path, or remove a specific hunk.
When they hit "Save", editor settings like "Trim Trailing Whitespace" and "Insert/Trim Final Newline" silently corrupt the empty context lines at the end of the file. So you are completely right: the package managers aren't outputting broken patches, but their workflow exposes these files to the editor auto-formatting.

In short regarding the example:

  • yarn and pnpm output the .patch with @@ -121,11 +121,16 @@
  • jsdiff, given that patch (generated by yarn or pnpm) with @@ -121,11 +121,16 @@, throws the hunk error; with the workaround header @@ -121,10 +121,15 @@, jsdiff works

@ExplodingCabbage
Collaborator

Re the behaviour of the Unix patch util - it's actually weirder than I'd ever have guessed. It will hallucinate up to THREE blank context lines at the end of a patch that aren't actually present in the patch file - but no more than three. So this is a legit patch as far as patch is concerned:

--- test1.txt
+++ test2.txt
@@ -1,7 +1,7 @@
 line1
 
-line3
+line3b
 line4

... but this one is not:

--- test1.txt
+++ test2.txt
@@ -1,8 +1,8 @@
 line1
 
-line3
+line3b
 line4

... and will provoke this error if you try to apply it:

patch: **** unexpected end of file in patch

This seems bizarrely arbitrary to me? On most matters besides fuzzy matching, jsdiff aligns with diff and patch, but this one just seems really odd.
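
The arithmetic behind the two examples can be written down as a tiny model. It encodes only the behaviour reported in this comment: the limit of three comes from the experiment above, not from GNU patch's source, and wouldPatchAccept is a hypothetical name.

```javascript
// Limit observed in the experiment above; an assumption, not a documented constant.
const MAX_IMPLIED_BLANK_LINES = 3;

// Accept a hunk when the header declares at most three more old-side lines
// than the patch text actually contains.
function wouldPatchAccept(declaredOldLines, presentOldLines) {
  const missing = declaredOldLines - presentOldLines;
  return missing >= 0 && missing <= MAX_IMPLIED_BLANK_LINES;
}

// Both example hunks contain 4 old-side lines: line1, a blank line, line3, line4.
console.log(wouldPatchAccept(7, 4)); // true: 3 lines are implied
console.log(wouldPatchAccept(8, 4)); // false: 4 lines would have to be implied
```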

@ExplodingCabbage
Collaborator

I've tentatively decided that I'll improve the error message but keep this as an error. I'm torn about what the better behaviour is in principle, and so I'm inclined to be conservative and avoid changing it.

@rtritto
Author

rtritto commented May 27, 2026

This seems bizarrely arbitrary to me? On most matters besides fuzzy matching, jsdiff aligns with diff and patch, but this one just seems really odd.

That is bizarre! I had no idea Unix patch hardcoded a mysterious 3-line limit for hallucinated context. That seems completely arbitrary and feels more like a legacy quirk than a deliberate design choice we should copy.

What about a flag like a strict option?

diff --git a/dist/diff.js b/dist/diff.js
index 28f5149a413ec160b0d0780e112d7d1236f75d9d..0f3952f9157a6ed0f6ccf3f8866054bbcc928467 100644
--- a/dist/diff.js
+++ b/dist/diff.js
@@ -1013,7 +1013,7 @@
      * `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough
      * information to determine them (e.g. a hunk-only patch with no file headers).
      */
-    function parsePatch(uniDiff) {
+    function parsePatch(uniDiff, options = {}) {
         const diffstr = uniDiff.split(/\n/), list = [];
         let i = 0;
         // These helper functions identify line types that can appear between files
@@ -1462,7 +1462,7 @@
             }
             let addCount = 0, removeCount = 0;
             for (; i < diffstr.length && (removeCount < hunk.oldLines || addCount < hunk.newLines || ((_a = diffstr[i]) === null || _a === void 0 ? void 0 : _a.startsWith('\\'))); i++) {
-                const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
+                const operation = (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1))) ? ' ' : diffstr[i][0];
                 if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
                     hunk.lines.push(diffstr[i]);
                     if (operation === '+') {
@@ -1576,7 +1576,7 @@
     function applyPatch(source, patch, options = {}) {
         let patches;
         if (typeof patch === 'string') {
-            patches = parsePatch(patch);
+            patches = parsePatch(patch, options);
         }
         else if (Array.isArray(patch)) {
             patches = patch;
diff --git a/libcjs/patch/apply.js b/libcjs/patch/apply.js
index 610a072ca3e33ba48fdfc98e9302cbd6611c3ec0..bd24ac2a71095661c8d7a3d88c65a31881881494 100644
--- a/libcjs/patch/apply.js
+++ b/libcjs/patch/apply.js
@@ -34,7 +34,7 @@ const distance_iterator_js_1 = __importDefault(require("../util/distance-iterato
 function applyPatch(source, patch, options = {}) {
     let patches;
     if (typeof patch === 'string') {
-        patches = (0, parse_js_1.parsePatch)(patch);
+        patches = (0, parse_js_1.parsePatch)(patch, options);
     }
     else if (Array.isArray(patch)) {
         patches = patch;
diff --git a/libcjs/patch/parse.js b/libcjs/patch/parse.js
index f5900303dcc4088c964e3dd088e159ca918f85e3..37adafab39495180f51f7e1f2dc7108c08607b2e 100644
--- a/libcjs/patch/parse.js
+++ b/libcjs/patch/parse.js
@@ -14,7 +14,7 @@ exports.parsePatch = parsePatch;
  * `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough
  * information to determine them (e.g. a hunk-only patch with no file headers).
  */
-function parsePatch(uniDiff) {
+function parsePatch(uniDiff, options = {}) {
     const diffstr = uniDiff.split(/\n/), list = [];
     let i = 0;
     // These helper functions identify line types that can appear between files
@@ -463,7 +463,7 @@ function parsePatch(uniDiff) {
         }
         let addCount = 0, removeCount = 0;
         for (; i < diffstr.length && (removeCount < hunk.oldLines || addCount < hunk.newLines || ((_a = diffstr[i]) === null || _a === void 0 ? void 0 : _a.startsWith('\\'))); i++) {
-            const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
+            const operation = (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1))) ? ' ' : diffstr[i][0];
             if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
                 hunk.lines.push(diffstr[i]);
                 if (operation === '+') {
diff --git a/libesm/patch/apply.js b/libesm/patch/apply.js
index fe2e8db5c465d27796c0a76d71e6bb847168cb6f..ecef4473772342caa882d1b24c96ee9d72158475 100644
--- a/libesm/patch/apply.js
+++ b/libesm/patch/apply.js
@@ -27,7 +27,7 @@ import distanceIterator from '../util/distance-iterator.js';
 export function applyPatch(source, patch, options = {}) {
     let patches;
     if (typeof patch === 'string') {
-        patches = parsePatch(patch);
+        patches = parsePatch(patch, options);
     }
     else if (Array.isArray(patch)) {
         patches = patch;
diff --git a/libesm/patch/parse.js b/libesm/patch/parse.js
index deb98f26d6a01486932aef961672088772c9f695..412e61b819aab66210dbe0763da2cc079c357685 100644
--- a/libesm/patch/parse.js
+++ b/libesm/patch/parse.js
@@ -11,7 +11,7 @@
  * `oldFileName` and `newFileName` may be `undefined` if the patch doesn't contain enough
  * information to determine them (e.g. a hunk-only patch with no file headers).
  */
-export function parsePatch(uniDiff) {
+export function parsePatch(uniDiff, options = {}) {
     const diffstr = uniDiff.split(/\n/), list = [];
     let i = 0;
     // These helper functions identify line types that can appear between files
@@ -460,7 +460,7 @@ export function parsePatch(uniDiff) {
         }
         let addCount = 0, removeCount = 0;
         for (; i < diffstr.length && (removeCount < hunk.oldLines || addCount < hunk.newLines || ((_a = diffstr[i]) === null || _a === void 0 ? void 0 : _a.startsWith('\\'))); i++) {
-            const operation = (diffstr[i].length == 0 && i != (diffstr.length - 1)) ? ' ' : diffstr[i][0];
+            const operation = (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1))) ? ' ' : diffstr[i][0];
             if (operation === '+' || operation === '-' || operation === ' ' || operation === '\\') {
                 hunk.lines.push(diffstr[i]);
                 if (operation === '+') {
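
To make the semantics of the proposed flag concrete, here is a stand-alone sketch of the condition from the diff above. chooseOperation is a hypothetical helper name used only for this illustration; the strict option itself is just the proposal in this comment, not an existing jsdiff option.

```javascript
// Copy of the proposed condition, isolated from the rest of the parser.
function chooseOperation(diffstr, i, options = {}) {
  return (diffstr[i].length == 0 && (options.strict === false || i != (diffstr.length - 1)))
    ? ' '
    : diffstr[i][0];
}

const diffstr = [' line4', ''];
const lastIndex = diffstr.length - 1;

// Omitting the option, or passing strict: true, keeps today's behaviour.
const byDefault = chooseOperation(diffstr, lastIndex);
const strictTrue = chooseOperation(diffstr, lastIndex, { strict: true });

// Only an explicit strict: false treats the trailing '' as a context line.
const strictFalse = chooseOperation(diffstr, lastIndex, { strict: false });

console.log(byDefault, strictTrue, JSON.stringify(strictFalse));
```

Because the check is options.strict === false rather than a truthiness test, callers that pass no options see no change in behaviour.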
