Commit 5bd10b2

jltobler authored and gitster committed
builtin: introduce diff-pairs command
Through git-diff(1), a single diff can be generated from a pair of blob revisions directly. Unfortunately, there is not a mechanism to compute batches of specific file pair diffs in a single process. Such a feature is particularly useful on the server-side where diffing between a large set of changes is not feasible all at once due to timeout concerns.

To facilitate this, introduce git-diff-pairs(1) which acts as a backend passing its NUL-terminated raw diff format input from stdin through diff machinery to produce various forms of output such as patch or raw.

The raw format was originally designed as an interchange format and represents the contents of the diff_queued_diff list making it possible to break the diff pipeline into separate stages. For example, git-diff-tree(1) can be used as a frontend to compute file pairs to queue and feed its raw output to git-diff-pairs(1) to compute patches. With this, batches of diffs can be progressively generated without having to recompute renames or retrieve object context. Something like the following:

    git diff-tree -r -z -M $old $new |
    git diff-pairs -p -z

should generate the same output as `git diff-tree -p -M`. Furthermore, each line of raw diff formatted input can also be individually fed to a separate git-diff-pairs(1) process and still produce the same output.

Based-on-patch-by: Jeff King <[email protected]>
Signed-off-by: Justin Tobler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent c8a8e04 commit 5bd10b2
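The commit message says each raw record can be fed to a separate git-diff-pairs(1) process. A caller that batches work this way has to split the `-z` stream on record boundaries, not on every NUL byte. Rename and copy records carry a second path field, which the builtin reads into `path_dst`. The standalone C sketch below shows that splitting step. `raw_record_length()` is a hypothetical helper, not part of Git, and it assumes well-formed `git diff-tree -r -z` output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Git API): return the byte length of the first
 * complete raw diff record in buf[0..len), or 0 if buf does not hold one.
 * Assumes well-formed `git diff-tree -r -z` output, where a record is
 *   ":<mode_a> <mode_b> <oid_a> <oid_b> <status>[<score>]" NUL "<path>" NUL
 * and rename (R) or copy (C) records add a second "<dst_path>" NUL.
 */
size_t raw_record_length(const char *buf, size_t len)
{
	const char *p, *end, *nul;
	int spaces = 0, paths = 1;

	if (buf == NULL || len == 0 || buf[0] != ':')
		return 0;
	end = buf + len;

	/* Scan the metadata field; the status letter follows the 4th space. */
	for (p = buf; p < end && *p != '\0'; p++) {
		if (*p == ' ' && ++spaces == 4 && p + 1 < end &&
		    (p[1] == 'R' || p[1] == 'C'))
			paths = 2;
	}
	if (p == end)
		return 0;	/* metadata field is not NUL-terminated yet */
	p++;			/* skip the NUL that ends the metadata */

	/* Consume one path, or two for renames and copies. */
	while (paths-- > 0) {
		nul = memchr(p, '\0', (size_t)(end - p));
		if (!nul)
			return 0;
		p = nul + 1;
	}
	assert(p <= end);
	return (size_t)(p - buf);
}
```

A driver could call this repeatedly on a buffer filled from `git diff-tree -r -z -M $old $new` and pass each group of records to its own `git diff-pairs -z` invocation. Process management is left out of the sketch.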

11 files changed, 338 additions, 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@
 /git-diff
 /git-diff-files
 /git-diff-index
+/git-diff-pairs
 /git-diff-tree
 /git-difftool
 /git-difftool--helper

Documentation/git-diff-pairs.adoc

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+git-diff-pairs(1)
+=================
+
+NAME
+----
+git-diff-pairs - Compare the content and mode of provided blob pairs
+
+SYNOPSIS
+--------
+[synopsis]
+git diff-pairs -z [<diff-options>]
+
+DESCRIPTION
+-----------
+Show changes for file pairs provided on stdin. Input for this command must be
+in the NUL-terminated raw output format as generated by commands such as `git
+diff-tree -z -r --raw`. By default, the outputted diffs are computed and shown
+in the patch format when stdin closes.
+
+Usage of this command enables the traditional diff pipeline to be broken up
+into separate stages where `diff-pairs` acts as the output phase. Other
+commands, such as `diff-tree`, may serve as a frontend to compute the raw
+diff format used as input.
+
+Instead of computing diffs via `git diff-tree -p -M` in one step, `diff-tree`
+can compute the file pairs and rename information without the blob diffs. This
+output can be fed to `diff-pairs` to generate the underlying blob diffs as done
+in the following example:
+
+-----------------------------
+git diff-tree -z -r -M $a $b |
+git diff-pairs -z
+-----------------------------
+
+Computing the tree diff upfront with rename information allows patch output
+from `diff-pairs` to be progressively computed over the course of potentially
+multiple invocations.
+
+Pathspecs are not currently supported by `diff-pairs`. Pathspec limiting should
+be performed by the upstream command generating the raw diffs used as input.
+
+Tree objects are not currently supported as input and are rejected.
+
+Abbreviated object IDs in the `diff-pairs` input are not supported. Outputted
+object IDs can be abbreviated using the `--abbrev` option.
+
+OPTIONS
+-------
+
+include::diff-options.adoc[]
+
+include::diff-generate-patch.adoc[]
+
+GIT
+---
+Part of the linkgit:git[1] suite
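The DESCRIPTION section requires input in the NUL-terminated raw format. The C sketch below illustrates that byte layout by serializing one record the way `git diff-tree -r -z` lays it out. Each record has a metadata field, then the path, then a destination path for renames and copies, and every field ends in a NUL. `format_raw_record()` is hypothetical. The object IDs in the example are short placeholders, not real hashes. Real input needs full-length IDs, because `diff-pairs` rejects abbreviated ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a Git API): serialize one record in the
 * NUL-terminated raw format into out[0..cap). status is the status letter
 * plus any score, e.g. "M" or "R087"; dst is the destination path for
 * renames and copies and NULL otherwise. Returns the number of bytes
 * written, or 0 if out is too small.
 */
size_t format_raw_record(char *out, size_t cap,
			 unsigned mode_a, unsigned mode_b,
			 const char *oid_a, const char *oid_b,
			 const char *status, const char *src, const char *dst)
{
	size_t pos, src_len, dst_len;
	int n;

	assert(out && oid_a && oid_b && status && src);
	src_len = strlen(src);
	dst_len = dst ? strlen(dst) : 0;

	/* Metadata field; modes are printed as six octal digits. */
	n = snprintf(out, cap, ":%06o %06o %s %s %s",
		     mode_a, mode_b, oid_a, oid_b, status);
	if (n < 0 || (size_t)n >= cap)
		return 0;
	pos = (size_t)n + 1;	/* snprintf's terminating NUL ends the field */

	if (pos + src_len + 1 + (dst ? dst_len + 1 : 0) > cap)
		return 0;
	memcpy(out + pos, src, src_len + 1);	/* path plus its NUL */
	pos += src_len + 1;
	if (dst) {
		memcpy(out + pos, dst, dst_len + 1);
		pos += dst_len + 1;
	}
	return pos;
}
```

An added file would use `000000` as `mode_a`. A deleted file would use it as `mode_b`.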

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ manpages = {
   'git-diagnose.adoc' : 1,
   'git-diff-files.adoc' : 1,
   'git-diff-index.adoc' : 1,
+  'git-diff-pairs.adoc' : 1,
   'git-difftool.adoc' : 1,
   'git-diff-tree.adoc' : 1,
   'git-diff.adoc' : 1,

Makefile

Lines changed: 1 addition & 0 deletions
@@ -1242,6 +1242,7 @@ BUILTIN_OBJS += builtin/describe.o
 BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
+BUILTIN_OBJS += builtin/diff-pairs.o
 BUILTIN_OBJS += builtin/diff-tree.o
 BUILTIN_OBJS += builtin/diff.o
 BUILTIN_OBJS += builtin/difftool.o

builtin.h

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix, struct reposit
 int cmd_diff_files(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff_index(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_diff_pairs(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff_tree(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_difftool(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_env__helper(int argc, const char **argv, const char *prefix, struct repository *repo);

builtin/diff-pairs.c

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
+#include "builtin.h"
+#include "config.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "gettext.h"
+#include "hash.h"
+#include "hex.h"
+#include "object.h"
+#include "parse-options.h"
+#include "revision.h"
+#include "strbuf.h"
+
+static unsigned parse_mode_or_die(const char *mode, const char **end)
+{
+	uint16_t ret;
+
+	*end = parse_mode(mode, &ret);
+	if (!*end)
+		die(_("unable to parse mode: %s"), mode);
+	return ret;
+}
+
+static void parse_oid_or_die(const char *hex, struct object_id *oid,
+			     const char **end, const struct git_hash_algo *algop)
+{
+	if (parse_oid_hex_algop(hex, oid, end, algop) || *(*end)++ != ' ')
+		die(_("unable to parse object id: %s"), hex);
+}
+
+int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
+		   struct repository *repo)
+{
+	struct strbuf path_dst = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	struct strbuf meta = STRBUF_INIT;
+	struct option *parseopts;
+	struct rev_info revs;
+	int line_term = '\0';
+	int ret;
+
+	const char * const builtin_diff_pairs_usage[] = {
+		N_("git diff-pairs -z [<diff-options>]"),
+		NULL
+	};
+	struct option builtin_diff_pairs_options[] = {
+		OPT_END()
+	};
+
+	repo_init_revisions(repo, &revs, prefix);
+
+	/*
+	 * Diff options are usually parsed implicitly as part of
+	 * setup_revisions(). Explicitly handle parsing to ensure options are
+	 * printed in the usage message.
+	 */
+	parseopts = add_diff_options(builtin_diff_pairs_options, &revs.diffopt);
+	show_usage_with_options_if_asked(argc, argv, builtin_diff_pairs_usage, parseopts);
+
+	repo_config(repo, git_diff_basic_config, NULL);
+	revs.disable_stdin = 1;
+	revs.abbrev = 0;
+	revs.diff = 1;
+
+	argc = parse_options(argc, argv, prefix, parseopts, builtin_diff_pairs_usage,
+			     PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_DASHDASH);
+
+	if (setup_revisions(argc, argv, &revs, NULL) > 1)
+		usagef(_("unrecognized argument: %s"), argv[0]);
+
+	/*
+	 * With the -z option, both command input and raw output are
+	 * NUL-delimited (this mode does not affect patch output). At present
+	 * only NUL-delimited raw diff formatted input is supported.
+	 */
+	if (revs.diffopt.line_termination)
+		usage(_("working without -z is not supported"));
+
+	if (revs.prune_data.nr)
+		usage(_("pathspec arguments not supported"));
+
+	if (revs.pending.nr || revs.max_count != -1 ||
+	    revs.min_age != (timestamp_t)-1 ||
+	    revs.max_age != (timestamp_t)-1)
+		usage(_("revision arguments not allowed"));
+
+	if (!revs.diffopt.output_format)
+		revs.diffopt.output_format = DIFF_FORMAT_PATCH;
+
+	/*
+	 * If rename detection is not requested, use rename information from the
+	 * raw diff formatted input. Setting skip_resolving_statuses ensures
+	 * diffcore_std() does not mess with rename information already present
+	 * in queued filepairs.
+	 */
+	if (!revs.diffopt.detect_rename)
+		revs.diffopt.skip_resolving_statuses = 1;
+
+	while (1) {
+		struct object_id oid_a, oid_b;
+		struct diff_filepair *pair;
+		unsigned mode_a, mode_b;
+		const char *p;
+		char status;
+
+		if (strbuf_getwholeline(&meta, stdin, line_term) == EOF)
+			break;
+
+		p = meta.buf;
+		if (*p != ':')
+			die(_("invalid raw diff input"));
+		p++;
+
+		mode_a = parse_mode_or_die(p, &p);
+		mode_b = parse_mode_or_die(p, &p);
+
+		if (S_ISDIR(mode_a) || S_ISDIR(mode_b))
+			die(_("tree objects not supported"));
+
+		parse_oid_or_die(p, &oid_a, &p, repo->hash_algo);
+		parse_oid_or_die(p, &oid_b, &p, repo->hash_algo);
+
+		status = *p++;
+
+		if (strbuf_getwholeline(&path, stdin, line_term) == EOF)
+			die(_("got EOF while reading path"));
+
+		switch (status) {
+		case DIFF_STATUS_ADDED:
+			pair = diff_queue_addremove(&diff_queued_diff,
+						    &revs.diffopt, '+', mode_b,
+						    &oid_b, 1, path.buf, 0);
+			if (pair)
+				pair->status = status;
+			break;
+
+		case DIFF_STATUS_DELETED:
+			pair = diff_queue_addremove(&diff_queued_diff,
+						    &revs.diffopt, '-', mode_a,
+						    &oid_a, 1, path.buf, 0);
+			if (pair)
+				pair->status = status;
+			break;
+
+		case DIFF_STATUS_TYPE_CHANGED:
+		case DIFF_STATUS_MODIFIED:
+			pair = diff_queue_change(&diff_queued_diff, &revs.diffopt,
+						 mode_a, mode_b, &oid_a, &oid_b,
+						 1, 1, path.buf, 0, 0);
+			if (pair)
+				pair->status = status;
+			break;
+
+		case DIFF_STATUS_RENAMED:
+		case DIFF_STATUS_COPIED: {
+			struct diff_filespec *a, *b;
+			unsigned int score;
+
+			if (strbuf_getwholeline(&path_dst, stdin, line_term) == EOF)
+				die(_("got EOF while reading destination path"));
+
+			a = alloc_filespec(path.buf);
+			b = alloc_filespec(path_dst.buf);
+			fill_filespec(a, &oid_a, 1, mode_a);
+			fill_filespec(b, &oid_b, 1, mode_b);
+
+			pair = diff_queue(&diff_queued_diff, a, b);
+
+			if (strtoul_ui(p, 10, &score))
+				die(_("unable to parse rename/copy score: %s"), p);
+
+			pair->score = score * MAX_SCORE / 100;
+			pair->status = status;
+			pair->renamed_pair = 1;
+		}
+			break;
+
+		default:
+			die(_("unknown diff status: %c"), status);
+		}
+	}
+
+	diffcore_std(&revs.diffopt);
+	diff_flush(&revs.diffopt);
+	ret = diff_result_code(&revs);
+
+	strbuf_release(&path_dst);
+	strbuf_release(&path);
+	strbuf_release(&meta);
+	release_revisions(&revs);
+	FREE_AND_NULL(parseopts);
+
+	return ret;
+}
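For each record, the main loop of `cmd_diff_pairs()` does three things. It parses the metadata field and rejects tree modes. For `R`/`C` records, it then converts the percentage score with `score * MAX_SCORE / 100`. The standalone sketch below mirrors those steps without Git's internals. `struct raw_meta`, `parse_raw_meta()` and its helpers are hypothetical. `SKETCH_MAX_SCORE` assumes that Git's `MAX_SCORE` in `diffcore.h` is 60000. Unlike `parse_oid_hex_algop()`, the sketch does not check the object ID length against a hash algorithm.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors MAX_SCORE (60000.0) from Git's diffcore.h. */
#define SKETCH_MAX_SCORE 60000UL
#define SKETCH_HEXSZ_MAX 64	/* long enough for SHA-256 hex object IDs */

struct raw_meta {
	unsigned mode_a, mode_b;
	char oid_a[SKETCH_HEXSZ_MAX + 1], oid_b[SKETCH_HEXSZ_MAX + 1];
	char status;
	unsigned long score;	/* rename/copy score scaled to SKETCH_MAX_SCORE */
};

/* Parse octal digits followed by one space; return the next char or NULL. */
static const char *parse_octal_field(const char *p, unsigned *out)
{
	unsigned v = 0;

	if (*p < '0' || *p > '7')
		return NULL;
	while (*p >= '0' && *p <= '7')
		v = (v << 3) | (unsigned)(*p++ - '0');
	if (*p != ' ')
		return NULL;
	*out = v;
	return p + 1;
}

/* Copy a hex object ID followed by one space (length is not validated). */
static const char *parse_hex_field(const char *p, char *out)
{
	size_t n = 0;

	while (isxdigit((unsigned char)p[n]))
		n++;
	if (n == 0 || n > SKETCH_HEXSZ_MAX || p[n] != ' ')
		return NULL;
	memcpy(out, p, n);
	out[n] = '\0';
	return p + n + 1;
}

static int is_tree_mode(unsigned mode)
{
	return (mode & 0170000) == 0040000;
}

/* Parse ":<mode_a> <mode_b> <oid_a> <oid_b> <status>[<score>]"; 0 on success. */
int parse_raw_meta(const char *meta, struct raw_meta *m)
{
	const char *p = meta;

	assert(meta && m);
	if (*p++ != ':')
		return -1;
	if (!(p = parse_octal_field(p, &m->mode_a)) ||
	    !(p = parse_octal_field(p, &m->mode_b)))
		return -1;
	if (is_tree_mode(m->mode_a) || is_tree_mode(m->mode_b))
		return -1;	/* like diff-pairs, reject tree objects */
	if (!(p = parse_hex_field(p, m->oid_a)) ||
	    !(p = parse_hex_field(p, m->oid_b)))
		return -1;

	m->status = *p;
	if (m->status == '\0')
		return -1;
	p++;
	m->score = 0;
	if (m->status == 'R' || m->status == 'C') {
		char *end;
		unsigned long pct;

		if (!isdigit((unsigned char)*p))
			return -1;
		pct = strtoul(p, &end, 10);
		if (*end != '\0' || pct > 100)
			return -1;
		/* Same scaling as "score * MAX_SCORE / 100" in diff-pairs. */
		m->score = pct * SKETCH_MAX_SCORE / 100;
	}
	return 0;
}
```

On this scale, `R087` becomes 52200 and `R100` maps to the maximum of 60000.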

command-list.txt

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ git-diagnose ancillaryinterrogators
 git-diff mainporcelain info
 git-diff-files plumbinginterrogators
 git-diff-index plumbinginterrogators
+git-diff-pairs plumbinginterrogators
 git-diff-tree plumbinginterrogators
 git-difftool ancillaryinterrogators complete
 git-fast-export ancillarymanipulators

git.c

Lines changed: 1 addition & 0 deletions
@@ -541,6 +541,7 @@ static struct cmd_struct commands[] = {
 	{ "diff", cmd_diff, NO_PARSEOPT },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
+	{ "diff-pairs", cmd_diff_pairs, RUN_SETUP | NO_PARSEOPT },
 	{ "diff-tree", cmd_diff_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "difftool", cmd_difftool, RUN_SETUP_GENTLY },
 	{ "fast-export", cmd_fast_export, RUN_SETUP },
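The `git.c` hunk adds `diff-pairs` to the `commands[]` table. That entry is what maps the subcommand name to `cmd_diff_pairs()`, with `RUN_SETUP` asking Git to set up the repository first. Git resolves a name by scanning this table, roughly as `get_builtin()` does. The sketch below is a simplified, standalone model of that lookup. `struct sketch_cmd`, the `SKETCH_*` flags and the dummy builtins are hypothetical. The real `struct cmd_struct` callback also takes `prefix` and `repo`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for RUN_SETUP / NO_PARSEOPT; values are illustrative. */
#define SKETCH_RUN_SETUP   (1u << 0)
#define SKETCH_NO_PARSEOPT (1u << 1)

struct sketch_cmd {
	const char *name;
	int (*fn)(int argc, const char **argv);
	unsigned options;
};

/* Dummy builtins that just report which one ran. */
static int sketch_diff_pairs(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 2;
}

static int sketch_diff_tree(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 3;
}

static const struct sketch_cmd sketch_commands[] = {
	{ "diff-pairs", sketch_diff_pairs, SKETCH_RUN_SETUP | SKETCH_NO_PARSEOPT },
	{ "diff-tree", sketch_diff_tree, SKETCH_RUN_SETUP | SKETCH_NO_PARSEOPT },
};

/* Linear lookup by subcommand name; NULL if the name is not registered. */
const struct sketch_cmd *find_sketch_cmd(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(sketch_commands) / sizeof(sketch_commands[0]); i++)
		if (!strcmp(sketch_commands[i].name, name))
			return &sketch_commands[i];
	return NULL;
}
```

A table entry alone is not enough: as the other hunks show, the builtin also has to be declared in `builtin.h` and added to the Makefile and meson build lists.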

meson.build

Lines changed: 1 addition & 0 deletions
@@ -540,6 +540,7 @@ builtin_sources = [
   'builtin/diagnose.c',
   'builtin/diff-files.c',
   'builtin/diff-index.c',
+  'builtin/diff-pairs.c',
   'builtin/diff-tree.c',
   'builtin/diff.c',
   'builtin/difftool.c',

t/meson.build

Lines changed: 1 addition & 0 deletions
@@ -500,6 +500,7 @@ integration_tests = [
   't4067-diff-partial-clone.sh',
   't4068-diff-symmetric-merge-base.sh',
   't4069-remerge-diff.sh',
+  't4070-diff-pairs.sh',
   't4100-apply-stat.sh',
   't4101-apply-nonl.sh',
   't4102-apply-rename.sh',
