Commit f79e188

pks-t and To1ne authored and committed
cat-file: add option '-Z' that delimits input and output with NUL
In db9d67f (builtin/cat-file.c: support NUL-delimited input with `-z`, 2022-07-22), we have introduced a new mode to read the input via NUL-delimited records instead of newline-delimited records. This allows the user to query for revisions that have newlines in their path component. While unusual, such queries are perfectly valid and thus it is clear that we should be able to support them properly.

Unfortunately, the commit only changed the input to be NUL-delimited, but didn't change the output at the same time. While this is fine for queries that are processed successfully, it is less so for queries that aren't. In the case of missing commits for example the result can become entirely unparsable:

```
$ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
    git cat-file --batch -z
7ce4f05 blob 10
1234567890

commit missing
```

This is of course a crafted query that is intentionally gaming the deficiency, but more benign queries that contain newlines would have similar problems.

Ideally, we should have also changed the output to be NUL-delimited when `-z` is specified to avoid this problem. As the input is NUL-delimited, it is clear that the output in this case cannot ever contain NUL characters by itself. Furthermore, Git does not allow NUL characters in revisions anyway, further stressing the point that using NUL-delimited output is safe. The only exception is of course the object data itself, but as git-cat-file(1) prints the size of the object data clients should read until that specified size has been consumed.

But even though `-z` has only been introduced a few releases ago in Git v2.38.0, changing the output format retroactively to also NUL-delimit output would be a backwards incompatible change. And while one could make the argument that the output is inherently broken already, we need to assume that there are existing users out there that use it just fine given that revisions containing newlines are quite exotic.

Instead, introduce a new option `-Z` that switches to NUL-delimited input and output. While this new option could arguably only switch the output format to be NUL-delimited, the consequence would be that users have to always specify both `-z` and `-Z` when the input may contain newlines. On the other hand, if the user knows that there never will be newlines in the input, they don't have to use either of those options. There is thus no usecase that would warrant treating input and output format separately, which is why we instead opt to "do the right thing" and have `-Z` mean to NUL-terminate both formats.

The old `-z` option is marked as deprecated with a hint that its output may become unparsable. It is thus hidden both from the synopsis as well as the command's help output.

Co-authored-by: Toon Claes <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
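
The message above says that clients should read the header up to its delimiter and then consume exactly `<size>` bytes of object data. A minimal sketch of such a client-side parser follows. It assumes `--batch -Z` with the default format and without `--follow-symlinks`; the `batch_record` type and the `parse_batch_record` function are hypothetical names for illustration and are not part of Git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for illustration; not part of Git. */
struct batch_record {
	const char *header;   /* NUL-terminated header, points into the buffer */
	const char *contents; /* object data, or NULL for "missing"/"ambiguous" */
	size_t size;          /* number of bytes in contents */
};

static int has_suffix(const char *s, size_t n, const char *suffix)
{
	size_t m = strlen(suffix);
	return n >= m && memcmp(s + n - m, suffix, m) == 0;
}

/*
 * Parse one record of assumed `--batch -Z` output starting at buf[*pos].
 * Returns 0 and advances *pos on success, -1 if the record is
 * truncated or malformed.
 */
int parse_batch_record(const char *buf, size_t len, size_t *pos,
		       struct batch_record *out)
{
	const char *start, *end, *sp;
	char *num_end;
	unsigned long long size;
	size_t hdr_len, next;

	assert(buf && pos && out && *pos <= len);
	start = buf + *pos;
	end = memchr(start, '\0', len - *pos);
	if (!end)
		return -1; /* header is not terminated yet */
	hdr_len = (size_t)(end - start);
	next = *pos + hdr_len + 1;

	out->header = start;
	out->contents = NULL;
	out->size = 0;

	/* Error records carry no object data. */
	if (has_suffix(start, hdr_len, " missing") ||
	    has_suffix(start, hdr_len, " ambiguous")) {
		*pos = next;
		return 0;
	}

	/* Default format: "<oid> <type> <size>"; the size is the last field. */
	sp = strrchr(start, ' ');
	if (!sp || sp[1] < '0' || sp[1] > '9')
		return -1;
	size = strtoull(sp + 1, &num_end, 10);
	if (num_end != end)
		return -1;

	/* The data may contain NUL bytes, so trust the size, not a scan. */
	if (size >= len - next)
		return -1; /* data or trailing delimiter missing */
	if (buf[next + size] != '\0')
		return -1;

	out->contents = buf + next;
	out->size = (size_t)size;
	*pos = next + (size_t)size + 1;
	return 0;
}
```

Because the header ends at the first NUL byte, a queried name that contains a newline stays inside a single record, which is the ambiguity the commit sets out to remove.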
1 parent 3217f52 commit f79e188

3 files changed: +139 -56 lines changed

Documentation/git-cat-file.txt

Lines changed: 13 additions & 2 deletions
@@ -14,7 +14,7 @@ SYNOPSIS
 'git cat-file' (-t | -s) [--allow-unknown-type] <object>
 'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
 	     [--buffer] [--follow-symlinks] [--unordered]
-	     [--textconv | --filters] [-z]
+	     [--textconv | --filters] [-Z]
 'git cat-file' (--textconv | --filters)
 	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]

@@ -243,10 +243,16 @@ respectively print:
 	/etc/passwd
 --

+-Z::
+	Only meaningful with `--batch`, `--batch-check`, or
+	`--batch-command`; input and output is NUL-delimited instead of
+	newline-delimited.
+
 -z::
 	Only meaningful with `--batch`, `--batch-check`, or
 	`--batch-command`; input is NUL-delimited instead of
-	newline-delimited.
+	newline-delimited. This option is deprecated in favor of
+	`-Z` as the output can otherwise be ambiguous.


 OUTPUT
@@ -384,6 +390,11 @@ notdir SP <size> LF
 is printed when, during symlink resolution, a file is used as a
 directory name.

+Alternatively, when `-Z` is passed, the line feeds in any of the above examples
+are replaced with NUL terminators. This ensures that output will be parsable if
+the output itself would contain a linefeed and is thus recommended for
+scripting purposes.
+
 CAVEATS
 -------
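
To illustrate why the documentation recommends `-Z` for scripting, the sketch below splits a buffer of assumed `--batch-check -Z` output into records on NUL bytes. The helper `next_record` is a hypothetical name, not a Git function. A record that contains a line feed, such as a "missing" line for a path with an embedded newline, comes back as one unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: return the next NUL-terminated
 * record in buf, or NULL when no complete record is left. On success,
 * *rec_len is the record length without the delimiter and *pos is
 * advanced past the delimiter.
 */
const char *next_record(const char *buf, size_t len, size_t *pos,
			size_t *rec_len)
{
	const char *start, *end;

	assert(buf && pos && rec_len);
	if (*pos >= len)
		return NULL;
	start = buf + *pos;
	end = memchr(start, '\0', len - *pos);
	if (!end)
		return NULL; /* incomplete record, wait for more input */
	*rec_len = (size_t)(end - start);
	*pos += *rec_len + 1;
	return start;
}
```

A newline-based splitter would see two lines for the first record in the test input below, while the NUL-based one sees a single record.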

builtin/cat-file.c

Lines changed: 36 additions & 21 deletions
@@ -43,6 +43,7 @@ struct batch_options {
 	int unordered;
 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	char input_delim;
+	char output_delim;
 	const char *format;
 };

@@ -437,11 +438,12 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	}
 }

-static void print_default_format(struct strbuf *scratch, struct expand_data *data)
+static void print_default_format(struct strbuf *scratch, struct expand_data *data,
+				 struct batch_options *opt)
 {
-	strbuf_addf(scratch, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
+	strbuf_addf(scratch, "%s %s %"PRIuMAX"%c", oid_to_hex(&data->oid),
 		    type_name(data->type),
-		    (uintmax_t)data->size);
+		    (uintmax_t)data->size, opt->output_delim);
 }

 /*
@@ -470,8 +472,8 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
-			printf("%s missing\n",
-			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			printf("%s missing%c",
+			       obj_name ? obj_name : oid_to_hex(&data->oid), opt->output_delim);
 			fflush(stdout);
 			return;
 		}
@@ -492,17 +494,17 @@ static void batch_object_write(const char *obj_name,
 	strbuf_reset(scratch);

 	if (!opt->format) {
-		print_default_format(scratch, data);
+		print_default_format(scratch, data, opt);
 	} else {
 		strbuf_expand(scratch, opt->format, expand_format, data);
-		strbuf_addch(scratch, '\n');
+		strbuf_addch(scratch, opt->output_delim);
 	}

 	batch_write(opt, scratch->buf, scratch->len);

 	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
+		batch_write(opt, &opt->output_delim, 1);
 	}
 }

@@ -520,22 +522,25 @@ static void batch_one_object(const char *obj_name,
 	if (result != FOUND) {
 		switch (result) {
 		case MISSING_OBJECT:
-			printf("%s missing\n", obj_name);
+			printf("%s missing%c", obj_name, opt->output_delim);
 			break;
 		case SHORT_NAME_AMBIGUOUS:
-			printf("%s ambiguous\n", obj_name);
+			printf("%s ambiguous%c", obj_name, opt->output_delim);
 			break;
 		case DANGLING_SYMLINK:
-			printf("dangling %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("dangling %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		case SYMLINK_LOOP:
-			printf("loop %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("loop %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		case NOT_DIR:
-			printf("notdir %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("notdir %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		default:
 			BUG("unknown get_sha1_with_context result %d\n",
@@ -547,9 +552,9 @@ static void batch_one_object(const char *obj_name,
 	}

 	if (ctx.mode == 0) {
-		printf("symlink %"PRIuMAX"\n%s\n",
+		printf("symlink %"PRIuMAX"%c%s%c",
 		       (uintmax_t)ctx.symlink_path.len,
-		       ctx.symlink_path.buf);
+		       opt->output_delim, ctx.symlink_path.buf, opt->output_delim);
 		fflush(stdout);
 		return;
 	}
@@ -913,14 +918,15 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	struct batch_options batch = {0};
 	int unknown_type = 0;
 	int input_nul_terminated = 0;
+	int nul_terminated = 0;

 	const char * const usage[] = {
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
 		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   " [--buffer] [--follow-symlinks] [--unordered]\n"
-		   " [--textconv | --filters] [-z]"),
+		   " [--textconv | --filters] [-Z]"),
 		N_("git cat-file (--textconv | --filters)\n"
 		   " [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
 		NULL
@@ -949,7 +955,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
-		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
+		OPT_BOOL_F('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated"),
+			PARSE_OPT_HIDDEN),
+		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),
 		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
 			N_("read commands from stdin"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
@@ -1011,8 +1019,15 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	else if (input_nul_terminated)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "-z");
+	else if (nul_terminated)
+		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
+			       "-Z");

-	batch.input_delim = input_nul_terminated ? '\0' : '\n';
+	batch.input_delim = batch.output_delim = '\n';
+	if (input_nul_terminated)
+		batch.input_delim = '\0';
+	if (nul_terminated)
+		batch.input_delim = batch.output_delim = '\0';

 	/* Batch defaults */
 	if (batch.buffer_output < 0)
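
The delimiter selection in the last hunk, and the `%c` formatting used throughout the patch, can be tried in isolation. The following is a standalone sketch under stated assumptions: the `delims` struct and the `resolve_delims` and `format_missing` functions are hypothetical stand-ins that mirror the logic in the diff and are not the code in `builtin/cat-file.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two delimiter fields of batch_options. */
struct delims {
	char input;
	char output;
};

/* Mirrors the flag handling in the hunk: -z affects input, -Z both. */
struct delims resolve_delims(int input_nul_terminated, int nul_terminated)
{
	struct delims d = { '\n', '\n' };

	if (input_nul_terminated)
		d.input = '\0';
	if (nul_terminated)
		d.input = d.output = '\0';
	return d;
}

/*
 * Format a "<name> missing<delim>" record into dst. Returns the number
 * of bytes in the record including the delimiter, or -1 if it does not
 * fit. With a NUL delimiter, strlen(dst) stops one byte short of the
 * record, so callers have to use the returned length.
 */
int format_missing(char *dst, size_t cap, const char *name, char delim)
{
	int n;

	assert(dst && name && cap > 0);
	n = snprintf(dst, cap, "%s missing%c", name, delim);
	if (n < 0 || (size_t)n >= cap)
		return -1;
	return n;
}
```

The explicit length is why the patch can pass `&opt->output_delim` together with a length of 1 to `batch_write()`: a NUL delimiter cannot be written through an interface that relies on string termination.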
