Commit 05a421c

config-batch: add NUL-terminated I/O format
When using automated tools, it is critical to allow for input/output formats that include special characters such as spaces and newlines. While the existing protocol for 'git config-batch' is human-readable and tolerates spaces in certain positions, it cannot represent spaces in the config key or newlines in the config values.

Add the '-z' option to signal the use of NUL-terminated strings.

To understand where commands end regardless of potential future formats, use two NUL bytes in a row to terminate a command. To allow for empty string values, each token is provided in a <length>:<value> format, making "0:" the empty string value.

Update the existing 'help' and 'get' commands to match this format. Create helper methods that make it easy to parse and print in either format.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
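As a rough illustration of the format the message describes, a caller could build a `-z` command along the following lines. This is only a sketch: `encode_zcommand` is a hypothetical client-side helper that is not part of this commit, and the 99999-byte limit is taken from the documentation added in this change.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical client-side helper (not part of this commit): encode
 * 'n' tokens as one -z command into 'out'. Each token becomes
 * "<len>:<token>" followed by a NUL byte, and one extra NUL byte ends
 * the command, so the command finishes with two NUL bytes in a row.
 *
 * Returns the number of bytes written, or 0 if a token is longer than
 * 99999 bytes or the buffer is too small.
 */
static size_t encode_zcommand(char *out, size_t cap,
			      const char *const *tokens, size_t n)
{
	size_t pos = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tokens[i]);
		int prefix;

		if (len > 99999)
			return 0;

		/* Write the "<len>:" prefix. */
		prefix = snprintf(out + pos, cap - pos, "%zu:", len);
		if (prefix < 0 || (size_t)prefix >= cap - pos)
			return 0;
		pos += (size_t)prefix;

		/* Need room for the token, its NUL, and the final NUL. */
		if (len + 2 > cap - pos)
			return 0;
		memcpy(out + pos, tokens[i], len);
		pos += len;
		out[pos++] = '\0';
	}

	if (pos >= cap)
		return 0;
	out[pos++] = '\0';
	return pos;
}
```

Calling it with the tokens `get`, `1`, `local`, and `key.with space` produces the bytes `3:get NUL 1:1 NUL 5:local NUL 14:key.with space NUL NUL`. Because an empty token is written as `0:`, two NUL bytes in a row can only appear at the end of a command.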
1 parent cc9a034 commit 05a421c

3 files changed: +293 -21 lines changed

Documentation/git-config-batch.adoc

Lines changed: 54 additions & 3 deletions
@@ -21,6 +21,15 @@ multiple configuration values, the `git config-batch` command allows a
2121
single process to handle multiple requests using a machine-parseable
2222
interface across `stdin` and `stdout`.
2323

24+
OPTIONS
25+
-------
26+
27+
`-z`::
28+
If specified, then use the NUL-terminated input and output
29+
format instead of the space and newline format. This format is
30+
useful when the strings involved may include spaces or newlines.
31+
See PROTOCOL for more details.
32+
2433
PROTOCOL
2534
--------
2635
By default, the protocol uses line feeds (`LF`) to signal the end of a
@@ -41,13 +50,13 @@ These are the commands that are currently understood:
4150
`help` version 1::
4251
The `help` command lists the currently-available commands in
4352
this version of Git. The output is multi-line, but the first
44-
line provides the count of possible commands via `help count <N>`.
45-
The next `<N>` lines are of the form `help <command> <version>`
53+
line provides the count of possible commands via `help 1 count <N>`.
54+
The next `<N>` lines are of the form `help 1 <command> <version>`
4655
to state that this Git version supports that `<command>` at
4756
version `<version>`. Note that the same command may have multiple
4857
available versions.
4958
+
50-
Here is the currentl output of the help text at the latest version:
59+
Here is the current output of the help text at the latest version:
5160
+
5261
------------
5362
help 1 count 2
@@ -102,6 +111,48 @@ get 1 missing <key> [<value-pattern>|<value>]
102111
where `<value-pattern>` or `<value>` is only supplied if provided in
103112
the command.
104113

114+
NUL-Terminated Format
115+
~~~~~~~~~~~~~~~~~~~~~
116+
117+
When `-z` is given, the protocol changes in some structural ways.
118+
119+
First, each command is terminated with two NUL bytes, providing a clear
120+
boundary between commands regardless of future possibilities of new
121+
command formats.
122+
123+
Second, any time that a space _would_ be used to partition tokens in a
124+
command, a NUL byte is used instead. Further, each token is prefixed
125+
with `<N>:` where `<N>` is a decimal representation of the length of
126+
the string between the `:` and the next NUL byte. Any disagreement in
127+
these lengths is treated as a parsing error. This use of a length does
128+
imply that "`0:`" is the representation of an empty string, if relevant.
129+
130+
The decimal representation must have at most five numerals, thus the
131+
maximum length of a string token is 99999 characters.
132+
133+
For example, the `get` command, version 1, could have any of the
134+
following forms:
135+
136+
------------
137+
3:get NUL 1:1 NUL 5:local NUL 14:key.with space NUL NUL
138+
3:get NUL 1:1 NUL 7:inherit NUL 8:test.key NUL 9:arg:regex NUL 6:.*\ .* NUL NUL
139+
3:get NUL 1:1 NUL 6:global NUL 8:test.key NUL 15:arg:fixed-value NUL 3:a b NUL NUL
140+
------------
141+
142+
The output is modified similarly. The following examples show the output
143+
for input with a parse error, a valid `help` command, a `get` command
144+
that had a match, and a `get` command that did not match.
145+
146+
------------
147+
15:unknown_command NUL NUL
148+
4:help NUL 1:1 NUL 5:count NUL 1:2 NUL NUL
149+
4:help NUL 1:1 NUL 4:help NUL 1:1 NUL NUL
150+
4:help NUL 1:1 NUL 3:get NUL 1:1 NUL NUL
151+
3:get NUL 1:1 NUL 5:found NUL 8:test.key NUL 5:value NUL NUL
152+
3:get NUL 1:1 NUL 7:missing NUL 8:test.key NUL NUL
153+
------------
154+
155+
105156
SEE ALSO
106157
--------
107158
linkgit:git-config[1]
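The response examples in the NUL-Terminated Format section above could be decoded by a client roughly as follows. This is a sketch under stated assumptions: `next_ztoken` is a hypothetical reader written for illustration, it is not the commit's `parse_ztoken()`, and it is slightly stricter in that it requires at least one length digit.

```c
#include <stddef.h>

/*
 * Hypothetical reader for one "<N>:<value>NUL" token (illustration
 * only; the commit's own parser is parse_ztoken()).
 *
 * Returns 1 and sets tok/tok_len when a token was read, 0 when the
 * cursor sits on the extra NUL that ends a command, and -1 on
 * malformed input. On 1 or 0 the cursor and remaining count advance
 * past what was consumed.
 */
static int next_ztoken(const char **cur, size_t *remaining,
		       const char **tok, size_t *tok_len)
{
	const char *p = *cur;
	size_t left = *remaining, digits = 0, len = 0;

	if (!left)
		return -1;
	if (!*p) {
		/* Second NUL in a row: end of this command. */
		*cur = p + 1;
		*remaining = left - 1;
		return 0;
	}

	/* Decimal length, one to five digits, then ':'. */
	while (digits < left && p[digits] >= '0' && p[digits] <= '9') {
		len = len * 10 + (size_t)(p[digits] - '0');
		if (++digits > 5)
			return -1;
	}
	if (!digits || digits >= left || p[digits] != ':')
		return -1;
	p += digits + 1;
	left -= digits + 1;

	/* The value must be exactly 'len' non-NUL bytes, then a NUL. */
	if (len >= left)
		return -1;
	for (size_t i = 0; i < len; i++)
		if (!p[i])
			return -1;
	if (p[len])
		return -1;

	*tok = p;
	*tok_len = len;
	*cur = p + len + 1;
	*remaining = left - (len + 1);
	return 1;
}
```

A caller would invoke it in a loop until it returns 0, treating -1 as a protocol error. A declared length that disagrees with the actual distance to the next NUL byte is rejected, matching the documented rule that such a disagreement is a parsing error.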

builtin/config-batch.c

Lines changed: 170 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -11,24 +11,40 @@ static const char *const builtin_config_batch_usage[] = {
1111
NULL
1212
};
1313

14+
static int zformat = 0;
15+
1416
#define UNKNOWN_COMMAND "unknown_command"
1517
#define HELP_COMMAND "help"
1618
#define GET_COMMAND "get"
1719
#define COMMAND_PARSE_ERROR "command_parse_error"
1820

21+
static void print_word(const char *word, int start)
22+
{
23+
if (zformat) {
24+
printf("%"PRIu32":%s", (uint32_t)strlen(word), word);
25+
fputc(0, stdout);
26+
} else if (start)
27+
printf("%s", word);
28+
else
29+
printf(" %s", word);
30+
}
31+
1932
static int emit_response(const char *response, ...)
2033
{
2134
va_list params;
2235
const char *token;
2336

24-
printf("%s", response);
37+
print_word(response, 1);
2538

2639
va_start(params, response);
2740
while ((token = va_arg(params, const char *)))
28-
printf(" %s", token);
41+
print_word(token, 0);
2942
va_end(params);
3043

31-
printf("\n");
44+
if (zformat)
45+
fputc(0, stdout);
46+
else
47+
printf("\n");
3248
fflush(stdout);
3349
return 0;
3450
}
@@ -59,6 +75,52 @@ static int unknown_command(struct repository *repo UNUSED,
5975
return emit_response(UNKNOWN_COMMAND, NULL);
6076
}
6177

78+
/*
79+
* Parse the next token using the NUL-byte format.
80+
*/
81+
static size_t parse_ztoken(char **data, size_t *data_len,
82+
char **token, int *err)
83+
{
84+
size_t i = 0, token_len;
85+
86+
while (i < *data_len && (*data)[i] != ':') {
87+
if ((*data)[i] < '0' || (*data)[i] > '9') {
88+
goto parse_error;
89+
}
90+
i++;
91+
}
92+
93+
if (i >= *data_len || (*data)[i] != ':' || i > 5)
94+
goto parse_error;
95+
96+
(*data)[i] = 0;
97+
token_len = atoi(*data);
98+
99+
if (token_len + i + 1 >= *data_len)
100+
goto parse_error;
101+
102+
*token = *data + i + 1;
103+
*data_len = *data_len - (i + 1);
104+
105+
/* check for early NULs. */
106+
for (i = 0; i < token_len; i++) {
107+
if (!(*token)[i])
108+
goto parse_error;
109+
}
110+
/* check for matching NUL. */
111+
if ((*token)[token_len])
112+
goto parse_error;
113+
114+
*data = *token + token_len + 1;
115+
*data_len = *data_len - (token_len + 1);
116+
return token_len;
117+
118+
parse_error:
119+
*err = 1;
120+
*token = NULL;
121+
return 0;
122+
}
123+
62124
static size_t parse_whitespace_token(char **data, size_t *data_len,
63125
char **token, int *err UNUSED)
64126
{
@@ -93,15 +155,23 @@ static size_t parse_whitespace_token(char **data, size_t *data_len,
93155
* The returned value is the length of the token that was
94156
* discovered.
95157
*
96-
* 'err' is ignored for now, but will be filled in in a future
97-
* change.
158+
* The 'token' pointer is used to set the start of the token.
159+
* In the whitespace format, this is always the input value of
160+
* 'data' but in the NUL-terminated format this follows an "<N>:"
161+
* prefix.
162+
*
163+
* In the case of the NUL-terminated format, a bad parse of the
164+
* decimal length or a mismatch of the decimal length and the
165+
* length of the following NUL-terminated string will result in
166+
* the value pointed at by 'err' being set to 1.
98167
*/
99168
static size_t parse_token(char **data, size_t *data_len,
100169
char **token, int *err)
101170
{
102171
if (!*data_len)
103172
return 0;
104-
173+
if (zformat)
174+
return parse_ztoken(data, data_len, token, err);
105175
return parse_whitespace_token(data, data_len, token, err);
106176
}
107177

@@ -255,7 +325,13 @@ static int get_command_1(struct repository *repo,
255325
goto parse_error; /* unknown arg. */
256326

257327
/* Use the remaining data as the value string. */
258-
gc_data.value = data;
328+
if (!zformat)
329+
gc_data.value = data;
330+
else {
331+
parse_token(&data, &data_len, &gc_data.value, &err);
332+
if (err)
333+
goto parse_error;
334+
}
259335

260336
if (gc_data.mode == MATCH_REGEX) {
261337
CALLOC_ARRAY(gc_data.value_pattern, 1);
@@ -348,17 +424,74 @@ static int help_command_1(struct repository *repo UNUSED,
348424
return 0;
349425
}
350426

351-
/**
352-
* Process a single line from stdin and process the command.
353-
*
354-
* Returns 0 on successful processing of command, including the
355-
* unknown_command output.
356-
*
357-
* Returns 1 on natural exit due to exit signal of empty line.
358-
*
359-
* Returns negative value on other catastrophic error.
360-
*/
361-
static int process_command(struct repository *repo)
427+
static int process_command_nul(struct repository *repo)
428+
{
429+
static struct strbuf line = STRBUF_INIT;
430+
char *data, *command, *versionstr;
431+
size_t data_len, token_len;
432+
int res = 0, err = 0, version = 0, getc;
433+
char c;
434+
435+
/* If we start with EOF it's not an error. */
436+
getc = fgetc(stdin);
437+
if (getc == EOF)
438+
return 1;
439+
440+
do {
441+
c = (char)getc;
442+
strbuf_addch(&line, c);
443+
444+
if (!c && line.len > 1 && !line.buf[line.len - 2])
445+
break;
446+
447+
getc = fgetc(stdin);
448+
449+
/* It's an error if we reach EOF while parsing a command. */
450+
if (getc == EOF)
451+
goto parse_error;
452+
} while (1);
453+
454+
data = line.buf;
455+
data_len = line.len - 1;
456+
457+
token_len = parse_ztoken(&data, &data_len, &command, &err);
458+
if (!token_len || err)
459+
goto parse_error;
460+
461+
token_len = parse_ztoken(&data, &data_len, &versionstr, &err);
462+
if (!token_len || err)
463+
goto parse_error;
464+
465+
if (!git_parse_int(versionstr, &version)) {
466+
res = error(_("unable to parse '%s' to integer"),
467+
versionstr);
468+
goto parse_error;
469+
}
470+
471+
for (size_t i = 0; i < COMMAND_COUNT; i++) {
472+
/*
473+
* Run the ith command if we have hit the unknown
474+
* command or if the name and version match.
475+
*/
476+
if (!commands[i].name[0] ||
477+
(!strcmp(command, commands[i].name) &&
478+
commands[i].version == version)) {
479+
res = commands[i].fn(repo, data, data_len);
480+
goto cleanup;
481+
}
482+
}
483+
484+
BUG(_("scanned to end of command list, including 'unknown_command'"));
485+
486+
parse_error:
487+
res = unknown_command(repo, NULL, 0);
488+
489+
cleanup:
490+
strbuf_release(&line);
491+
return res;
492+
}
493+
494+
static int process_command_whitespace(struct repository *repo)
362495
{
363496
static struct strbuf line = STRBUF_INIT;
364497
struct string_list tokens = STRING_LIST_INIT_NODUP;
@@ -416,13 +549,32 @@ static int process_command(struct repository *repo)
416549
return res;
417550
}
418551

552+
/**
553+
* Process a single line from stdin and process the command.
554+
*
555+
* Returns 0 on successful processing of command, including the
556+
* unknown_command output.
557+
*
558+
* Returns 1 on natural exit due to exit signal of empty line.
559+
*
560+
* Returns negative value on other catastrophic error.
561+
*/
562+
static int process_command(struct repository *repo)
563+
{
564+
if (zformat)
565+
return process_command_nul(repo);
566+
return process_command_whitespace(repo);
567+
}
568+
419569
int cmd_config_batch(int argc,
420570
const char **argv,
421571
const char *prefix,
422572
struct repository *repo)
423573
{
424574
int res = 0;
425575
struct option options[] = {
576+
OPT_BOOL('z', NULL, &zformat,
577+
N_("stdin and stdout are NUL-terminated")),
426578
OPT_END(),
427579
};
428580
