
Commit dbaa6bd

Merge branch 'ls/filter-process'
The smudge/clean filter API expects an external process to be spawned to filter the contents of each path that has a filter defined. A new type of "process" filter API has been added: the first request to run the filter for a path spawns a single process, and all subsequent filtering needs for multiple paths are served by that process, reducing the process creation overhead.

* ls/filter-process:
  contrib/long-running-filter: add long running filter example
  convert: add filter.<driver>.process option
  convert: prepare filter.<driver>.process option
  convert: make apply_filter() adhere to standard Git error handling
  pkt-line: add functions to read/write flush terminated packet streams
  pkt-line: add packet_write_gently()
  pkt-line: add packet_flush_gently()
  pkt-line: add packet_write_fmt_gently()
  pkt-line: extract set_packet_header()
  pkt-line: rename packet_write() to packet_write_fmt()
  run-command: add clean_on_exit_handler
  run-command: move check_pipe() from write_or_die to run_command
  convert: modernize tests
  convert: quote filter names in error messages
2 parents: 906d690 + 0f71fa2

19 files changed, 1498 additions, 135 deletions

Documentation/gitattributes.txt

Lines changed: 156 additions & 1 deletion
@@ -293,7 +293,15 @@ checkout, when the `smudge` command is specified, the command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file. Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate. If a long running `process` filter is used
+in place of `clean` and/or `smudge` filters, then Git can process
+all blobs with a single filter command invocation for the entire
+life of a single Git command, for example `git add --all`. If a
+long running `process` filter is configured then it always takes
+precedence over a configured single blob filter. See the section
+below for the description of the protocol used to communicate with
+a `process` filter.
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -373,6 +381,153 @@ not exist, or may have different contents. So, smudge and clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+Long Running Filter Process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the filter command (a string value) is defined via
+`filter.<driver>.process` then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using a packet format (pkt-line,
+see technical/protocol-common.txt) based protocol over standard
+input and standard output as follows. All packets, except for the
+"*CONTENT" packets and the "0000" flush packet, are considered
+text and therefore are terminated by a LF.
+
+Git starts the filter when it encounters the first file
+that needs to be cleaned or smudged. After the filter has started,
+Git sends a welcome message ("git-filter-client"), a list of supported
+protocol version numbers, and a flush packet. Git expects to read a welcome
+response message ("git-filter-server"), exactly one protocol version number
+from the previously sent list, and a flush packet. All further
+communication will be based on the selected version. The remaining
+protocol description below documents "version=2". Please note that
+"version=42" in the example below does not exist and is only there
+to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet: git> git-filter-client
+packet: git> version=2
+packet: git> version=42
+packet: git> 0000
+packet: git< git-filter-server
+packet: git< version=2
+packet: git< 0000
+packet: git> capability=clean
+packet: git> capability=smudge
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=clean
+packet: git< capability=smudge
+packet: git< 0000
+------------------------
+Supported filter capabilities in version 2 are "clean" and
+"smudge".
+
+Afterwards Git sends a list of "key=value" pairs terminated with
+a flush packet. The list will contain at least the filter command
+(based on the supported capabilities) and the pathname of the file
+to filter relative to the repository root. Right after the flush packet
+Git sends the content split in zero or more pkt-line packets and a
+flush packet to terminate content. Please note that the filter
+must not send any response before it has received the content and the
+final flush packet.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> 0000
+packet: git> CONTENT
+packet: git> 0000
+------------------------
+
+The filter is expected to respond with a list of "key=value" pairs
+terminated with a flush packet. If the filter does not experience
+problems then the list must contain a "success" status. Right after
+these packets the filter is expected to send the content in zero
+or more pkt-line packets and a flush packet at the end. Finally, a
+second list of "key=value" pairs terminated with a flush packet
+is expected. The filter can change the status in the second list
+or keep the status as is with an empty list. Please note that the
+empty list must be terminated with a flush packet regardless.
+
+------------------------
+packet: git< status=success
+packet: git< 0000
+packet: git< SMUDGED_CONTENT
+packet: git< 0000
+packet: git< 0000 # empty list, keep "status=success" unchanged!
+------------------------
+
+If the result content is empty then the filter is expected to respond
+with a "success" status and a flush packet to signal the empty content.
+------------------------
+packet: git< status=success
+packet: git< 0000
+packet: git< 0000 # empty content!
+packet: git< 0000 # empty list, keep "status=success" unchanged!
+------------------------
+
+In case the filter cannot or does not want to process the content,
+it is expected to respond with an "error" status.
+------------------------
+packet: git< status=error
+packet: git< 0000
+------------------------
+
+If the filter experiences an error during processing, then it can
+send the status "error" after the content was (partially or
+completely) sent.
+------------------------
+packet: git< status=success
+packet: git< 0000
+packet: git< HALF_WRITTEN_ERRONEOUS_CONTENT
+packet: git< 0000
+packet: git< status=error
+packet: git< 0000
+------------------------
+
+In case the filter cannot or does not want to process the content
+as well as any future content for the lifetime of the Git process,
+then it is expected to respond with an "abort" status at any point
+in the protocol.
+------------------------
+packet: git< status=abort
+packet: git< 0000
+------------------------
+
+Git neither stops nor restarts the filter process in case the
+"error"/"abort" status is set. However, Git sets its exit code
+according to the `filter.<driver>.required` flag, mimicking the
+behavior of the `filter.<driver>.clean` / `filter.<driver>.smudge`
+mechanism.
+
+If the filter dies during the communication or does not adhere to
+the protocol then Git will stop the filter process and restart it
+with the next file that needs to be processed. Depending on the
+`filter.<driver>.required` flag Git will interpret that as an error.
+
+After the filter has processed a blob it is expected to wait for
+the next "key=value" list containing a command. Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own. Git will wait until the filter
+process has stopped.
+
+A long running filter demo implementation can be found in
+`contrib/long-running-filter/example.pl` located in the Git
+core repository. If you develop your own long running filter
+process then the `GIT_TRACE_PACKET` environment variable can be
+very helpful for debugging (see linkgit:git[1]).
+
+Please note that you cannot use an existing `filter.<driver>.clean`
+or `filter.<driver>.smudge` command with `filter.<driver>.process`
+because the former two use a different inter-process communication
+protocol than the latter one.
+
+
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
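The success path of the response protocol described in this hunk can be sketched in C as follows. This is an illustrative encoder, not code from the commit: `pkt_put`, `pkt_flush` and `encode_success_response` are hypothetical names, the 65516-byte content limit is taken from the commit's Perl example, and the output goes to a caller-supplied buffer (assumed large enough) instead of standard output so the bytes can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: all names here are hypothetical, not Git APIs. */
#define MAX_PACKET_CONTENT_SIZE 65516 /* 65520-byte pkt-line minus 4-byte header */

/* Append one pkt-line: 4 hex digits of total length, then the payload. */
static size_t pkt_put(char *out, const char *data, size_t len)
{
    char hdr[5];

    assert(len <= MAX_PACKET_CONTENT_SIZE);
    snprintf(hdr, sizeof(hdr), "%04x", (unsigned)(len + 4));
    memcpy(out, hdr, 4);
    memcpy(out + 4, data, len);
    return len + 4;
}

/* Append a flush packet ("0000"). */
static size_t pkt_flush(char *out)
{
    memcpy(out, "0000", 4);
    return 4;
}

/*
 * Encode a complete "success" response for one blob:
 * status list, flush, content packets, flush, empty second list (flush).
 * The caller must size `out` for the content plus packet overhead.
 */
static size_t encode_success_response(char *out, const char *content, size_t len)
{
    static const char status[] = "status=success\n";
    size_t n = 0;

    n += pkt_put(out + n, status, sizeof(status) - 1);
    n += pkt_flush(out + n);                  /* end of status list */
    for (size_t off = 0; off < len; off += MAX_PACKET_CONTENT_SIZE) {
        size_t chunk = len - off;
        if (chunk > MAX_PACKET_CONTENT_SIZE)
            chunk = MAX_PACKET_CONTENT_SIZE;
        n += pkt_put(out + n, content + off, chunk);
    }
    n += pkt_flush(out + n);                  /* end of content */
    n += pkt_flush(out + n);                  /* empty list: status unchanged */
    return n;
}
```

For the content `hello` this produces `0013status=success\n`, a flush, `0009hello`, and two more flushes, matching the transcript with `SMUDGED_CONTENT`; with empty content the loop emits nothing and the result is the "empty content!" case.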

builtin/archive.c

Lines changed: 2 additions & 2 deletions
@@ -47,10 +47,10 @@ static int run_remote_archiver(int argc, const char **argv,
 	if (name_hint) {
 		const char *format = archive_format_from_filename(name_hint);
 		if (format)
-			packet_write(fd[1], "argument --format=%s\n", format);
+			packet_write_fmt(fd[1], "argument --format=%s\n", format);
 	}
 	for (i = 1; i < argc; i++)
-		packet_write(fd[1], "argument %s\n", argv[i]);
+		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
 	packet_flush(fd[1]);
 
 	buf = packet_read_line(fd[0], NULL);
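As a rough sketch of what a printf-style pkt-line writer has to do, the C code below reserves four bytes, formats the payload after them, and then fills in the hex length. It relies on the return value of `vsnprintf()` rather than `strlen()`, so NUL bytes produced by `"%c"` with a `0` argument (as in the remote-ext.c and connect.c calls further down) are counted. `format_packet()` and `set_header()` are hypothetical names and this is not Git's implementation; `set_header()` only plays the role that the commit's `set_packet_header()` plays, and the 65520-byte limit is assumed to match Git's `LARGE_PACKET_MAX`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: assumed to match Git's 65520-byte pkt-line limit. */
#define LARGE_PACKET_MAX 65520

/* Hypothetical helper: write `size` as 4 lowercase hex digits, no NUL. */
static void set_header(char *buf, int size)
{
    static const char hex[] = "0123456789abcdef";

    assert(size >= 0 && size <= 0xffff);
    buf[0] = hex[(size >> 12) & 0xf];
    buf[1] = hex[(size >> 8) & 0xf];
    buf[2] = hex[(size >> 4) & 0xf];
    buf[3] = hex[size & 0xf];
}

/*
 * Hypothetical printf-style pkt-line formatter. `buf` must hold at least
 * LARGE_PACKET_MAX + 1 bytes (the +1 is vsnprintf's terminator).
 * Returns the total packet size, or -1 on error / oversized payload.
 */
static int format_packet(char *buf, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    /* The return value counts every byte, including NULs from "%c", 0. */
    n = vsnprintf(buf + 4, LARGE_PACKET_MAX - 4 + 1, fmt, ap);
    va_end(ap);
    if (n < 0 || n > LARGE_PACKET_MAX - 4)
        return -1;
    set_header(buf, n + 4);
    return n + 4;
}
```

With `"argument --format=%s\n"` and `"tar"` the sketch yields the 26-byte packet `001aargument --format=tar\n`; a remote-ext.c style request `git-upload-pack /project.git\0host=myserver.com\0` becomes a 51-byte packet with header `0033`.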

builtin/receive-pack.c

Lines changed: 2 additions & 2 deletions
@@ -227,7 +227,7 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 static void show_ref(const char *path, const unsigned char *sha1)
 {
 	if (sent_capabilities) {
-		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
+		packet_write_fmt(1, "%s %s\n", sha1_to_hex(sha1), path);
 	} else {
 		struct strbuf cap = STRBUF_INIT;
 
@@ -242,7 +242,7 @@ static void show_ref(const char *path, const unsigned char *sha1)
 		if (advertise_push_options)
 			strbuf_addstr(&cap, " push-options");
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
-		packet_write(1, "%s %s%c%s\n",
+		packet_write_fmt(1, "%s %s%c%s\n",
 			     sha1_to_hex(sha1), path, 0, cap.buf);
 		strbuf_release(&cap);
 		sent_capabilities = 1;
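The first advertised ref carries the capability list after a NUL byte, which is what the `%c` with `0` in the format string above produces. A reader of that line therefore has to split on the embedded NUL rather than treat the payload as a C string. The following C sketch shows one way to do it; `parse_first_ref` and `has_capability` are hypothetical helpers, the sketch assumes a 40-character SHA-1 object name as used in this code, does not validate the hex digits, and assumes the caller has NUL-terminated the buffer at `payload[len]`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical parser for "<40-hex-id> <refname>\0<capabilities>\n".
 * Modifies `payload` in place (separators become NULs) and points
 * *refname and *caps into it. Returns 0 on success, -1 otherwise.
 * Assumes payload[len] == '\0'.
 */
static int parse_first_ref(char *payload, size_t len,
                           const char **refname, const char **caps)
{
    char *nul, *sp;

    assert(payload[len] == '\0');
    nul = memchr(payload, '\0', len);
    if (!nul)
        return -1;                   /* no capability list on this line */
    sp = memchr(payload, ' ', (size_t)(nul - payload));
    if (!sp || sp - payload != 40)
        return -1;                   /* expected "<40-hex-id> <refname>" */
    if (len > 0 && payload[len - 1] == '\n')
        payload[len - 1] = '\0';     /* drop the trailing LF */
    *sp = '\0';
    *refname = sp + 1;               /* ends at the embedded NUL */
    *caps = nul + 1;                 /* ends at the former LF or payload[len] */
    return 0;
}

/* Hypothetical lookup: matches "name" or "name=value" in a space-separated list. */
static int has_capability(const char *caps, const char *name)
{
    size_t nlen = strlen(name);
    const char *p = caps;

    while (*p) {
        size_t wlen = strcspn(p, " ");
        if (wlen >= nlen && strncmp(p, name, nlen) == 0 &&
            (wlen == nlen || p[nlen] == '='))
            return 1;
        p += wlen;
        while (*p == ' ')
            p++;
    }
    return 0;
}
```

Matching whole words (or a `name=` prefix, as for `agent=`) keeps a lookup for `report` from accidentally matching `report-status`.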

builtin/remote-ext.c

Lines changed: 2 additions & 2 deletions
@@ -128,9 +128,9 @@ static void send_git_request(int stdin_fd, const char *serv, const char *repo,
 			     const char *vhost)
 {
 	if (!vhost)
-		packet_write(stdin_fd, "%s %s%c", serv, repo, 0);
+		packet_write_fmt(stdin_fd, "%s %s%c", serv, repo, 0);
 	else
-		packet_write(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
+		packet_write_fmt(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
 			     vhost, 0);
 }
 

builtin/upload-archive.c

Lines changed: 2 additions & 2 deletions
@@ -88,11 +88,11 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 	writer.git_cmd = 1;
 	if (start_command(&writer)) {
 		int err = errno;
-		packet_write(1, "NACK unable to spawn subprocess\n");
+		packet_write_fmt(1, "NACK unable to spawn subprocess\n");
 		die("upload-archive: %s", strerror(err));
 	}
 
-	packet_write(1, "ACK\n");
+	packet_write_fmt(1, "ACK\n");
 	packet_flush(1);
 
 	while (1) {

connect.c

Lines changed: 1 addition & 1 deletion
@@ -750,7 +750,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * Note: Do not add any other headers here! Doing so
 		 * will cause older git-daemon servers to crash.
 		 */
-		packet_write(fd[1],
+		packet_write_fmt(fd[1],
 			     "%s %s%chost=%s%c",
 			     prog, path, 0,
 			     target_host, 0);

contrib/long-running-filter/example.pl

Lines changed: 128 additions & 0 deletions (new file)

#!/usr/bin/perl
#
# Example implementation for the Git filter protocol version 2
# See Documentation/gitattributes.txt, section "Long Running Filter Process"
#
# Please note, this pass-thru filter is a minimal skeleton. No proper
# error handling was implemented.
#

use strict;
use warnings;
use IO::Handle;    # for STDOUT->flush() on older Perls

my $MAX_PACKET_CONTENT_SIZE = 65516;

# All protocol traffic is treated as raw bytes.
binmode(STDIN);
binmode(STDOUT);

# Read one packet. Returns (1, "") for a flush packet and (0, $content)
# otherwise. Exits when Git closes the pipe.
sub packet_bin_read {
    my $buffer;
    my $bytes_read = read STDIN, $buffer, 4;
    if ( $bytes_read == 0 ) {

        # EOF - Git stopped talking to us!
        exit();
    }
    elsif ( $bytes_read != 4 ) {
        die "invalid packet: '$buffer'";
    }
    my $pkt_size = hex($buffer);
    if ( $pkt_size == 0 ) {
        return ( 1, "" );
    }
    elsif ( $pkt_size > 4 ) {
        my $content_size = $pkt_size - 4;
        $bytes_read = read STDIN, $buffer, $content_size;
        if ( $bytes_read != $content_size ) {
            die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
        }
        return ( 0, $buffer );
    }
    else {
        die "invalid packet size: $pkt_size";
    }
}

# Read one text packet and strip its trailing LF. A flush packet is
# returned unchanged as (1, "").
sub packet_txt_read {
    my ( $res, $buf ) = packet_bin_read();
    unless ( $res == 1 || $buf =~ s/\n$// ) {
        die "A non-binary line MUST be terminated by an LF.";
    }
    return ( $res, $buf );
}

sub packet_bin_write {
    my $buf = shift;
    printf STDOUT "%04x", length($buf) + 4;
    print STDOUT $buf;
    STDOUT->flush();
}

sub packet_txt_write {
    packet_bin_write( $_[0] . "\n" );
}

sub packet_flush {
    print STDOUT "0000";
    STDOUT->flush();
}

# Check a (flush flag, content) pair returned by a read function.
# "LIST eq LIST" would compare only the last element of each list,
# so the flag and the content are compared separately.
sub packet_expect {
    my ( $want_res, $want_buf, $res, $buf ) = @_;
    return $res == $want_res && $buf eq $want_buf;
}

packet_expect( 0, "git-filter-client", packet_txt_read() ) || die "bad initialize";
packet_expect( 0, "version=2",         packet_txt_read() ) || die "bad version";
packet_expect( 1, "",                  packet_bin_read() ) || die "bad version end";

packet_txt_write("git-filter-server");
packet_txt_write("version=2");
packet_flush();

packet_expect( 0, "capability=clean",  packet_txt_read() ) || die "bad capability";
packet_expect( 0, "capability=smudge", packet_txt_read() ) || die "bad capability";
packet_expect( 1, "",                  packet_bin_read() ) || die "bad capability end";

packet_txt_write("capability=clean");
packet_txt_write("capability=smudge");
packet_flush();

while (1) {

    # Read the "key=value" list up to its flush packet. It contains at
    # least "command" and "pathname", but may carry additional keys.
    my %request;
    while (1) {
        my ( $done, $line ) = packet_txt_read();
        last if $done;
        my ( $key, $value ) = split /=/, $line, 2;
        die "bad request line '$line'" unless defined $value;
        $request{$key} = $value;
    }
    my $command  = $request{command}  // die "missing command";
    my $pathname = $request{pathname} // die "missing pathname";

    # Read the content packets up to the terminating flush packet.
    my $input = "";
    {
        my $buffer;
        my $done = 0;
        while ( !$done ) {
            ( $done, $buffer ) = packet_bin_read();
            $input .= $buffer;
        }
    }

    my $output;
    if ( $command eq "clean" ) {
        ### Perform clean here ($pathname is available if needed) ###
        $output = $input;
    }
    elsif ( $command eq "smudge" ) {
        ### Perform smudge here ###
        $output = $input;
    }
    else {
        die "bad command '$command'";
    }

    packet_txt_write("status=success");
    packet_flush();
    while ( length($output) > 0 ) {
        my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
        packet_bin_write($packet);
        $output = substr( $output, length($packet) );
    }
    packet_flush();    # flush content!
    packet_flush();    # empty list, keep "status=success" unchanged!
}
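For readers porting the example filter to C, the sketch below mirrors the framing checks in the Perl `packet_bin_read`: four hex digits of total length, `0000` as a flush packet, and sizes 1 to 4 rejected. It parses from an in-memory buffer instead of reading STDIN, and additionally rejects non-hex headers and lengths above 65520 bytes (the 65516-byte content limit plus the 4-byte header). `pkt_parse` and `enum pkt_kind` are illustrative names, not Git APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch only: names are hypothetical. 65516 content bytes + 4-byte header. */
#define PKT_MAX_SIZE 65520

enum pkt_kind { PKT_DATA, PKT_FLUSH, PKT_ERROR };

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Parse the pkt-line at the start of `buf` (`avail` bytes available).
 * PKT_FLUSH: "0000" was read. PKT_DATA: *payload and *payload_len
 * describe the content. In both cases *consumed is the number of bytes
 * the packet occupies. PKT_ERROR: malformed or truncated input.
 */
static enum pkt_kind pkt_parse(const char *buf, size_t avail,
                               const char **payload, size_t *payload_len,
                               size_t *consumed)
{
    int size = 0;

    assert(buf && payload && payload_len && consumed);
    if (avail < 4)
        return PKT_ERROR;               /* truncated header */
    for (int i = 0; i < 4; i++) {
        int v = hexval(buf[i]);
        if (v < 0)
            return PKT_ERROR;           /* header is not four hex digits */
        size = size * 16 + v;
    }
    if (size == 0) {
        *consumed = 4;
        return PKT_FLUSH;
    }
    /* Like packet_bin_read in example.pl, sizes 1..4 are invalid. */
    if (size <= 4 || size > PKT_MAX_SIZE || (size_t)size > avail)
        return PKT_ERROR;
    *payload = buf + 4;
    *payload_len = (size_t)size - 4;
    *consumed = (size_t)size;
    return PKT_DATA;
}
```

Feeding it `000eversion=2\n0000` returns a 10-byte data packet followed by a flush, which is the version part of the handshake the Perl script checks.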
