
Commit 6bc0cb5

peff authored and gitster committed
http-backend: spool ref negotiation requests to buffer
When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies.

In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically:

  1. Apache proxies the request to the CGI, http-backend.

  2. http-backend gzip-inflates the data and sends the result to
     upload-pack.

  3. upload-pack acts on the data and generates output over the pipe
     back to Apache. Apache isn't reading because it's busy writing
     (step 1).

This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache).

We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client.

The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note:

  1. We limit the in-memory buffer to prevent an obvious
     denial-of-service attack. This is a new hard limit on requests,
     but it's unlikely to come into play. The default value is 10MB,
     which covers even the ridiculous 100,000-ref negotiation in the
     included test (that actually caps out just over 5MB). But it's
     configurable on the off chance that you don't mind spending some
     extra memory to make even ridiculous requests work.

  2. We must take care only to buffer when we have to. For pushes, the
     incoming packfile may be of arbitrary size, and we should connect
     the input directly to receive-pack. There's no deadlock problem
     here, though, because we do not produce any output until the whole
     packfile has been read.

     For upload-pack's initial ref advertisement, we similarly do not
     need to buffer. Even though we may generate a lot of output, there
     is no request body at all (i.e., it is a GET, not a POST).

[1] http://article.gmane.org/gmane.comp.version-control.git/269020

Test-adapted-from: Dennis Kaarsemaker <[email protected]>
Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent cc969c8 commit 6bc0cb5

3 files changed: +105 −11 lines

Documentation/git-http-backend.txt

Lines changed: 9 additions & 0 deletions

@@ -255,6 +255,15 @@ The GIT_HTTP_EXPORT_ALL environmental variable may be passed to
 'git-http-backend' to bypass the check for the "git-daemon-export-ok"
 file in each repository before allowing export of that repository.
 
+The `GIT_HTTP_MAX_REQUEST_BUFFER` environment variable (or the
+`http.maxRequestBuffer` config variable) may be set to change the
+largest ref negotiation request that git will handle during a fetch; any
+fetch requiring a larger buffer will not succeed. This value should not
+normally need to be changed, but may be helpful if you are fetching from
+a repository with an extremely large number of refs. The value can be
+specified with a unit (e.g., `100M` for 100 megabytes). The default is
+10 megabytes.
+
 The backend process sets GIT_COMMITTER_NAME to '$REMOTE_USER' and
 GIT_COMMITTER_EMAIL to '$\{REMOTE_USER}@http.$\{REMOTE_ADDR\}',
 ensuring that any reflogs created by 'git-receive-pack' contain some
http-backend.c

Lines changed: 85 additions & 11 deletions

@@ -13,18 +13,20 @@ static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
 static const char last_modified[] = "Last-Modified";
 static int getanyfile = 1;
+static unsigned long max_request_buffer = 10 * 1024 * 1024;
 
 static struct string_list *query_params;
 
 struct rpc_service {
 	const char *name;
 	const char *config_name;
+	unsigned buffer_input : 1;
 	signed enabled : 2;
 };
 
 static struct rpc_service rpc_service[] = {
-	{ "upload-pack", "uploadpack", 1 },
-	{ "receive-pack", "receivepack", -1 },
+	{ "upload-pack", "uploadpack", 1, 1 },
+	{ "receive-pack", "receivepack", 0, -1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -225,6 +227,7 @@ static void http_config(void)
 	struct strbuf var = STRBUF_INIT;
 
 	git_config_get_bool("http.getanyfile", &getanyfile);
+	git_config_get_ulong("http.maxrequestbuffer", &max_request_buffer);
 
 	for (i = 0; i < ARRAY_SIZE(rpc_service); i++) {
 		struct rpc_service *svc = &rpc_service[i];
@@ -266,9 +269,52 @@ static struct rpc_service *select_service(const char *name)
 	return svc;
 }
 
-static void inflate_request(const char *prog_name, int out)
+/*
+ * This is basically strbuf_read(), except that if we
+ * hit max_request_buffer we die (we'd rather reject a
+ * maliciously large request than chew up infinite memory).
+ */
+static ssize_t read_request(int fd, unsigned char **out)
+{
+	size_t len = 0, alloc = 8192;
+	unsigned char *buf = xmalloc(alloc);
+
+	if (max_request_buffer < alloc)
+		max_request_buffer = alloc;
+
+	while (1) {
+		ssize_t cnt;
+
+		cnt = read_in_full(fd, buf + len, alloc - len);
+		if (cnt < 0) {
+			free(buf);
+			return -1;
+		}
+
+		/* partial read from read_in_full means we hit EOF */
+		len += cnt;
+		if (len < alloc) {
+			*out = buf;
+			return len;
+		}
+
+		/* otherwise, grow and try again (if we can) */
+		if (alloc == max_request_buffer)
+			die("request was larger than our maximum size (%lu);"
+			    " try setting GIT_HTTP_MAX_REQUEST_BUFFER",
+			    max_request_buffer);
+
+		alloc = alloc_nr(alloc);
+		if (alloc > max_request_buffer)
+			alloc = max_request_buffer;
+		REALLOC_ARRAY(buf, alloc);
+	}
+}
+
+static void inflate_request(const char *prog_name, int out, int buffer_input)
 {
 	git_zstream stream;
+	unsigned char *full_request = NULL;
 	unsigned char in_buf[8192];
 	unsigned char out_buf[8192];
 	unsigned long cnt = 0;
@@ -277,11 +323,21 @@ static void inflate_request(const char *prog_name, int out)
 	git_inflate_init_gzip_only(&stream);
 
 	while (1) {
-		ssize_t n = xread(0, in_buf, sizeof(in_buf));
+		ssize_t n;
+
+		if (buffer_input) {
+			if (full_request)
+				n = 0; /* nothing left to read */
+			else
+				n = read_request(0, &full_request);
+			stream.next_in = full_request;
+		} else {
+			n = xread(0, in_buf, sizeof(in_buf));
+			stream.next_in = in_buf;
+		}
+
 		if (n <= 0)
 			die("request ended in the middle of the gzip stream");
-
-		stream.next_in = in_buf;
 		stream.avail_in = n;
 
 		while (0 < stream.avail_in) {
@@ -307,9 +363,22 @@ static void inflate_request(const char *prog_name, int out)
 done:
 	git_inflate_end(&stream);
 	close(out);
+	free(full_request);
+}
+
+static void copy_request(const char *prog_name, int out)
+{
+	unsigned char *buf;
+	ssize_t n = read_request(0, &buf);
+	if (n < 0)
+		die_errno("error reading request body");
+	if (write_in_full(out, buf, n) != n)
+		die("%s aborted reading request", prog_name);
+	close(out);
+	free(buf);
 }
 
-static void run_service(const char **argv)
+static void run_service(const char **argv, int buffer_input)
 {
 	const char *encoding = getenv("HTTP_CONTENT_ENCODING");
 	const char *user = getenv("REMOTE_USER");
@@ -334,15 +403,17 @@ static void run_service(const char **argv)
 			 "GIT_COMMITTER_EMAIL=%s@http.%s", user, host);
 
 	cld.argv = argv;
-	if (gzipped_request)
+	if (buffer_input || gzipped_request)
 		cld.in = -1;
 	cld.git_cmd = 1;
 	if (start_command(&cld))
 		exit(1);
 
 	close(1);
 	if (gzipped_request)
-		inflate_request(argv[0], cld.in);
+		inflate_request(argv[0], cld.in, buffer_input);
+	else if (buffer_input)
+		copy_request(argv[0], cld.in);
 	else
 		close(0);
 
@@ -392,7 +463,7 @@ static void get_info_refs(char *arg)
 		packet_flush(1);
 
 		argv[0] = svc->name;
-		run_service(argv);
+		run_service(argv, 0);
 
 	} else {
 		select_getanyfile();
@@ -496,7 +567,7 @@ static void service_rpc(char *service_name)
 	end_headers();
 
 	argv[0] = svc->name;
-	run_service(argv);
+	run_service(argv, svc->buffer_input);
 	strbuf_release(&buf);
 }
 
@@ -623,6 +694,9 @@ int main(int argc, char **argv)
 		not_found("Repository not exported: '%s'", dir);
 
 	http_config();
+	max_request_buffer = git_env_ulong("GIT_HTTP_MAX_REQUEST_BUFFER",
+					  max_request_buffer);
+
 	cmd->imp(cmd_arg);
 	return 0;
 }
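The documentation notes that the limit "can be specified with a unit (e.g., `100M`)". As an illustration only, a size-with-suffix parser in that spirit might look like the sketch below; `parse_size` is a hypothetical name, and git's actual parsing lives in its config machinery rather than in this commit:

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Illustrative sketch: parse a decimal size with an optional
 * k/M/G suffix (powers of 1024), the rough shape of values
 * accepted for http.maxRequestBuffer. Returns 1 on success.
 */
static int parse_size(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v = strtoul(s, &end, 10);

	switch (tolower((unsigned char)*end)) {
	case 'k': v <<= 10; end++; break;	/* kibibytes */
	case 'm': v <<= 20; end++; break;	/* mebibytes */
	case 'g': v <<= 30; end++; break;	/* gibibytes */
	}
	if (*end)
		return 0;	/* trailing junk: reject */
	*out = v;
	return 1;
}
```

With this sketch, `parse_size("100M", &v)` yields 104857600, matching the "100 megabytes" example in the documentation hunk above.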

t/t5551-http-fetch-smart.sh

Lines changed: 11 additions & 0 deletions

@@ -253,5 +253,16 @@ test_expect_success EXPENSIVE 'clone the 50,000 tag repo to check OS command lin
 	)
 '
 
+test_expect_success EXPENSIVE 'http can handle enormous ref negotiation' '
+	git -C too-many-refs fetch -q --tags &&
+	(
+		cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
+		create_tags 50001 100000
+	) &&
+	git -C too-many-refs fetch -q --tags &&
+	git -C too-many-refs for-each-ref refs/tags >tags &&
+	test_line_count = 100000 tags
+'
+
 stop_httpd
 test_done
