Description
I need to process big XML responses as a stream. The uncompressed responses can be multiple hundred megabytes in size, so loading them entirely into memory before handing them to the XML parser is not an option.
I'm using lxml to parse, and I just hand response.raw to its iterparse() function, as described in the requests docs. This works fine for uncompressed responses.
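For reference, a minimal sketch of that streaming pattern, using the stdlib parser and an in-memory stream standing in for response.raw so it's self-contained (with lxml it would be lxml.etree.iterparse and the real raw stream):

```python
import io
import xml.etree.ElementTree as ET

# In the real code this is `response.raw` from a streaming call:
#   response = requests.get(url, stream=True)
# Here an in-memory stream stands in for it.
stream = io.BytesIO(b"<items><item>a</item><item>b</item></items>")

count = 0
# iterparse reads the file-like object incrementally instead of
# loading the whole document into memory first.
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "item":
        count += 1
        elem.clear()  # free the element once processed to bound memory

print(count)  # 2
```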
Unfortunately, the API I'm calling isn't particularly well behaved: it will sometimes return Content-Encoding: gzip even when I explicitly ask for uncompressed data. On top of that, the compression ratio on these extremely repetitive, verbose XML files is really good (10x+), so I'd actually like to make use of compressed responses.
Is this possible with requests? I couldn't find it in the documentation. Digging deeper into urllib3, its HTTPResponse.read() method supports a decode_content parameter; if it isn't set, urllib3 falls back to whatever was passed to the constructor. When requests calls that constructor in requests.adapters.HTTPAdapter.send(), it explicitly sets decode_content to False.
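As far as I can tell, what decode_content=True does for a gzip body is roughly equivalent to wrapping the raw stream so reads decompress on the fly. A sketch with an in-memory stream in place of the network response (gzip only; urllib3 also handles deflate, and this is my reading of the behavior, not its actual implementation):

```python
import gzip
import io

# Simulate a gzip-encoded response body:
body = gzip.compress(b"<items><item>a</item></items>")
raw = io.BytesIO(body)  # stands in for response.raw

# Manual equivalent of decode_content=True: wrap the raw stream so
# each read() decompresses transparently, still without buffering
# the whole response in memory.
decoded = gzip.GzipFile(fileobj=raw)
data = decoded.read()
print(data)  # b"<items><item>a</item></items>"
```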
Is there a reason why requests does that?
Strangely, iter_content() actually sets decode_content=True while reading. Why there but not elsewhere? It all seems a bit arbitrary, and I don't really understand the motivation for doing it one way in one place and another way in the other.
Of course, I can't use iter_content() myself, because lxml needs a file-like object.
I previously wrote my own file-like object to hook in between requests and lxml, but buffering is hard to get right, and I suspect smarter people than me have already written this, so I'd prefer not to roll my own.
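For the record, the wrapper I rolled is roughly this shape — a sketch, not battle-tested, with a fake chunk list standing in for response.iter_content() (the class name is mine):

```python
import io

class IterStream(io.RawIOBase):
    """Minimal read-only file-like wrapper around an iterator of
    byte chunks, e.g. response.iter_content(chunk_size=...)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer from the iterator when empty.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n

# Fake chunks standing in for iter_content(); BufferedReader adds
# the read()/readline() conveniences lxml expects from a file object.
stream = io.BufferedReader(IterStream([b"<it", b"ems/", b">"]))
print(stream.read())  # b"<items/>"
```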
What's your advice on how to handle this? Should requests be changed to default to decode_content=True in urllib3?