Closed
Labels
stdlib: Standard Library Python modules in the Lib/ directory
type-bug: An unexpected behavior, bug, or error
Description
Bug report
Bug description:
I am finding that some files downloaded with urllib are always truncated. I have a demonstration file which is 187527168 bytes of NULs.
If I download it with wget, the complete file is always retrieved:
root@697bf25b6113:~# wget https://electricworry-public.s3.eu-west-1.amazonaws.com/test -O test-wget
--2025-01-24 14:41:27-- https://electricworry-public.s3.eu-west-1.amazonaws.com/test
Resolving electricworry-public.s3.eu-west-1.amazonaws.com (electricworry-public.s3.eu-west-1.amazonaws.com)... 52.218.90.80, 52.218.108.120, 3.5.72.214, ...
Connecting to electricworry-public.s3.eu-west-1.amazonaws.com (electricworry-public.s3.eu-west-1.amazonaws.com)|52.218.90.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 187527168 (179M) [binary/octet-stream]
Saving to: 'test-wget'
test-wget 100%[=========================================================================================================================================>] 178.84M 5.57MB/s in 33s
2025-01-24 14:42:01 (5.38 MB/s) - 'test-wget' saved [187527168/187527168]
root@697bf25b6113:~# ls -l
total 183132
-rw-r--r-- 1 root root 187527168 Jan 24 14:31 test-wget
If I attempt the following python3 code I end up with a slightly truncated file:
import urllib.request
import shutil
request = urllib.request.Request("https://electricworry-public.s3.eu-west-1.amazonaws.com/test")
r = urllib.request.urlopen(request, None, 1000)
f = open("test-python", "wb")
shutil.copyfileobj(r, f)
f.close()
Here's what I end up with:
root@697bf25b6113:~# python3
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> import shutil
>>> request = urllib.request.Request("https://electricworry-public.s3.eu-west-1.amazonaws.com/test")
>>> r = urllib.request.urlopen(request, None, 1000)
>>> f = open("test-python", "wb")
>>> shutil.copyfileobj(r, f)
>>> f.close()
>>>
root@697bf25b6113:~# ls -l
total 363136
-rw-r--r-- 1 root root 184313073 Jan 24 14:43 test-python
-rw-r--r-- 1 root root 187527168 Jan 24 14:31 test-wget
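To put a number on the gap between the two listings above, here is a minimal sketch of the arithmetic. The helper name `shortfall` is hypothetical; the two sizes are the ones reported by `ls -l`.

```python
def shortfall(expected: int, actual: int) -> tuple[int, float]:
    """Return the number of missing bytes and the missing share in percent."""
    missing = expected - actual
    return missing, 100.0 * missing / expected


# Sizes taken from the `ls -l` output: test-wget vs. test-python.
missing, pct = shortfall(187527168, 184313073)
print(f"{missing} bytes missing ({pct:.2f}% of the file)")
```

So the Python download is roughly 3 MB short, all of it missing from the end of the file.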
I've tried this on several computers:
- Physical host Dell XPS 13 running Ubuntu 24.04
- Physical own-build workstation running Linux Mint 22.1 Xia
- Docker container running debian:bookworm
A Wireshark packet capture seems to indicate that the remote side finishes sending and closes the connection (FIN, PSH, ACK), which is expected because urllib sends "Connection: close" in the request headers by default.
Is this a known problem? The problem doesn't happen when I switch from https to http.
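Since `shutil.copyfileobj` finishes without any error in the session above, one way to at least detect the truncation is to count the bytes copied and compare the total with the `Content-Length` header. The following is only a sketch of that check, not a fix for the underlying problem; `copy_and_verify` and `download` are hypothetical helper names, and it assumes the server sends a `Content-Length` header (the wget output suggests it does here).

```python
import urllib.request


def copy_and_verify(src, dst, expected_length, chunk_size=64 * 1024):
    """Copy src to dst in chunks and raise if the byte count is unexpected."""
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read is treated as end of stream
            break
        dst.write(chunk)
        written += len(chunk)
    if expected_length is not None and written != expected_length:
        raise OSError(f"truncated: got {written} of {expected_length} bytes")
    return written


def download(url, path, timeout=1000):
    """Hypothetical wrapper: fetch url to path and verify against Content-Length."""
    with urllib.request.urlopen(url, timeout=timeout) as r, open(path, "wb") as f:
        header = r.headers.get("Content-Length")
        expected = int(header) if header is not None else None
        return copy_and_verify(r, f, expected)
```

Calling `download("https://electricworry-public.s3.eu-west-1.amazonaws.com/test", "test-python")` should then raise `OSError` instead of silently leaving a short file whenever the truncation occurs.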
CPython versions tested on:
3.11, 3.12
Operating systems tested on:
Linux