Replies: 1 comment 4 replies
-
This thread provides the exact details: #520

Based on that discussion, I did some verification with a CSV file of equivalent size (~1 GB); it took approximately 5 minutes. A GZ file of ~750 MB took around 21 minutes.

Why would smart_open treat a gz file differently when the request is to read/write as binary? Is smart_open trying to gunzip and re-stream as gzip to the destination?

Any response to help move forward is greatly appreciated. Thank you!
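A note on what is likely happening here: by default smart_open infers compression from the file extension, even in binary mode, so a `.gz` object is transparently gunzipped on read and re-gzipped on write, which would explain why the gz copy is so much slower than the CSV. Newer versions of smart_open (5.1 and later, if I recall correctly) accept a `compression` argument to turn this off. A minimal sketch of a raw byte-for-byte copy; the bucket and key names are placeholders:

```python
from smart_open import open as sopen

# Placeholder URIs -- substitute real buckets/keys.
src = "s3://source-bucket-us-east/path/data.csv.gz"
dst = "s3://dest-bucket-us-west/path/data.csv.gz"

# compression="disabled" skips the transparent gunzip/gzip step and
# streams the raw bytes, so the .gz object is copied as-is.
with sopen(src, "rb", compression="disabled") as fin, \
     sopen(dst, "wb", compression="disabled") as fout:
    while True:
        chunk = fin.read(64 * 1024 * 1024)  # 64 MiB per read
        if not chunk:
            break
        fout.write(chunk)
```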
-
Prior to smart_open, copying a large gz file (size 850 MB):

S3 copy from a us-east bucket to a us-west bucket: download from us-east to local EC2 disk, then upload the disk file to the us-west bucket. Approximately 3 minutes.

Implementing smart_open with:

```python
chunk_size = 64 * 1024 * 1024  # 64 MiB
```

the stream copy from us-east to us-west takes upwards of 16 minutes. Any way to improve the stream copy performance?

Thank you!
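Two things may be worth checking here. First, make sure smart_open is not transparently gunzipping and re-gzipping the `.gz` object; passing `compression="disabled"` on both ends (see the sketch above) keeps the copy byte-for-byte. Second, if the job is purely S3-to-S3, boto3's managed `copy` performs the transfer server-side via CopyObject/UploadPartCopy, so the bytes never transit the EC2 instance at all. A sketch, with placeholder bucket and key names:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Client in the destination region; S3 pulls from the source region itself.
s3 = boto3.client("s3", region_name="us-west-2")

# Use 64 MiB parts for the multipart server-side copy.
config = TransferConfig(multipart_chunksize=64 * 1024 * 1024)

s3.copy(
    CopySource={"Bucket": "source-bucket-us-east", "Key": "path/data.csv.gz"},
    Bucket="dest-bucket-us-west",
    Key="path/data.csv.gz",
    Config=config,
)
```

Because no object data flows through the client, this is typically much closer to the ~3 minute download-and-upload baseline than a streamed copy.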