Bug Report
S3 performance is slow
Description
This ticket is based on a topic in Discord (“need-help” channel).
I tried to use DVC with MinIO (S3-compatible storage) and noticed that pulling data is very slow.
My env:
- MinIO storage is located on an SSD (Intel Optane, high performance)
- MinIO server <------ 1000 Mbit/sec --------> DVC (S3 client)
- Bucket: 40 GB, 410k files (each file <= 400 KB)
DVC version: 2.0.17 (deb)
---------------------------------
Platform: Python 3.8.8 on Linux-5.4.0-70-generic-x86_64-with-glibc2.4
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sda
Repo: dvc, git
When I ran dvc pull -j 20, the maximum speed was 80 Mbit/sec, but the average was about 40 Mbit/sec. Downloading the bucket took 220 minutes.
What else I tried:
dvc pull -j 80 - no improvement.
awscli - the aws tool reached a maximum download speed of 160 Mbit/sec. I tried different settings, but I couldn't exceed that limit.
s4cmd - the maximum I got was about 130 Mbit/sec; it took 64 minutes to download the bucket.
s5cmd - maximum of 960 Mbit/sec and less than 10 minutes to download the whole bucket (written in Go).
So the storage itself performs fine, but the tools written in Python cannot reach the maximum download speed of the link.
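For reference, here is a rough sketch of the arithmetic behind the timings above. It converts each reported wall-clock time into an average bit rate and an average number of files per second. It assumes that "40 GB" means 40 * 10**9 bytes and that every run fetched the whole bucket; `average_rates` and `runs_minutes` are illustrative names, not part of DVC or any of the tools mentioned. awscli is left out because no total time was recorded for it.

```python
# Back-of-the-envelope check of the figures reported in this ticket.
# Assumptions: "40 GB" means 40 * 10**9 bytes, and each run fetched the whole bucket.
BUCKET_BYTES = 40 * 10**9
FILE_COUNT = 410_000

# Wall-clock minutes reported above.
# s5cmd took "less than 10 minutes", so 10 is an upper bound on its time.
runs_minutes = {"dvc pull -j 20": 220, "s4cmd": 64, "s5cmd": 10}


def average_rates(minutes):
    """Return (Mbit/s, files/s) averaged over a full-bucket download."""
    seconds = minutes * 60
    mbit_per_s = BUCKET_BYTES * 8 / seconds / 1e6
    files_per_s = FILE_COUNT / seconds
    return mbit_per_s, files_per_s


for tool, minutes in runs_minutes.items():
    mbit, files = average_rates(minutes)
    print(f"{tool:15s} {mbit:6.1f} Mbit/s  {files:6.1f} files/s")
```

Under these assumptions the 220-minute run averages roughly 24 Mbit/sec and about 31 files per second, while s5cmd sustains at least several hundred files per second. The files-per-second column may be the more telling one for a bucket made of many small objects, but that is an interpretation, not something measured here.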
Reproduce
Profiler stat: https://disk.yandex.ru/d/XNajwHgWYlPSHA