s3 performance is slow #23

@tyommik

Bug Report

s3 performance is slow

Description

This ticket is based on a topic in Discord (“need-help” channel).
I tried to use DVC with MinIO (S3-compatible storage) and noticed that its performance is very slow.

My env:

  • MinIO storage is located on an SSD (Intel Optane, high performance)
  • MinIO server <------ 1000 Mbit/s --------> DVC (S3 client)
  • Bucket: 40 GB, 410k files (each file <= 400 KB)
DVC version: 2.0.17 (deb)
---------------------------------
Platform: Python 3.8.8 on Linux-5.4.0-70-generic-x86_64-with-glibc2.4
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sda
Repo: dvc, git

When I ran dvc pull -j 20, the maximum speed was 80 Mbit/s, but the average was about 40 Mbit/s. Downloading the bucket took 220 minutes.

What else I tried:
dvc pull -j 80 - no improvement.
awscli - the AWS tool reaches a maximum download speed of 160 Mbit/s. I tried different settings, but I couldn't exceed that limit.
s4cmd - the maximum I got was about 130 Mbit/s, 64 minutes to download the bucket.
s5cmd - maximum 960 Mbit/s and less than 10 minutes to download the whole bucket (written in Go).

So the storage performance is fine, but the tools written in Python cannot reach the maximum download speed.
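For comparison, the wall-clock times above can be converted into whole-run averages. This is only a rough sketch: it assumes the bucket size reported above means 40 decimal gigabytes, it treats the s5cmd figure of 10 minutes as an upper bound on the time, and the helper names are made up for illustration.

```python
# Figures taken from this report. Assumption: the bucket is 40 decimal gigabytes.
BUCKET_BYTES = 40 * 10**9
FILE_COUNT = 410_000


def avg_mbit_per_s(total_bytes: int, minutes: float) -> float:
    """Average throughput in Mbit/s for total_bytes transferred in `minutes`."""
    return total_bytes * 8 / (minutes * 60) / 1e6


def files_per_s(file_count: int, minutes: float) -> float:
    """Average number of files completed per second."""
    return file_count / (minutes * 60)


# Wall-clock times from the report; s5cmd took "less than 10 minutes",
# so its row is a lower bound on the real rates.
reported_minutes = {"dvc pull -j 20": 220, "s4cmd": 64, "s5cmd": 10}

for tool, minutes in reported_minutes.items():
    print(
        f"{tool:15s} {avg_mbit_per_s(BUCKET_BYTES, minutes):7.1f} Mbit/s "
        f"{files_per_s(FILE_COUNT, minutes):7.1f} files/s"
    )
```

Under that size assumption, 220 minutes corresponds to roughly 24 Mbit/s and about 31 files per second over the whole run. These are whole-run averages, so they can differ from the speeds seen on a network monitor during the transfer.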

Reproduce

Profiler stat: https://disk.yandex.ru/d/XNajwHgWYlPSHA
