This repository was archived by the owner on May 5, 2025. It is now read-only.
Commit 3b22b03
authored
feat: support zstd compression in miniostorage (#405)
* feat: support zstd compression in miniostorage
we want to use zstd compression when compressing files for storage in
object storage because it performs better than gzip which is what we
were using before
these changes are only being made to the minio storage service because
we want to consolidate the storage service functionality into this one
so both worker and API will be using this backend in the future (API was
already using this one)
we have to manually decompress the zstd compressed files in read_file
but HTTPResponse takes care of it for us if the content encoding of the
file is gzip
the is_already_gzipped argument is being deprecated in favour of
compression_type and is_compressed, also the ability to pass a str to
write_file is being deprecated. we're keeping track of the use of these
using sentry capture_message
* fix: address feedback
- using fget_object was unecessary since we were streaming the response
data regardless
- no need for all the warning logs and sentry stuff, we'll just do
a 3 step migration in both API and worker (update shared supporting
old behaviour, update {api,worker}, remove old behaviour support from
shared)
- zstandard version pinning can be more flexible
- add test for content type = application/x-gzip since there was some
specific handling for that in the GCP storage service
* fix: update MinioStorageService
- in write file:
- data arg is not BinaryIO it's actually bytes | str | IO[bytes]
bytes and str are self-explanatory it's just how it's being used
currently, so we must support it. IO[bytes] is there to support
files handles opened with "rb" that are being passed and BytesIO
objects
- start accepting None value for compression_type which will mean
no automatic compression even if is_compressed is false
- do automatic compression using gzip if is_compressed=False and
compression_type="gzip"
- in put_object set size = -1 and use a part_size of 20MiB. the
specific part size is arbitrary. Different sources online suggest
different numbers. It probably depends on the size of the
underlying data we're trying to send but 20MiB seems like a good
flat number to pick for now.
- in read_file:
- generally reorganize the function do spend less time under the try
except blocks
- use the CHUNK_SIZE const defined in storage/base for the amount to
read from the streams
- accept IO[bytes] for the file_obj since we don't use any of the
BinaryIO specific methods
- create GZipStreamReader that takes in a IO[bytes] and implements a
read() method that reads a certain amount of bytes from the IO[bytes]
compresses whatever it reads using gzip, and returns the result
* fix(minio): check urllib3 version in read_file
this is because if urllib3 is >= 2.0.0 and the zstd extra is installed
then it is capable (and will) decode zstd encoded data when it's used
in get_object
so when we create the MinioStorageService we check the urllib3 version
and we check if it's been installed with the zstd extra
this commit also adds a test to ensure that the gzip compression and
decompression used in the GzipStreamReader actually works
* feat: add feature flag for new minio storage
instead of doing a 0-100 launch of the new minio storage service i'd
like to have it so we incrementally ship it using a feature flag.
so if a repoid is passed to the get_appropriate_storage_service function
and the chosen storage is minio, then it will check the use_new_minio
feature to decide whether to use the new or old minio storage service
as mentioned this will be decided via the repoid (to reduce the impact
IF it is broken)
changes had to be made to avoid circular imports in the model_utils and
rollout_utils files
* fix: revert changes to old minio1 parent 27d6a8f commit 3b22b03
File tree
13 files changed
+837
-26
lines changed- tests/unit/storage
13 files changed
+837
-26
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| 37 | + | |
37 | 38 | | |
38 | 39 | | |
39 | 40 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | | - | |
| 26 | + | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
4 | 3 | | |
5 | 4 | | |
6 | 5 | | |
7 | | - | |
8 | 6 | | |
9 | 7 | | |
10 | 8 | | |
| |||
148 | 146 | | |
149 | 147 | | |
150 | 148 | | |
151 | | - | |
152 | | - | |
153 | | - | |
154 | | - | |
155 | | - | |
156 | | - | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
161 | | - | |
162 | | - | |
163 | | - | |
164 | | - | |
165 | | - | |
166 | | - | |
167 | | - | |
168 | | - | |
169 | 149 | | |
170 | 150 | | |
171 | 151 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
| 20 | + | |
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
| 5 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
| 8 | + | |
7 | 9 | | |
8 | 10 | | |
9 | | - | |
10 | | - | |
11 | | - | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
12 | 16 | | |
13 | 17 | | |
14 | 18 | | |
15 | | - | |
| 19 | + | |
16 | 20 | | |
17 | 21 | | |
18 | 22 | | |
| |||
28 | 32 | | |
29 | 33 | | |
30 | 34 | | |
| 35 | + | |
| 36 | + | |
31 | 37 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
| 4 | + | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
| |||
0 commit comments