Commit 06c2312

Update DEVELOPMENT.md and fix typos (#193)
1 parent 2269e0c commit 06c2312

3 files changed

+41
-10
lines changed

doc/DEVELOPMENT.md

Lines changed: 38 additions & 7 deletions
@@ -2,10 +2,11 @@
22

33
To develop `s3torchconnector`, you need to have Python, `pip` and `python-venv` installed.
44

5-
`s3torchconnector` uses `s3torchconnectorclient` as the underlying S3 Connector. `s3torchconnectorclient` is a Python wrapper around MountpointS3Client that uses S3 CRT to optimize performance
6-
of S3 read/write
5+
`s3torchconnector` uses `s3torchconnectorclient` as the underlying S3 Connector. `s3torchconnectorclient` is a
6+
Python wrapper around MountpointS3Client that uses S3 CRT to optimize performance of S3 read/write
77
.
8-
Since MountpointS3Client is implemented in Rust, for development and building from source, you will need to install `clang`, `cmake` and rust compiler (as detailed below).
8+
Since MountpointS3Client is implemented in Rust, for development and building from source, you will need to install
9+
`clang`, `cmake` and rust compiler (as detailed below).
910

1011
Note: CLI commands for Ubuntu/Debian
1112
#### Install Python 3.x and pip
@@ -44,8 +45,8 @@ sudo apt install python3-pip
4445
```
4546

4647

47-
When you make changes to the Rust code, you need to run `pip install -e s3torchconnectorclient` before changes will be viewable from
48-
Python.
48+
When you make changes to the Rust code, you need to run `pip install -e s3torchconnectorclient` before changes will
49+
be viewable from Python.
4950

5051

5152
### Licensing
@@ -136,5 +137,35 @@ The file will include AWS CRT logs.
136137
```
137138
This will set the log level to TRACE by default, DEBUG for mountpoint-s3-client and ERROR for AWS CRT.
138139

139-
For more examples please check the
140-
[env_logger documentation](https://docs.rs/env_logger/latest/env_logger/#enabling-logging).
140+
For more examples please check the
141+
[env_logger documentation](https://docs.rs/env_logger/latest/env_logger/#enabling-logging).
142+
143+
### Fine Tuning
144+
Using S3ClientConfig you can set up the following parameters for the underlying S3 client:
145+
* `throughput_target_gbps(float)`: Throughput target in Gigabits per second (Gbps) that we are trying to reach.
146+
**10.0 Gbps** by default (may change in future).
147+
148+
* `part_size(int)`: Size (bytes) of file parts that will be uploaded/downloaded.
149+
Note: for saving checkpoints, the inner client will adjust the part size to meet the service limits.
150+
(max number of parts per upload is 10,000, minimum upload part size is 5 MiB).
151+
Part size must have **values between 5MiB and 5GiB.** Is set by default to **8MiB** (may change in future).
152+
153+
For example this can be passed in like:
154+
```py
155+
from s3torchconnector import S3MapDataset, S3ClientConfig
156+
157+
# Setup for DATASET_URI and REGION.
158+
...
159+
# Setting part_size to 5 MiB and throughput_target_gbps to 15 Gbps.
160+
config = S3ClientConfig(part_size=5 * 1024 * 1024, throughput_target_gbps=15)
161+
# Passing this on to an S3MapDataset.
162+
s3_map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=config)
163+
# Updating the configuration for checkpoints.
164+
# Please note that you can also pass in a different configuration to checkpoints.
165+
s3_checkpoint = S3Checkpoint(region=REGION, s3client_config=config)
166+
# Works similarly for Lightning checkpoints.
167+
s3_lightning_checkpoint = S3LightningCheckpoint(region=REGION, s3client_config=config)
168+
```
169+
170+
**When modifying the default values for these flags, we strongly recommend to run benchmarking to ensure you are not
171+
introducing a performance regression.**
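
The `part_size` limits quoted above (5 MiB minimum, 5 GiB maximum, at most 10,000 parts per upload) imply a simple arithmetic constraint for large checkpoints. The following is a minimal sketch of that arithmetic only. The helper `smallest_valid_part_size` is hypothetical and is not part of `s3torchconnector`; the adjustment that the inner client actually performs may differ.

```python
# Hypothetical helper illustrating the service limits described above.
# It is NOT the connector's implementation, only the arithmetic behind it.

MIB = 1024 * 1024
GIB = 1024 * MIB

MIN_PART_SIZE = 5 * MIB   # minimum upload part size
MAX_PART_SIZE = 5 * GIB   # maximum part size
MAX_PARTS = 10_000        # maximum number of parts per upload


def smallest_valid_part_size(object_size: int, requested_part_size: int = 8 * MIB) -> int:
    """Return the smallest part size >= requested_part_size that keeps
    an upload of object_size bytes within MAX_PARTS parts."""
    if not MIN_PART_SIZE <= requested_part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")

    # Integer ceiling division: the part size needed to fit in MAX_PARTS parts.
    needed = -(-object_size // MAX_PARTS)
    part_size = max(requested_part_size, needed)

    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large to upload in 10,000 parts")
    return part_size


if __name__ == "__main__":
    # A 1 GiB checkpoint fits comfortably with the 8 MiB default.
    print(smallest_valid_part_size(1 * GIB))
    # A 100 GiB checkpoint needs parts larger than 8 MiB to stay under 10,000 parts.
    print(smallest_valid_part_size(100 * GIB))
```

Under these assumptions, a requested part size only has to grow once the object exceeds roughly `requested_part_size * 10,000` bytes, which is about 78 GiB for the 8 MiB default.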

s3torchconnector/src/s3torchconnector/_s3client/s3client_config.py

Lines changed: 2 additions & 2 deletions
@@ -10,11 +10,11 @@ class S3ClientConfig:
1010
Args:
1111
throughput_target_gbps(float): Throughput target in Gigabits per second (Gbps) that we are trying to reach.
1212
10.0 Gbps by default (may change in future).
13-
part_size(int): Size, in bytes, of parts that files will be downloaded or uploaded in.
13+
part_size(int): Size (bytes) of file parts that will be uploaded/downloaded.
1414
Note: for saving checkpoints, the inner client will adjust the part size to meet the service limits.
1515
(max number of parts per upload is 10,000, minimum upload part size is 5 MiB).
1616
Part size must have values between 5MiB and 5GiB.
17-
8MB by default (may change in future).
17+
8MiB by default (may change in future).
1818
"""
1919

2020
throughput_target_gbps: float = 10.0

s3torchconnectorclient/Cargo.lock

Lines changed: 1 addition & 1 deletion
