To develop `s3torchconnector`, you need to have Python, `pip` and `python-venv` installed.

`s3torchconnector` uses `s3torchconnectorclient` as the underlying S3 connector. `s3torchconnectorclient` is a
Python wrapper around MountpointS3Client, which uses the AWS Common Runtime (CRT) S3 client to optimize S3 read and
write performance.
Since MountpointS3Client is implemented in Rust, developing and building from source also requires `clang`, `cmake`
and the Rust compiler (as detailed below).

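Before building from source, you can quickly confirm that these tools are on your `PATH` with a check like the one below. This is a minimal sketch, not part of the project. It assumes the Rust toolchain exposes the `rustc` and `cargo` executables, as a standard `rustup` installation does, and `missing_build_tools` is a hypothetical helper name.

```python
import shutil


def missing_build_tools(tools=("clang", "cmake", "rustc", "cargo")):
    """Return the names of required build tools that cannot be found on PATH."""
    # shutil.which returns None when an executable cannot be located.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found.")
```
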
Note: The CLI commands below are for Ubuntu/Debian.

#### Install Python 3.x and pip
```

When you make changes to the Rust code, you need to run `pip install -e s3torchconnectorclient` again before the
changes are visible from Python.

### Licensing
```
This will set the log level to TRACE by default, DEBUG for mountpoint-s3-client and ERROR for AWS CRT.

For more examples, see the
[env_logger documentation](https://docs.rs/env_logger/latest/env_logger/#enabling-logging).

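To see how a filter combining a default level with per-module levels, like the one described above, resolves for a given log target, the sketch below mimics the basic `env_logger` directive rules. A bare level sets the default, `target=level` overrides it for targets starting with `target`, a bare module name enables all levels for that module, and the longest matching prefix wins. In this sketch, targets that match nothing are treated as `off`. It is an illustration only, not how `s3torchconnector` parses its logging variable. The target names `mountpoint_s3_client` and `awscrt` and the helper names are assumptions.

```python
LEVELS = ("off", "error", "warn", "info", "debug", "trace")


def parse_filter(spec):
    """Parse an env_logger-style spec into (default_level, {target_prefix: level})."""
    default = None
    modules = {}
    for directive in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, level = directive.partition("=")
        if not sep:
            if name.lower() in LEVELS:
                # A bare level such as "trace" sets the default for every target.
                default = name.lower()
            else:
                # A bare module name enables all levels for that module.
                modules[name] = "trace"
        else:
            modules[name] = level.lower()
    return default, modules


def effective_level(spec, target):
    """Return the level that applies to a log target such as "awscrt::s3"."""
    default, modules = parse_filter(spec)
    # The longest matching target prefix wins; otherwise fall back to the default.
    matches = [name for name in modules if target.startswith(name)]
    if matches:
        return modules[max(matches, key=len)]
    return default or "off"


if __name__ == "__main__":
    spec = "trace,mountpoint_s3_client=debug,awscrt=error"  # hypothetical filter
    for target in ("s3torchconnectorclient", "mountpoint_s3_client::prefetch", "awscrt::s3"):
        print(target, "->", effective_level(spec, target))
```
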
### Fine Tuning
Using `S3ClientConfig`, you can set the following parameters for the underlying S3 client:
* `throughput_target_gbps` (float): Throughput target, in gigabits per second (Gbps), that the client tries to reach.
  Defaults to **10.0 Gbps** (may change in the future).

* `part_size` (int): Size, in bytes, of the parts that objects are split into for upload and download.
  Must be **between 5 MiB and 5 GiB**. Defaults to **8 MiB** (may change in the future).
  Note: when saving checkpoints, the inner client adjusts the part size to meet the S3 service limits
  (at most 10,000 parts per upload, minimum upload part size of 5 MiB).

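The limits above make it easy to estimate which part size a large upload needs. The sketch below is a hypothetical helper, not the connector's internal logic. It validates the configured `part_size` against the 5 MiB to 5 GiB range, then raises it just enough that the object fits into 10,000 parts.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
MIN_PART_SIZE = 5 * MIB  # minimum part size
MAX_PART_SIZE = 5 * GIB  # maximum part size
MAX_PARTS = 10_000       # maximum number of parts per multipart upload


def effective_part_size(object_size, part_size=8 * MIB):
    """Hypothetical helper: smallest valid part size >= part_size for object_size bytes."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    # Integer ceiling division: bytes per part needed to stay within MAX_PARTS.
    needed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    if needed > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return max(part_size, needed)
```

With the default 8 MiB part size, one upload can hold at most 8 MiB × 10,000 = 80,000 MiB (about 78 GiB). For a 200 GiB checkpoint, the helper returns 21,474,837 bytes (about 20.5 MiB).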
The configuration is passed to datasets and checkpoints through the `s3client_config` argument, for example:
```py
from s3torchconnector import S3MapDataset, S3Checkpoint, S3ClientConfig
# Requires Lightning to be installed.
from s3torchconnector.lightning import S3LightningCheckpoint

# Setup for DATASET_URI and REGION.
...
# Set part_size to 5 MiB and throughput_target_gbps to 15 Gbps.
config = S3ClientConfig(part_size=5 * 1024 * 1024, throughput_target_gbps=15.0)
# Pass the configuration to an S3MapDataset.
s3_map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=config)
# Use the same configuration for checkpoints.
# A different configuration can also be passed to checkpoints.
s3_checkpoint = S3Checkpoint(region=REGION, s3client_config=config)
# The same applies to Lightning checkpoints.
s3_lightning_checkpoint = S3LightningCheckpoint(region=REGION, s3client_config=config)
```

**When modifying the default values of these parameters, we strongly recommend running benchmarks to ensure you are
not introducing a performance regression.**

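A minimal timing harness for such comparisons might look like the sketch below. It assumes each dataset item is a file-like object with a `read()` method (assumed here for the readers yielded by `S3MapDataset`). It reports observed throughput in Gbps, the same unit as `throughput_target_gbps`. `measure_throughput_gbps` is a hypothetical name, not part of the library.

```python
import io
import time


def measure_throughput_gbps(readers):
    """Read every object fully and return (total_bytes, observed throughput in Gbps)."""
    total_bytes = 0
    start = time.perf_counter()
    for reader in readers:
        # Read the whole object; the payload is discarded, only its size is counted.
        total_bytes += len(reader.read())
    elapsed = time.perf_counter() - start
    # Convert bytes per second to gigabits per second (1 Gb = 10**9 bits).
    gbps = total_bytes * 8 / elapsed / 1e9 if elapsed > 0 else float("inf")
    return total_bytes, gbps


if __name__ == "__main__":
    # Local stand-in data; replace with an S3MapDataset built from each configuration.
    sample = [io.BytesIO(b"x" * 1024) for _ in range(4)]
    total, gbps = measure_throughput_gbps(sample)
    print(f"read {total} bytes at {gbps:.3f} Gbps")
```

Running the same loop over a dataset built with the default configuration and one built with a modified `S3ClientConfig` gives a like-for-like comparison. Repeat each run a few times, since single measurements over the network are noisy.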