Commit 6616f24

Provide information on how S3MapDataset works (#171)
Co-authored-by: Ilya Isaev <[email protected]>
1 parent fe314ff commit 6616f24

1 file changed: +17 -13 lines changed

README.md

Lines changed: 17 additions & 13 deletions
````diff
@@ -52,36 +52,40 @@ End to end example of how to use `s3torchconnector` can be found under the
 
 The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style
 dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:
-```shell
+```py
 from s3torchconnector import S3MapDataset, S3IterableDataset
 
 # You need to update <BUCKET> and <PREFIX>
 DATASET_URI="s3://<BUCKET>/<PREFIX>"
 REGION = "us-east-1"
 
-map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION)
-
 iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION)
 
+# Datasets are also iterators.
+for item in iterable_dataset:
+    print(item.key)
+
+# S3MapDataset eagerly lists all the objects under the given prefix
+# to provide support of random access.
+# S3MapDataset builds a list of all objects at the first access to its elements or
+# at the first call to get the number of elements, whichever happens first.
+# This process might take some time and may give the impression of being unresponsive.
+map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION)
+
 # Randomly access to an item in map_dataset.
-object = map_dataset[0]
+item = map_dataset[0]
 
 # Learn about bucket, key, and content of the object
-bucket = object.bucket
-key = object.key
-content = object.read()
+bucket = item.bucket
+key = item.key
+content = item.read()
 len(content)
-
-# Datasets are also iterators.
-for object in iterable_dataset:
-    print(object.key)
-
 ```
 
 In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading
 model checkpoints directly to and from an S3 bucket.
 
-```shell
+```py
 from s3torchconnector import S3Checkpoint
 
 import torchvision
````