README.md: 47 additions & 23 deletions
@@ -1,12 +1,18 @@
 # Amazon S3 Connector for PyTorch
-The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3. Using the S3 Connector for PyTorch
-automatically optimizes performance when downloading training data from and writing checkpoints to Amazon S3, eliminating the need to write your own code to list S3 buckets and manage concurrent requests.
-
-
-Amazon S3 Connector for PyTorch provides implementations of PyTorch's [dataset primitives](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) that you can use to load training data from Amazon S3.
-It supports both [map-style datasets](https://pytorch.org/docs/stable/data.html#map-style-datasets) for random data access patterns and
-[iterable-style datasets](https://pytorch.org/docs/stable/data.html#iterable-style-datasets) for streaming sequential data access patterns.
-The S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to Amazon S3, without first saving to local storage.
+The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in
+Amazon S3. Using the S3 Connector for PyTorch
+automatically optimizes performance when downloading training data from and writing checkpoints to Amazon S3,
+eliminating the need to write your own code to list S3 buckets and manage concurrent requests.
+
+
+Amazon S3 Connector for PyTorch provides implementations of PyTorch's
+[dataset primitives](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) that you can use to load
+training data from Amazon S3.
+It supports both [map-style datasets](https://pytorch.org/docs/stable/data.html#map-style-datasets) for random data
+access patterns and [iterable-style datasets](https://pytorch.org/docs/stable/data.html#iterable-style-datasets) for
+streaming sequential data access patterns.
+The S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to
+Amazon S3, without first saving to local storage.
 
 
 ## Getting Started
@@ -22,25 +28,30 @@ automatically optimizes performance when downloading training data from and writ
 pip install s3torchconnector
 ```
 
-Amazon S3 Connector for PyTorch supports only Linux via Pip for now. For other platforms, see [DEVELOPMENT](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/DEVELOPMENT.md) for build instructions.
+Amazon S3 Connector for PyTorch supports only Linux via Pip for now. For other platforms, see
+[DEVELOPMENT](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/DEVELOPMENT.md) for build instructions.
 
 ### Configuration
 
 To use `s3torchconnector`, AWS credentials must be provided through one of the following methods:
 
-- If you are using this library on an EC2 instance, specify an IAM role and then give the EC2 instance access to that role.
+- If you are using this library on an EC2 instance, specify an IAM role and then give the EC2 instance access to
+that role.
 - Install and configure [`awscli`](https://aws.amazon.com/cli/) and run `aws configure`.
-- Set credentials in the AWS credentials profile file on the local system, located at: `~/.aws/credentials` on Unix or macOS.
+- Set credentials in the AWS credentials profile file on the local system, located at: `~/.aws/credentials`
+on Unix or macOS.
 - Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.
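
As a rough illustration of the credential sources listed above, the following sketch reports which of them appear to be configured on the local machine. It is not part of `s3torchconnector`: the function name `detect_credential_sources` is hypothetical, and it only checks that the environment variables and the credentials file are present, not that the credentials are valid. The EC2 IAM role case is left out because detecting it would require a call to the instance metadata service.

```python
import os
from pathlib import Path


def detect_credential_sources(env=None, home=None):
    """Return names of credential sources that appear to be configured.

    Hypothetical helper for illustration only: it checks presence and
    does not validate the credentials against AWS.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Both variables must be set for environment-based credentials.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    # Location of the credentials profile file on Unix or macOS.
    if (home / ".aws" / "credentials").is_file():
        sources.append("credentials_file")
    return sources
```

Calling `detect_credential_sources()` with no arguments inspects the current process environment and home directory; an empty list suggests that none of the file-based or environment-based methods is set up yet.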
 
 ### Examples
 
 [API docs](http://awslabs.github.io/s3-connector-for-pytorch) are showing API of the public components.
-End to end example of how to use `s3torchconnector` can be found under the [examples](https://github.com/awslabs/s3-connector-for-pytorch/tree/main/examples) directory.
+End to end example of how to use `s3torchconnector` can be found under the
-The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:
+The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either a map-style or iterable-style
+dataset, by specifying an S3 URI (a bucket and optional prefix) and the region the bucket is located in:
 ```python
 from s3torchconnector import S3MapDataset, S3IterableDataset
 
@@ -67,7 +78,8 @@ for object in iterable_dataset:
 
 ```
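
Because a dataset is addressed by a single S3 URI, it can help to see how such a URI splits into a bucket and an optional prefix. The sketch below uses only the Python standard library; `split_s3_uri` is a hypothetical helper, not part of `s3torchconnector`, and the bucket and prefix names in the usage note are placeholders.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3 URI into (bucket, prefix); the prefix may be empty.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The parsed path keeps its leading '/', which is not part of the prefix.
    return parsed.netloc, parsed.path[1:]
```

For instance, `split_s3_uri("s3://my-bucket/images/train/")` yields the bucket `my-bucket` and the prefix `images/train/`, while a URI with no path yields an empty prefix.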
 
-In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.
+In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading
+model checkpoints directly to and from an S3 bucket.
 
 ```python
 from s3torchconnector import S3Checkpoint
@@ -92,28 +104,40 @@ with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
 model.load_state_dict(state_dict)
 ```
 
-Using datasets or checkpoints with [Amazon S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html)
+Using datasets or checkpoints with
+[Amazon S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html)
 directory buckets requires only to update the URI, following `base-name--azid--x-s3` bucket name format.
 For example, assuming the following directory bucket name `my-test-bucket--usw2-az1--x-s3` with the Availability Zone ID
-usw2-az1, then the URI used will look like: `s3://my-test-bucket--usw2-az1--x-s3/<PREFIX>` (**please note that the prefix
-for Amazon S3 Express One Zone should end with '/'**), paired with region us-west-2.
+usw2-az1, then the URI used will look like: `s3://my-test-bucket--usw2-az1--x-s3/<PREFIX>` (**please note that the
+prefix for Amazon S3 Express One Zone should end with '/'**), paired with region us-west-2.
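
The naming rule above can be captured in a small helper. This is a sketch under stated assumptions: `express_uri` and its regular expression are hypothetical, they only approximate the `base-name--azid--x-s3` format described here, and they do not enforce all of Amazon S3's bucket naming rules. Nothing in it is part of `s3torchconnector`.

```python
import re

# Approximates the `base-name--azid--x-s3` format; illustrative only.
_DIRECTORY_BUCKET = re.compile(r"^(?P<base>.+)--(?P<azid>[a-z0-9]+-az\d+)--x-s3$")


def express_uri(bucket, prefix=""):
    """Build an S3 URI for a directory bucket and return (uri, azid).

    Hypothetical helper for illustration only.
    """
    match = _DIRECTORY_BUCKET.match(bucket)
    if match is None:
        raise ValueError(f"not a directory bucket name: {bucket!r}")
    # Prefixes for S3 Express One Zone should end with '/'.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"s3://{bucket}/{prefix}", match.group("azid")
```

With the bucket from the example, `express_uri("my-test-bucket--usw2-az1--x-s3", "train")` produces a URI whose prefix ends with `/` and reports the Availability Zone ID `usw2-az1`; the matching region (here us-west-2) still has to be supplied separately.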
 
 ## Contributing
-We welcome contributions to Amazon S3 Connector for PyTorch. Please see [CONTRIBUTING](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/CONTRIBUTING.md) For more information on how to report bugs or submit pull requests.
+We welcome contributions to Amazon S3 Connector for PyTorch. Please
+see [CONTRIBUTING](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/CONTRIBUTING.md)
+For more information on how to report bugs or submit pull requests.
 
 ### Development
-See [DEVELOPMENT](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/DEVELOPMENT.md) for information about code style,
-development process, and guidelines.
+See [DEVELOPMENT](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/DEVELOPMENT.md) for information
+about code style, development process, and guidelines.
 
+### Compatibility with other storage services
+S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access or store data in Amazon S3.
+While it may be functional against other storage services that use S3-like APIs, they may inadvertently break when we
+make changes to better support Amazon S3. We welcome contributions of minor compatibility fixes or performance
+improvements for these services if the changes can be tested against Amazon S3.
 
 ### Security issue notifications
-If you discover a potential security issue in this project we ask that you notify AWS Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/).
+If you discover a potential security issue in this project we ask that you notify AWS Security via our
-This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). See [CODE_OF_CONDUCT.md](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/CODE_OF_CONDUCT.md) for more details.
+This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
+See [CODE_OF_CONDUCT.md](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/doc/CODE_OF_CONDUCT.md) for
+more details.
 
 ## License
 
-Amazon S3 Connector for PyTorch has a BSD 3-Clause License, as found in the [LICENSE](https://github.com/awslabs/s3-connector-for-pytorch/blob/main/LICENSE) file.
+Amazon S3 Connector for PyTorch has a BSD 3-Clause License, as found in the