Commit 77e5073: add cloud credential docs (1 parent: bca53ef)

1 file changed (+165, -3 lines)

docs/usage.md

Lines changed: 165 additions & 3 deletions
@@ -1,11 +1,11 @@
# Usage

-## Examples
+## Ingestion and Export

The following example shows how to ingest a 3D seismic stack into
-MDIO format. Only one lossless copy will be made.
+**_MDIO_** format. Only one lossless copy will be made.

-There are many more options, please see the [usage](#usage).
+There are many more options, please see the [CLI Reference](#cli-reference).

```shell
$ mdio segy import \
@@ -24,6 +24,168 @@ $ mdio segy export \
  -o path_to_segy_file.segy
```

## Cloud Connection Strings

**_MDIO_** supports I/O on the major cloud service providers. The cloud I/O capabilities are
built on [fsspec](https://filesystem-spec.readthedocs.io/) and its specialized
implementations for each provider:

- Amazon Web Services (AWS S3) - [s3fs](https://s3fs.readthedocs.io)
- Google Cloud Platform (GCP GCS) - [gcsfs](https://gcsfs.readthedocs.io)
- Microsoft Azure (Data Lake Gen2) - [adlfs](https://github.com/fsspec/adlfs)

Any other file system supported by `fsspec` will also work with **_MDIO_**; however,
we will focus on the major providers here.

The protocol that selects a backend (e.g. `s3://`, `gs://`, or `az://`) is
prepended to the **_MDIO_** path.
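
For illustration, the protocol is just the URL scheme, which can be inspected with the Python standard library (the bucket and prefix below are placeholders, not real resources):

```python
from urllib.parse import urlparse

# The protocol prefix (URL scheme) is what selects the storage backend.
mdio_path = "s3://bucket/prefix/my.mdio"
parsed = urlparse(mdio_path)
print(parsed.scheme)  # s3 -> the backend protocol
print(parsed.netloc)  # bucket
```
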
The connection string can be passed to the command-line interface (CLI) via the
`-storage, --storage-options` flag as a JSON string, or to the Python API via the
`storage_options` keyword argument as a Python dictionary.

````{warning}
On Windows clients, JSON strings are passed to the CLI with a special escape character.

For instance, a JSON string:
```json
{"key": "my_super_private_key", "secret": "my_super_private_secret"}
```
must be passed with an escape character `\` for the inner quotes:
```shell
"{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```
whereas on Linux (bash) this works as-is:
```shell
'{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```
If this is done incorrectly, the CLI will raise an invalid JSON string error.
````
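
One way to avoid hand-escaping entirely is to build the string with `json.dumps`, which produces a valid payload on either platform; a minimal sketch (the credential values are the placeholders from above):

```python
import json

opts = {"key": "my_super_private_key", "secret": "my_super_private_secret"}
payload = json.dumps(opts)

# bash form: wrap in single quotes so the shell passes the string through untouched.
bash_arg = "'" + payload + "'"

# Windows form: wrap in double quotes and escape the inner double quotes.
windows_arg = '"' + payload.replace('"', '\\"') + '"'
print(windows_arg)
```

When the CLI is invoked from Python via `subprocess` with a list of arguments, the raw `payload` can be passed directly and no shell quoting is needed at all.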

### Amazon Web Services

Credentials can be automatically fetched from a pre-authenticated AWS CLI.
See [here](https://s3fs.readthedocs.io/en/latest/index.html#credentials) for the order in which `s3fs`
checks them. If it is not pre-authenticated, you need to pass `--storage-options`.

**Prefix:**
`s3://`

**Storage Options:**
`key`: The auth key from AWS
`secret`: The auth secret from AWS

Using UNIX:

```shell
mdio segy import \
  --input-segy-path path/to/my.segy \
  --output-mdio-file s3://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```

Using Windows (note the extra escape characters `\`):

```console
mdio segy import \
  --input-segy-path path/to/my.segy \
  --output-mdio-file s3://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```
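
For public buckets, `s3fs` also accepts an `anon` option, in which case no key or secret is needed. A small sketch of the corresponding JSON; note that Python's `True` must appear as lowercase `true` in the CLI string:

```python
import json

# Anonymous access options for a public S3 bucket (s3fs "anon" parameter).
storage_options = {"anon": True}

# json.dumps serializes Python True to lowercase JSON true.
cli_arg = json.dumps(storage_options)
print(cli_arg)  # {"anon": true}
```

Passed to the CLI this would look like `--storage-options '{"anon": true}'`.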

### Google Cloud Platform

Credentials can be automatically fetched from a pre-authenticated `gcloud` CLI.
See [here](https://gcsfs.readthedocs.io/en/latest/#credentials) for the order in which `gcsfs`
checks them. If it is not pre-authenticated, you need to pass `--storage-options`.

GCP uses [service accounts](https://cloud.google.com/iam/docs/service-accounts) to pass
authentication information to APIs.

**Prefix:**
`gs://` or `gcs://`

**Storage Options:**
`token`: The service account JSON value as a string, or a local path to the JSON file

Using a service account:

```shell
mdio segy import \
  --input-segy-path path/to/my.segy \
  --output-mdio-file gs://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}'
```

Using a browser to populate authentication:

```shell
mdio segy import \
  --input-segy-path path/to/my.segy \
  --output-mdio-file gs://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options '{"token": "browser"}'
```
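
In the Python API, the `token` value can be given either as a path to the service-account JSON or as the already-parsed dictionary; a sketch of both forms (the email and key below are placeholders, and expanding `~` explicitly is a cautious choice rather than a documented requirement):

```python
import json
from pathlib import Path

# Form 1: a local path to the service-account JSON file; expand "~" ourselves
# instead of relying on the backend to do it.
token_path = str(Path("~/.config/gcloud/application_default_credentials.json").expanduser())
opts_path_form = {"token": token_path}

# Form 2: the parsed service-account dictionary (placeholder values).
opts_dict_form = {
    "token": {
        "client_email": "svc-account@my-project.iam.gserviceaccount.com",
        "private_key": "...",
    }
}

print(json.dumps(opts_path_form))
```
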

### Microsoft Azure

There are various ways to authenticate with Azure Data Lake (ADL).
See [here](https://github.com/fsspec/adlfs#details) for some details.
If ADL is not pre-authenticated, you need to pass `--storage-options`.

**Prefix:**
`az://` or `abfs://`

**Storage Options:**
`account_name`: Azure Data Lake storage account name
`account_key`: Azure Data Lake storage account access key

```shell
mdio segy import \
  --input-segy-path path/to/my.segy \
  --output-mdio-file az://bucket/prefix/my.mdio \
  --header-locations 189,193 \
  --storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
```
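
`adlfs` also accepts other credential fields, for example a shared access signature instead of the account key; a sketch assuming the `sas_token` parameter from the adlfs README (all values are placeholders):

```python
import json

# Alternative adlfs credentials: account name plus a SAS token
# instead of the account access key (placeholder values).
storage_options = {"account_name": "myaccount", "sas_token": "?sv=2021-08-06&sig=REDACTED"}
cli_arg = json.dumps(storage_options)
print(cli_arg)
```
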

### Advanced Cloud Features

`fsspec` provides additional functionality beyond basic I/O. These are advanced features, and we refer
the user to the `fsspec` [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html).
Some useful examples are:

- Caching files locally
- Remote write caching
- File buffering and random access
- Mounting anything with FUSE

````{note}
When combining advanced protocols like `simplecache` with a remote store like `s3`, the
URL can be chained, as in `simplecache::s3://bucket/prefix/file.mdio`. When doing this, the
`--storage-options` argument must explicitly state parameters for the cloud backend and the
extra protocol. For the above example it would look like this:

```json
{
  "s3": {
    "key": "my_super_private_key",
    "secret": "my_super_private_secret"
  },
  "simplecache": {
    "cache_storage": "/custom/temp/storage/path"
  }
}
```

In one line:
```json
{"s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"}, "simplecache": {"cache_storage": "/custom/temp/storage/path"}}
```
````
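
As a quick sanity check, the nested options can be built and validated in Python before being handed to the CLI (keys and paths are the placeholders from the example above):

```python
import json

# One options entry per protocol appearing in the chained URL.
chained_url = "simplecache::s3://bucket/prefix/file.mdio"
storage_options = {
    "s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"},
    "simplecache": {"cache_storage": "/custom/temp/storage/path"},
}

# Every protocol named in the chain should have a matching options entry.
protocols = [part.split("://")[0] for part in chained_url.split("::")]
assert all(p in storage_options for p in protocols)

# The one-line CLI form is just the compact JSON serialization.
one_line = json.dumps(storage_options)
print(one_line)
```
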
## CLI Reference
MDIO provides a convenient command-line interface (CLI) to do
