11# Usage
22
3- ## Examples
3+ ## Ingestion and Export
44
5- The following example shows how to ingest a 3D seismic stack into
6- MDIO format . Only one lossless copy will be made.
5+ The following example shows how to minimally ingest a 3D seismic stack into
6+ a local **_MDIO_** file. Only one lossless copy will be made.
77
8- There are many more options, please see the [ usage ] ( #usage ) .
8+ There are many more options; see the [CLI Reference](#cli-reference).
99
1010``` shell
1111$ mdio segy import \
@@ -24,6 +24,168 @@ $ mdio segy export \
2424 -o path_to_segy_file.segy
2525```
2626
27+ ## Cloud Connection Strings
28+
29+ **_MDIO_** supports I/O on major cloud service providers. Cloud I/O is provided through
30+ [fsspec](https://filesystem-spec.readthedocs.io/) and its specialized
31+ implementations for:
32+
33+ - Amazon Web Services (AWS S3) - [s3fs](https://s3fs.readthedocs.io)
34+ - Google Cloud Platform (GCP GCS) - [gcsfs](https://gcsfs.readthedocs.io)
35+ - Microsoft Azure (Datalake Gen2) - [adlfs](https://github.com/fsspec/adlfs)
36+
37+ Any other file system supported by `fsspec` is also supported by **_MDIO_**. However,
38+ we focus on the major providers here.
39+
40+ The protocol prefix that selects a backend (i.e. `s3://`, `gs://`, or `az://`) is
41+ prepended to the **_MDIO_** path.
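
To illustrate how a prefix maps to a backend, the sketch below looks up the `fsspec` implementation for a given path. The helper name `backend_for` and the lookup table are hypothetical and cover only the prefixes mentioned in this document; `fsspec` performs its own protocol resolution internally.

```python
from urllib.parse import urlsplit

# Prefix-to-package table built from the providers listed in this document.
# Illustrative only: this is not how fsspec itself resolves protocols.
BACKENDS = {
    "s3": "s3fs",
    "gs": "gcsfs",
    "gcs": "gcsfs",
    "az": "adlfs",
    "abfs": "adlfs",
}


def backend_for(mdio_path):
    """Return the backend package for a path, or 'local' when there is no prefix."""
    scheme = urlsplit(mdio_path).scheme
    if not scheme:
        return "local"
    # Other fsspec protocols exist; they are simply outside this small table.
    return BACKENDS.get(scheme, "other")


print(backend_for("s3://bucket/prefix/my.mdio"))
print(backend_for("path/to/my.mdio"))
```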
42+
43+ Connection credentials can be passed to the command-line interface (CLI) with the
44+ `-storage, --storage-options` flag as a JSON string, or to the Python API with the `storage_options`
45+ keyword argument as a Python dictionary.
46+
47+ ```` {warning}
48+ On Windows clients, JSON strings are passed to the CLI with a special escape character.
49+
50+ For instance, a JSON string:
51+ ```json
52+ {"key": "my_super_private_key", "secret": "my_super_private_secret"}
53+ ```
54+ must be passed with an escape character `\` for inner quotes as:
55+ ```shell
56+ "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
57+ ```
58+ whereas, on Linux bash this works just fine:
59+ ```shell
60+ '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
61+ ```
62+ If this is done incorrectly, you will get an invalid JSON string error from the CLI.
63+ ````
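
One way to avoid hand-escaping is to generate the quoted argument programmatically. The following is a minimal sketch using only the Python standard library; it assumes a POSIX shell such as bash on Linux and the Microsoft C runtime quoting rules on Windows (`subprocess.list2cmdline` is an undocumented stdlib helper that applies those rules). The variable names are illustrative.

```python
import json
import shlex
import subprocess

# The same options that would be passed as a dictionary in Python.
options = {"key": "my_super_private_key", "secret": "my_super_private_secret"}
json_text = json.dumps(options)

# POSIX shells: wrap in single quotes; inner double quotes stay untouched.
posix_arg = shlex.quote(json_text)

# Windows: wrap in double quotes and backslash-escape the inner double quotes.
windows_arg = subprocess.list2cmdline([json_text])

print(posix_arg)
print(windows_arg)
```

The two printed strings have the same shape as the Linux and Windows forms shown in the warning above.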
64+
65+ ### Amazon Web Services
66+
67+ Credentials can be fetched automatically from a pre-authenticated AWS CLI.
68+ See [here](https://s3fs.readthedocs.io/en/latest/index.html#credentials) for the order in which `s3fs`
69+ checks them. If the AWS CLI is not pre-authenticated, you need to pass `--storage-options`.
70+
71+ ** Prefix:**
72+ ` s3:// `
73+
74+ ** Storage Options:**
75+ ` key ` : The auth key from AWS
76+ ` secret ` : The auth secret from AWS
77+
78+ Using UNIX:
79+
80+ ``` shell
81+ mdio segy import \
82+ --input-segy-path path/to/my.segy \
83+ --output-mdio-file s3://bucket/prefix/my.mdio \
84+ --header-locations 189,193 \
85+ --storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
86+ ```
87+
88+ Using Windows (note the extra escape characters ` \ ` ):
89+
90+ ``` console
91+ mdio segy import ^
92+ --input-segy-path path/to/my.segy ^
93+ --output-mdio-file s3://bucket/prefix/my.mdio ^
94+ --header-locations 189,193 ^
95+ --storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
96+ ```
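
When the CLI is driven from a script, shell quoting can be sidestepped by passing the arguments as a list. The sketch below only builds the argument list using the flags shown in the examples above; the helper name `build_import_argv` is hypothetical, and the actual call is left commented out because it assumes `mdio` is installed and the paths exist.

```python
import json


def build_import_argv(segy_path, mdio_path, header_locations, storage_options):
    """Assemble the `mdio segy import` arguments as a list (no shell quoting needed)."""
    return [
        "mdio", "segy", "import",
        "--input-segy-path", segy_path,
        "--output-mdio-file", mdio_path,
        "--header-locations", ",".join(str(b) for b in header_locations),
        # json.dumps produces the JSON string the CLI expects.
        "--storage-options", json.dumps(storage_options),
    ]


argv = build_import_argv(
    "path/to/my.segy",
    "s3://bucket/prefix/my.mdio",
    [189, 193],
    {"key": "my_super_private_key", "secret": "my_super_private_secret"},
)
print(argv)

# To actually run it (assumes mdio is on PATH):
# import subprocess
# subprocess.run(argv, check=True)
```

Because each list element reaches the program as a single argument, the same script works on Linux and Windows without escape characters.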
97+
98+ ### Google Cloud Provider
99+
100+ Credentials can be fetched automatically from a pre-authenticated `gcloud` CLI.
101+ See [here](https://gcsfs.readthedocs.io/en/latest/#credentials) for the order in which `gcsfs`
102+ checks them. If `gcloud` is not pre-authenticated, you need to pass `--storage-options`.
103+
104+ GCP uses [ service accounts] ( https://cloud.google.com/iam/docs/service-accounts ) to pass
105+ authentication information to APIs.
106+
107+ ** Prefix:**
108+ ` gs:// ` or ` gcs:// `
109+
110+ ** Storage Options:**
111+ `token`: The service account JSON value as a string, or the local path to a JSON file
112+
113+ Using a service account:
114+
115+ ``` shell
116+ mdio segy import \
117+ --input-segy-path path/to/my.segy \
118+ --output-mdio-file gs://bucket/prefix/my.mdio \
119+ --header-locations 189,193 \
120+ --storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}'
121+ ```
122+
123+ Using a browser to populate authentication:
124+
125+ ``` shell
126+ mdio segy import \
127+ --input-segy-path path/to/my.segy \
128+ --output-mdio-file gs://bucket/prefix/my.mdio \
129+ --header-locations 189,193 \
130+ --storage-options '{"token": "browser"}'
131+ ```
132+
133+ ### Microsoft Azure
134+
135+ There are various ways to authenticate with Azure Data Lake (ADL).
136+ See [ here] ( https://github.com/fsspec/adlfs#details ) for some details.
137+ If ADL is not pre-authenticated, you need to pass ` --storage-options ` .
138+
139+ ** Prefix:**
140+ ` az:// ` or ` abfs:// `
141+
142+ ** Storage Options:**
143+ ` account_name ` : Azure Data Lake storage account name
144+ ` account_key ` : Azure Data Lake storage account access key
145+
146+ ``` shell
147+ mdio segy import \
148+ --input-segy-path path/to/my.segy \
149+ --output-mdio-file az://bucket/prefix/my.mdio \
150+ --header-locations 189,193 \
151+ --storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
152+ ```
153+
154+ ### Advanced Cloud Features
155+
156+ There are additional functions provided by `fsspec`. These are advanced features; we refer
157+ the user to the `fsspec` [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html).
158+ Some useful examples are:
159+
160+ - Caching Files Locally
161+ - Remote Write Caching
162+ - File Buffering and random access
163+ - Mount anything with FUSE
164+
165+ ```` {note}
166+ When combining advanced protocols like `simplecache` with a remote store like `s3`, the
167+ URL can be chained like `simplecache::s3://bucket/prefix/file.mdio`. When doing this, the
168+ `--storage-options` argument must explicitly state parameters for both the cloud backend and the
169+ extra protocol. For the above example it would look like this:
170+
171+ ```json
172+ {
173+ "s3": {
174+ "key": "my_super_private_key",
175+ "secret": "my_super_private_secret"
176+ },
177+ "simplecache": {
178+ "cache_storage": "/custom/temp/storage/path"
179+ }
180+ }
181+ ```
182+
183+ In one line:
184+ ```json
185+ {"s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"}, "simplecache": {"cache_storage": "/custom/temp/storage/path"}}
186+ ```
187+ ````
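
Nested options for a chained URL are easy to get wrong by hand, for example by dropping a closing brace. The sketch below, using only the standard library, extracts the protocol names from a chained URL, reports any protocol without an entry in the options, and emits the one-line JSON. The helper names `protocols_in` and `missing_options` are hypothetical and not part of MDIO or `fsspec`.

```python
import json


def protocols_in(url):
    """Split a chained URL such as 'simplecache::s3://bucket/x' into protocol names."""
    return [part.split("://", 1)[0] for part in url.split("::")]


def missing_options(url, storage_options):
    """Return the protocols in the URL that have no entry in storage_options."""
    return [p for p in protocols_in(url) if p not in storage_options]


url = "simplecache::s3://bucket/prefix/file.mdio"
storage_options = {
    "s3": {"key": "my_super_private_key", "secret": "my_super_private_secret"},
    "simplecache": {"cache_storage": "/custom/temp/storage/path"},
}

missing = missing_options(url, storage_options)
# json.dumps always emits balanced braces, unlike a hand-typed one-liner.
one_line = json.dumps(storage_options)

print(protocols_in(url))
print(missing)
print(one_line)
```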
188+
27189## CLI Reference
28190
29191MDIO provides a convenient command-line-interface (CLI) to do