Commit a0495fc

Add documentation page on environment variables
1 parent 9c99816 commit a0495fc

2 files changed: +205 -0 lines changed

docs/environment.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
# Environment Variables

MDIO can be configured using environment variables to customize behavior for import, export,
and validation operations. These variables provide runtime control without requiring code changes.

## CPU and Performance

### `MDIO__EXPORT__CPU_COUNT`

**Type:** Integer
**Default:** Number of logical CPUs available

Controls the number of CPUs used during SEG-Y export operations. Adjust this to balance
performance with system resource availability.

```shell
$ export MDIO__EXPORT__CPU_COUNT=8
$ mdio segy export input.mdio output.segy
```

### `MDIO__IMPORT__CPU_COUNT`

**Type:** Integer
**Default:** Number of logical CPUs available

Controls the number of CPUs used during SEG-Y import operations. Higher values can
significantly speed up ingestion of large datasets.

```shell
$ export MDIO__IMPORT__CPU_COUNT=16
$ mdio segy import input.segy output.mdio --header-locations 189,193
```

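When MDIO is launched from a Python script, the CPU count can be derived from the machine instead of being hard-coded. The following is a minimal sketch, assuming the variable is read from the process environment when the import runs. The helper name `reserve_cpus` and the choice to leave two logical CPUs free are illustrative and not part of MDIO.

```python
import os


def reserve_cpus(keep_free: int = 2) -> int:
    """Return a worker count that leaves `keep_free` logical CPUs idle (minimum 1)."""
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - keep_free)


# Environment values must be strings; set this before starting the MDIO import.
os.environ["MDIO__IMPORT__CPU_COUNT"] = str(reserve_cpus())
```

Leaving a few CPUs free is one way to keep a shared workstation responsive during a long ingestion; on a dedicated machine the default is usually the better choice.
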
## Grid Validation

### `MDIO__GRID__SPARSITY_RATIO_WARN`

**Type:** Float
**Default:** 2.0

Sparsity ratio threshold that triggers warnings during grid validation. The sparsity ratio
measures how sparse the trace grid is compared to a dense grid. Values above this threshold
will log warnings but won't prevent operations.

```shell
$ export MDIO__GRID__SPARSITY_RATIO_WARN=3.0
```

### `MDIO__GRID__SPARSITY_RATIO_LIMIT`

**Type:** Float
**Default:** 10.0

Sparsity ratio threshold that triggers errors and prevents operations. Use this to enforce
quality standards and prevent ingestion of excessively sparse datasets that may indicate
data quality issues.

```shell
$ export MDIO__GRID__SPARSITY_RATIO_LIMIT=15.0
```

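Taken together, the two thresholds split sparsity ratios into three bands: pass, warn, and error. The sketch below only illustrates that banding and is not MDIO's implementation. It assumes the ratio is the number of grid cells divided by the number of live traces, which this page does not spell out, and `classify_sparsity` is a hypothetical helper.

```python
import os


def classify_sparsity(grid_cells: int, live_traces: int) -> str:
    """Classify a grid as 'ok', 'warn', or 'error' using the documented thresholds."""
    warn = float(os.environ.get("MDIO__GRID__SPARSITY_RATIO_WARN", "2.0"))
    limit = float(os.environ.get("MDIO__GRID__SPARSITY_RATIO_LIMIT", "10.0"))
    ratio = grid_cells / live_traces  # assumed definition of the sparsity ratio
    if ratio > limit:
        return "error"  # operation would be stopped
    if ratio > warn:
        return "warn"  # operation continues, a warning is logged
    return "ok"


print(classify_sparsity(1_000, 800))  # ratio 1.25
print(classify_sparsity(1_000, 250))  # ratio 4.0
print(classify_sparsity(1_000, 50))   # ratio 20.0
```

Raising the limit, as in the shell example above, moves the boundary between the warning band and the error band without changing when warnings start.
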
## SEG-Y Processing

### `MDIO__IMPORT__SAVE_SEGY_FILE_HEADER`

**Type:** Boolean
**Default:** false
**Accepted values:** `true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`

When enabled, preserves the original SEG-Y textual file header during import.
This is useful for maintaining full SEG-Y standard compliance and preserving survey metadata.

```shell
$ export MDIO__IMPORT__SAVE_SEGY_FILE_HEADER=true
$ mdio segy import input.segy output.mdio --header-locations 189,193
```

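The boolean variables on this page all accept the same eight spellings. The following is a minimal sketch of how such values could be interpreted in a wrapper script. It assumes case-insensitive matching and a fallback to the default when the variable is unset, neither of which is stated here, and `env_flag` is a hypothetical helper rather than an MDIO function.

```python
import os

TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable using the accepted spellings."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # unset: fall back to the documented default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")
```

Rejecting unrecognized values, instead of silently treating them as false, makes typos such as `ture` visible early.
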
### `MDIO__SEGY__SPEC`

**Type:** String (file path)
**Default:** None

Path to a custom SEG-Y specification file that defines byte locations and data types for
trace headers. Use this to handle non-standard SEG-Y variants or custom header layouts.

```shell
$ export MDIO__SEGY__SPEC=/path/to/custom_spec.json
$ mdio segy import input.segy output.mdio --header-locations 189,193
```

### `MDIO__IMPORT__CLOUD_NATIVE`

**Type:** Boolean
**Default:** false
**Accepted values:** `true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`

Enables buffered reads during SEG-Y header scans to optimize performance when reading from or
writing to cloud object storage (S3, GCS, Azure). This mode balances bandwidth usage with read
latency by processing the file twice: first to determine optimal buffering, then to perform the
actual ingestion.

```{note}
This variable is designed for cloud storage I/O, regardless of where the compute is running.
```

**When to use:**
- Reading from cloud storage (e.g., `s3://bucket/input.segy`)
- Writing to cloud storage (e.g., `gs://bucket/output.mdio`)

**When to skip:**
- Local file paths on fast storage
- Very slow network connections where bandwidth is the primary bottleneck

```shell
$ export MDIO__IMPORT__CLOUD_NATIVE=true
$ mdio segy import s3://bucket/input.segy output.mdio --header-locations 189,193
```

## Development and Testing

### `MDIO_IGNORE_CHECKS`

**Type:** Boolean
**Default:** false
**Accepted values:** `true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`

Bypasses validation checks during MDIO operations. This is primarily intended for development,
testing, or debugging scenarios where you need to work with non-standard data.

```{warning}
Disabling checks can lead to corrupted output or unexpected behavior. Only use this
when you understand the implications and are working in a controlled environment.
```

```shell
$ export MDIO_IGNORE_CHECKS=true
$ mdio segy import input.segy output.mdio --header-locations 189,193
```

## Deprecated Features

### `MDIO__IMPORT__RAW_HEADERS`

**Type:** Boolean
**Default:** false
**Accepted values:** `true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`

```{warning}
This is a deprecated feature and is expected to be removed without warning in a future release.
```

## Configuration Best Practices

### Setting Multiple Variables

You can configure multiple environment variables at once:

```shell
# Set for current session
export MDIO__IMPORT__CPU_COUNT=16
export MDIO__GRID__SPARSITY_RATIO_LIMIT=15.0
export MDIO__IMPORT__CLOUD_NATIVE=true

# Run MDIO commands
mdio segy import input.segy output.mdio --header-locations 189,193
```

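To scope settings to a single command rather than the whole session, a script can pass a modified copy of the environment to a child process. The sketch below uses only the Python standard library and reuses the import command shown on this page. It assumes the `mdio` executable is on `PATH`; `mdio_env` and `run_import` are hypothetical helper names.

```python
import os
import subprocess


def mdio_env(overrides: dict[str, str]) -> dict[str, str]:
    """Return a copy of the current environment with the given settings applied."""
    env = os.environ.copy()  # the parent environment is left untouched
    env.update(overrides)
    return env


def run_import(segy_path: str, mdio_path: str) -> subprocess.CompletedProcess:
    """Run the import command with settings that apply to this call only."""
    env = mdio_env(
        {
            "MDIO__IMPORT__CPU_COUNT": "16",
            "MDIO__GRID__SPARSITY_RATIO_LIMIT": "15.0",
            "MDIO__IMPORT__CLOUD_NATIVE": "true",
        }
    )
    cmd = ["mdio", "segy", "import", segy_path, mdio_path, "--header-locations", "189,193"]
    return subprocess.run(cmd, env=env, check=True)
```

Calling `run_import("input.segy", "output.mdio")` would then run one import with these values while later commands in the same session keep their previous settings.
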
### Persistent Configuration

To make environment variables permanent, add them to your shell profile:

**Bash/Zsh:**
```shell
# Add to ~/.bashrc or ~/.zshrc
export MDIO__IMPORT__CPU_COUNT=16
export MDIO__IMPORT__CLOUD_NATIVE=true
```

**Windows:**
```powershell
# Set permanently for the current user in PowerShell (the 'User' scope does not need Administrator)
[System.Environment]::SetEnvironmentVariable('MDIO__IMPORT__CPU_COUNT', '16', 'User')
```

### Project-Specific Configuration

For project-specific settings, use a `.env` file with tools like `python-dotenv`:

```python
# example_import.py
from dotenv import load_dotenv
import mdio

load_dotenv()  # Load environment variables from .env file
# Your MDIO operations here
```

```shell
# .env file
MDIO__IMPORT__CPU_COUNT=16
MDIO__GRID__SPARSITY_RATIO_LIMIT=15.0
MDIO__IMPORT__CLOUD_NATIVE=true
```

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ end-before: <!-- github-only -->
 
 installation
 cli_usage
+environment
 ```
 
 ```{toctree}
