
Commit 28a8e9f

tasansal and markspec authored
New CLI Options (#364)

* Added utility copy and info commands.
* Improve formatting
* Updates to pass pre-commit.
* Add access pattern to info.
* Add tests.
* Move utility commands to root level.
* Update linting and tests.
* Resolve PR comments.
* Fix mdio copy.
* Remove duplicate tmp in .gitignore.
* Remove unnecessary try/except block in segy.py
* Make 'info' command work with new CLI and add rich printing
* make copy work with new CLI
* Update description in info.py module
* Change input mdio option to argument in info command
* Update variable name and table title in info.py
* Replace copy command filename options with arguments
* make tests work for option -> argument conversion
* Refactor imports and command options in copy.py
* revert back to click types, better error handling
* Refactor segy.py and update test_main.py

  Updated the segy.py file to import specific functions from click, rather than the entire module. The command decorator's function signatures and calls are also updated. This is to improve specificity and reduce unnecessary overhead. Additionally, modified the way command line arguments are passed in test_main.py as per the refactored changes in the main function.
* Add future annotations to copy command
* Refactor import location in copy.py

  The import statement for 'MDIOReader' in the copy.py file has been moved to a more appropriate position. This change aims to maximize importing efficiency by having the import statement closer to where the imported module is being used.
* Refactor MDIO info command for better code organization

  The MDIO info command is refactored to enhance code readability and maintenance. The new structure involves separate functions for 'cast_stats', 'parse_grid' and 'pretty_print' to each perform distinct tasks. This improves the clear segregation of tasks and ease of future modifications.
* Move pytest-dependency to test suite installs
* Add future annotations import to info.py
* Update usage documentation for mdio commands

  The documentation for the mdio commands has been updated to reflect changes in the command syntax. Parameters for input and output files are now required positional arguments, rather than options, enhancing the clarity and readability of the commands.
* directly import click_params objects
* directly import click_params objects
* Add "fastentrypoints" to build requirements
* change overwrite to flag

---------

Co-authored-by: Mark Roberts <[email protected]>
1 parent d14a170 commit 28a8e9f

11 files changed: +363 -120 lines changed

.gitignore

Lines changed: 4 additions & 2 deletions
@@ -25,7 +25,8 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
-
+pip-*
+tmp*
 # PyInstaller
 # Usually these files are written by a python script from a template
 # before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -150,4 +151,5 @@ cython_debug/
 mdio1/*
 */mdio1/*
 pytest-of-*
-tmp/
+tmp
+debugging/*

docs/usage.md

Lines changed: 19 additions & 19 deletions
@@ -9,8 +9,8 @@ There are many more options, please see the [CLI Reference](#cli-reference).

 ```shell
 $ mdio segy import \
-  -i path_to_segy_file.segy \
-  -o path_to_mdio_file.mdio \
+  path_to_segy_file.segy \
+  path_to_mdio_file.mdio \
   -loc 181,185 \
   -names inline,crossline
 ```
@@ -20,8 +20,8 @@ should be executed.

 ```shell
 $ mdio segy export \
-  -i path_to_mdio_file.mdio \
-  -o path_to_segy_file.segy
+  path_to_mdio_file.mdio \
+  path_to_segy_file.segy
 ```

 ## Cloud Connection Strings
@@ -79,19 +79,19 @@ Using UNIX:

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file s3://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  s3://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
 ```

 Using Windows (note the extra escape characters `\`):

 ```console
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file s3://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  s3://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
 ```

@@ -114,19 +114,19 @@ Using a service account:

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file gs://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  gs://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}'
 ```

 Using browser to populate authentication:

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file gs://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  gs://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"token": "browser"}'
 ```

@@ -145,9 +145,9 @@ If ADL is not pre-authenticated, you need to pass `--storage-options`.

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file az://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  az://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
 ```
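
The updated usage examples pass the input and output paths positionally and keep `--storage-options` as a single JSON string, quoted differently on UNIX and Windows. The following is a minimal sketch of what the process receives in each case, using only the Python standard library. It assumes `shlex.split` (POSIX quoting rules) as a stand-in for the shell, which only approximates a real Windows shell, and it is not how MDIO itself parses arguments. The variable names are illustrative.

```python
import json
import shlex

# UNIX form: single quotes protect the JSON, so inner double quotes need no escaping.
unix_command = (
    "mdio segy import "
    "path/to/my.segy "
    "s3://bucket/prefix/my.mdio "
    "--header-locations 189,193 "
    """--storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'"""
)

# Windows form from the docs: double quotes outside, backslash-escaped quotes inside.
windows_command = (
    "mdio segy import "
    "path/to/my.segy "
    "s3://bucket/prefix/my.mdio "
    "--header-locations 189,193 "
    r'--storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"'
)

argv_unix = shlex.split(unix_command)
argv_windows = shlex.split(windows_command)

# After "mdio segy import", the next two tokens are the positional paths.
segy_path, mdio_path = argv_unix[3:5]

# The last token is one JSON string, whichever quoting style produced it.
storage_options = json.loads(argv_unix[-1])

print(segy_path, mdio_path)
print(storage_options == json.loads(argv_windows[-1]))
```

Under these assumptions both quoting styles yield the same token list, so the JSON option type sees identical text either way.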

noxfile.py

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
 """Nox sessions."""
+
+
 import os
 import shlex
 import shutil
@@ -161,7 +163,7 @@ def mypy(session: Session) -> None:
 def tests(session: Session) -> None:
     """Run the test suite."""
     session.install(".")
-    session.install("coverage[toml]", "pytest", "pygments")
+    session.install("coverage[toml]", "pytest", "pygments", "pytest-dependency")
     try:
         session.run("coverage", "run", "--parallel", "-m", "pytest", *session.posargs)
     finally:

poetry.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -33,6 +33,7 @@ segyio = "^1.9.3"
 numba = "^0.59.0"
 psutil = "^5.9.5"
 fsspec = ">=2023.9.1"
+rich = "^13.7.1"
 urllib3 = "^1.26.18"  # Workaround for poetry-plugin-export/issues/183

 # Extras
@@ -109,5 +110,5 @@ ignore_missing_imports = true


 [build-system]
-requires = ["poetry-core"]
+requires = ["poetry-core", "fastentrypoints"]
 build-backend = "poetry.core.masonry.api"

src/mdio/__main__.py

Lines changed: 9 additions & 3 deletions
@@ -11,15 +11,21 @@
 import click


-KNOWN_MODULES = ["segy.py"]
+KNOWN_MODULES = [
+    "segy.py",
+    "copy.py",
+    "info.py",
+]


 class MyCLI(click.MultiCommand):
     """CLI generator via plugin design pattern.

     This class dynamically loads command modules from the specified
-    `plugin_folder`. Each command module should define a `cli` function
-    that implements the command logic.
+    `plugin_folder`. If the command is another CLI group, the command
+    module must define a `cli = click.Group(...)` and subsequent
+    commands must be added to this CLI. If it is a single utility it
+    must have a variable named `cli` for the command to be exposed.

     Args:
     - plugin_folder: Path to the directory containing command modules.
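
The docstring above describes a convention: each module in the plugin folder exposes its entry point through a module-level variable named `cli`. The following is a minimal, standard-library-only sketch of that lookup, not the actual `MyCLI` implementation. The `load_cli` helper, the temporary plugin files, and the plain functions standing in for click commands are all assumptions made for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

# Allow-list of plugin files, mirroring the idea behind KNOWN_MODULES in the diff.
KNOWN_MODULES = ["copy.py", "info.py"]


def load_cli(plugin_folder: Path, name: str):
    """Import `<name>.py` from `plugin_folder` and return its `cli` attribute (hypothetical helper)."""
    filename = f"{name}.py"
    if filename not in KNOWN_MODULES:
        raise KeyError(f"unknown command: {name}")
    # A prefixed module name avoids shadowing stdlib modules such as `copy`.
    spec = importlib.util.spec_from_file_location(
        f"demo_plugin_{name}", plugin_folder / filename
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.cli


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Each stand-in plugin ends with `cli = <entry point>`, as copy.py and info.py do.
    (folder / "copy.py").write_text("def copy():\n    return 'copied'\n\ncli = copy\n")
    (folder / "info.py").write_text("def info():\n    return 'info'\n\ncli = info\n")
    commands = {name: load_cli(folder, name) for name in ("copy", "info")}
    results = {name: entry_point() for name, entry_point in commands.items()}

print(results)
```

Because the loader only asks for `cli`, a module can bind that name to a single command or to a group without the loader having to care which.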

src/mdio/commands/copy.py

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+"""MDIO Dataset copy command."""
+
+
+from __future__ import annotations
+
+from click import STRING
+from click import argument
+from click import command
+from click import option
+from click_params import JSON
+
+
+@command(name="copy")
+@argument("source-mdio-path", type=str)
+@argument("target-mdio-path", type=str)
+@option(
+    "-access",
+    "--access-pattern",
+    required=False,
+    default="012",
+    help="Access pattern of the file",
+    type=STRING,
+    show_default=True,
+)
+@option(
+    "-exc",
+    "--excludes",
+    required=False,
+    default="",
+    help="Data to exclude during copy, like `chunked_012`. The data values won’t be "
+    "copied but an empty array will be created. If blank, it copies everything.",
+    type=STRING,
+)
+@option(
+    "-inc",
+    "--includes",
+    required=False,
+    default="",
+    help="Data to include during copy, like `trace_headers`. If not specified, and "
+    "certain data is excluded, it will not copy headers. To preserve headers, "
+    "specify trace_headers. If left blank, it will copy everything except what is "
+    "specified in the 'excludes' parameter.",
+    type=STRING,
+)
+@option(
+    "-storage",
+    "--storage-options",
+    required=False,
+    help="Custom storage options for cloud backends",
+    type=JSON,
+)
+@option(
+    "-overwrite",
+    "--overwrite",
+    is_flag=True,
+    help="Flag to overwrite the MDIO file if it exists",
+    show_default=True,
+)
+def copy(
+    source_mdio_path: str,
+    target_mdio_path: str,
+    access_pattern: str = "012",
+    includes: str = "",
+    excludes: str = "",
+    storage_options: dict | None = None,
+    overwrite: bool = False,
+) -> None:
+    """Copy a MDIO dataset to another MDIO dataset.
+
+    Can also copy with empty data to be filled later. See `excludes`
+    and `includes` parameters.
+
+    More documentation about `excludes` and `includes` can be found
+    in Zarr's documentation in `zarr.convenience.copy_store`.
+    """
+    from mdio import MDIOReader
+
+    reader = MDIOReader(source_mdio_path, access_pattern=access_pattern)
+
+    reader.copy(
+        dest_path_or_buffer=target_mdio_path,
+        excludes=excludes,
+        includes=includes,
+        storage_options=storage_options,
+        overwrite=overwrite,
+    )
+
+
+cli = copy
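
The command above takes two positional paths followed by optional `--access-pattern`, `--excludes`, `--includes`, `--storage-options`, and `--overwrite`. The following is a small sketch of how a script might assemble such an invocation, assuming the command is reachable as `mdio copy` (the commit message moves the utility commands to the root level). `build_copy_argv` is a hypothetical helper, not part of MDIO, and the file names are placeholders.

```python
from __future__ import annotations

import json
import shlex


def build_copy_argv(
    source: str,
    target: str,
    access_pattern: str = "012",
    excludes: str = "",
    includes: str = "",
    storage_options: dict | None = None,
    overwrite: bool = False,
) -> list[str]:
    """Build an argument list for `mdio copy` (hypothetical helper)."""
    # Positional arguments come first: source, then target.
    argv = ["mdio", "copy", source, target, "--access-pattern", access_pattern]
    if excludes:
        argv += ["--excludes", excludes]
    if includes:
        argv += ["--includes", includes]
    if storage_options is not None:
        # The option expects JSON text, so serialize the dict.
        argv += ["--storage-options", json.dumps(storage_options)]
    if overwrite:
        # A flag takes no value.
        argv.append("--overwrite")
    return argv


# Skip the data array but keep the headers, per the option help text.
argv = build_copy_argv(
    "in.mdio",
    "out.mdio",
    excludes="chunked_012",
    includes="trace_headers",
    overwrite=True,
)
print(shlex.join(argv))
```

Passing the list to `subprocess.run` directly, rather than the joined string, sidesteps the shell-quoting differences around the JSON value.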

src/mdio/commands/info.py

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+"""MDIO Dataset information command."""
+
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+from typing import Any
+
+from click import STRING
+from click import Choice
+from click import argument
+from click import command
+from click import option
+
+
+if TYPE_CHECKING:
+    from mdio.core import Grid
+
+
+@command(name="info")
+@argument("mdio-path", type=STRING)
+@option(
+    "-access",
+    "--access-pattern",
+    required=False,
+    default="012",
+    help="Access pattern of the file",
+    type=STRING,
+    show_default=True,
+)
+@option(
+    "-format",
+    "--output-format",
+    required=False,
+    default="pretty",
+    help="Output format. Pretty console or JSON.",
+    type=Choice(["pretty", "json"]),
+    show_default=True,
+    show_choices=True,
+)
+def info(
+    mdio_path: str,
+    output_format: str,
+    access_pattern: str,
+) -> None:
+    """Provide information on a MDIO dataset.
+
+    By default, this returns human-readable information about the grid and stats for
+    the dataset. If output-format is set to json then a json is returned to
+    facilitate parsing.
+    """
+    from mdio import MDIOReader
+
+    reader = MDIOReader(
+        mdio_path,
+        access_pattern=access_pattern,
+        return_metadata=True,
+    )
+
+    grid_dict = parse_grid(reader.grid)
+    stats_dict = cast_stats(reader.stats)
+
+    mdio_info = {
+        "path": mdio_path,
+        "stats": stats_dict,
+        "grid": grid_dict,
+    }
+
+    if output_format == "pretty":
+        pretty_print(mdio_info)
+
+    if output_format == "json":
+        json_print(mdio_info)
+
+
+def cast_stats(stats_dict: dict[str, Any]) -> dict[str, float]:
+    """Normalize all floats to JSON serializable floats."""
+    return {k: float(v) for k, v in stats_dict.items()}
+
+
+def parse_grid(grid: Grid) -> dict[str, dict[str, int | str]]:
+    """Extract grid information per dimension."""
+    grid_dict = {}
+    for dim_name in grid.dim_names:
+        dim = grid.select_dim(dim_name)
+        min_ = str(dim.coords[0])
+        max_ = str(dim.coords[-1])
+        size = str(dim.coords.shape[0])
+        grid_dict[dim_name] = {"name": dim_name, "min": min_, "max": max_, "size": size}
+    return grid_dict
+
+
+def json_print(mdio_info: dict[str, Any]) -> None:
+    """Convert MDIO Info to JSON and pretty print."""
+    from json import dumps as json_dumps
+
+    from rich import print
+
+    print(json_dumps(mdio_info, indent=2))
+
+
+def pretty_print(mdio_info: dict[str, Any]) -> None:
+    """Print pretty MDIO Info table to console."""
+    from rich.console import Console
+    from rich.table import Table
+
+    console = Console()
+
+    grid_table = Table(show_edge=False)
+    grid_table.add_column("Dimension", justify="right", style="cyan", no_wrap=True)
+    grid_table.add_column("Min", justify="left", style="magenta")
+    grid_table.add_column("Max", justify="left", style="magenta")
+    grid_table.add_column("Size", justify="left", style="green")
+
+    for _, axis_dict in mdio_info["grid"].items():
+        name, min_, max_, size = axis_dict.values()
+        grid_table.add_row(name, min_, max_, size)
+
+    stat_table = Table(show_edge=False)
+    stat_table.add_column("Stat", justify="right", style="cyan", no_wrap=True)
+    stat_table.add_column("Value", justify="left", style="magenta")
+
+    for stat, value in mdio_info["stats"].items():
+        stat_table.add_row(stat, f"{value:.4f}")
+
+    master_table = Table(title=f"File Information for {mdio_info['path']}")
+    master_table.add_column("MDIO Grid", justify="center")
+    master_table.add_column("MDIO Statistics", justify="center")
+    master_table.add_row(grid_table, stat_table)
+
+    console.print(master_table)
+
+
+cli = info
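
With `--output-format json`, the command prints the `mdio_info` dictionary: a `path`, a `stats` mapping of floats, and a `grid` mapping in which `parse_grid` stores `min`, `max`, and `size` as strings. The following is a minimal sketch of consuming that shape downstream. The payload is invented for illustration: the dimension names, the stat names (`mean`, `std`), and all values are assumptions, not output from a real dataset.

```python
import json

# Hypothetical payload shaped like the `mdio_info` dict built by the info command.
payload = """
{
  "path": "path/to/my.mdio",
  "stats": {"mean": 0.0125, "std": 1.5},
  "grid": {
    "inline": {"name": "inline", "min": "1", "max": "100", "size": "100"},
    "crossline": {"name": "crossline", "min": "10", "max": "59", "size": "50"},
    "sample": {"name": "sample", "min": "0", "max": "3000", "size": "751"}
  }
}
"""

mdio_info = json.loads(payload)

# Grid values arrive as strings, so cast before doing arithmetic with them.
shape = tuple(int(dim["size"]) for dim in mdio_info["grid"].values())

# Stats are already plain JSON numbers thanks to the float cast on the producer side.
stats = mdio_info["stats"]

print(mdio_info["path"], shape, stats)
```

Keeping the grid values as strings makes every table cell printable as-is in the pretty view, at the cost of one cast for JSON consumers.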
