Commit 11c5af1

PeterKraus and ml-evs authored
A new datatractor CLI (#16)
Co-authored-by: Matthew Evans <[email protected]>
1 parent 2803a9d commit 11c5af1

4 files changed: +371, -95 lines

README.md

Lines changed: 16 additions & 6 deletions
@@ -40,16 +40,17 @@ pip install .
 
 ### Usage
 
-Currently, you can use the `extract` function from the `beam` module inside your own Python code:
+#### As a Python module
+To extract data from a file, you can use the `extract` function from the `beam` module inside your own Python code:
 
 ```python
-from beam import extract
+from datatractor_beam import extract
 
 # extract(<input_type>, <input_path>)
 data = extract("./example.mpr", "biologic-mpr")
 ```
 
-This example will install the first compatible `biologic-mpr` extractor it finds in the registry into a fresh virtualenv (under `./beam-venvs`), and then execute it on the file at `example.mpr`.
+This example will install the first extractor that is compatible with the `biologic-mpr` filetype that it finds in the registry. It will be installed into a fresh virtualenv (under `./beam-venvs`), and then executed on the file at `example.mpr`.
 
 By default, the `extract` function will attempt to use the extractor's Python-based invocation (i.e. the optional `preferred_mode="python"` argument is specified). This means the extractor will be executed from within python, and the returned `data` object will be a Python object as defined (and supported) by the extractor. This may require additional packages to be installed, for examples `pandas` or `xarray`, which are both supported via the installation command `pip install .[formats]` above. If you encounter the following traceback, a missing "format" (such as `xarray` here) is the likely reason:
 
@@ -63,16 +64,25 @@ ModuleNotFoundError: No module named 'xarray'
 Alternatively, if the `preferred_mode="cli"` argument is specified, the extractor will be executed using its command-line invocation. This means the output of the extractor will most likely be a file, which can be further specified using the `output_type` argument:
 
 ```python
-from beam import extract
+from datatractor_beam import extract
 ret = extract("example.mpr", "biologic-mpr", output_path="output.nc", preferred_mode = "cli")
 ```
 
 In this case, the `ret` will be empty bytes, and the output of the extractor should appear in the `output.nc` file.
 
-Finally, `beam` can also be executed from the command line, implying `preferred_mode="cli"`. The command line invocation equivalent to the above Python syntax is:
+#### As a command line utility
+
+The `datatractor` utility supports the following subcommands:
+
+- `beam`: used to extract data from an input file of a known file type,
+- `probe`: used to search the registry for extractors that match a known file type,
+- `yard`: used to fetch the definition of an extractor from the registry, and
+- `install`: used to install an extractor.
+
+In particular, the `extract()` functionality discussed above can also be executed from the command line, implying `preferred_mode="cli"`. The command line invocation equivalent to the above Python syntax is:
 
 ```bash
-beam biologic-mpr example.mpr --outfile output.nc
+datatractor beam biologic-mpr example.mpr --output-path output.nc
 ```
 
 
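Note for scripts that shell out to the old `beam` command: this commit changes the executable (`beam` becomes `datatractor beam`) and the output flag (`--outfile` becomes `--output-path`). The sketch below shows one way such a script might assemble the new command line. It relies only on the invocation shown in the README diff. The helper names `build_beam_command` and `run_beam` are hypothetical and not part of the package, and treating `--output-path` as optional is an assumption.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_beam_command(
    input_type: str, input_path: str, output_path: Optional[str] = None
) -> List[str]:
    """Assemble the argv for `datatractor beam` (hypothetical helper).

    The argument order (file type, then input file) follows the README example.
    """
    cmd = ["datatractor", "beam", input_type, input_path]
    if output_path is not None:
        # `--output-path` replaces the `--outfile` flag of the old `beam` command.
        # Assumption: the flag may be omitted when no output file is wanted.
        cmd += ["--output-path", output_path]
    return cmd


def run_beam(
    input_type: str, input_path: str, output_path: Optional[str] = None
) -> subprocess.CompletedProcess:
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which("datatractor") is None:
        raise FileNotFoundError("`datatractor` executable not found on PATH")
    return subprocess.run(
        build_beam_command(input_type, input_path, output_path), check=True
    )


cmd = build_beam_command("biologic-mpr", "example.mpr", "output.nc")
print(shlex.join(cmd))
# prints: datatractor beam biologic-mpr example.mpr --output-path output.nc
```

Building the command as a list rather than a single string avoids shell quoting problems when input paths contain spaces. `shlex.join` is used here only to display the result.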
