Commit 6e6e7cc

document io approach (#187)
flesh out the io section in the tentative design doc
1 parent fcc0928 commit 6e6e7cc

1 file changed

+119
-19
lines changed

docs/dev/sdd.md

Lines changed: 119 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -3,9 +3,17 @@
33
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
44
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
55

6+
67
- [Conceptual model](#conceptual-model)
78
- [Object model](#object-model)
89
- [IO](#io)
10+
- [Input](#input)
11+
- [Unified IO](#unified-io)
12+
- [Conversion](#conversion)
13+
- [Serialization](#serialization)
14+
- [Writer](#writer)
15+
- [Reader](#reader)
16+
- [Output](#output)
917

1018
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
1119

@@ -19,25 +27,24 @@ This document follows MODFLOW 6 terminology where applicable, with modifications
1927

2028
A MODFLOW 6 simulation is a hierarchy of modular **components**. Components encapsulate related data and functionality.
2129

22-
Components may have zero or more user-specified **variables** &mdash; we use this term interchangeably with **field**, with the latter preferred due to "variable"'s genericity. A field might be a model parameter, e.g. a numeric scalar or array value. Fields which configure non-numerical features of the simulation are called **options**. A field can be required or optional.
30+
Components may have zero or more user-specified **variables**, which the product calls **fields**, the more conventional term in the Python world. A field might be a numeric parameter, e.g. a scalar or array value, or a configuration value. Fields which configure non-numerical features of the simulation are called **options**. A field may be required or optional.
2331

24-
Components come in several subtypes:
32+
The fundamental component flavors are:
2533

26-
- **simulation**: the fundamental "unit of work" in MF6, consisting of 1+ (possibly coupled) hydrologic process(es)
27-
- **model**: a simulated hydrological process
28-
- **package**: a subcomponent of a model or simulation
34+
- **simulation**: MF6's "unit of work", consisting of 1+ models, possibly coupled
35+
- **model**: a simulated hydrological process, possibly coupled to others
36+
- **package**: a subcomponent of a simulation, model, or package
2937

30-
The simulation is the root of the tree, with models and packages under it, each of which itself might have other packages.
38+
The simulation is the root of a tree whose internal nodes are models and whose leaves are packages. A package is not necessarily a leaf; packages may have packages as children.
3139

32-
Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement.
40+
Most components have only one possible parent (e.g., models are children of the simulation), but some relax this requirement.
3341

34-
Packages come in several flavors, not necessarily mutually exclusive.
42+
There are several special kinds of package, not necessarily mutually exclusive.
3543

36-
- A **stress package** represents a forcing.
37-
- A **basic package** contains only input variables applying statically to the entire simulation.
38-
- An **advanced package** contains time-varying input variables.
39-
- Most packages are singular &mdash; the parent component may have one and only one instance. When arbitrarily many are permitted, the package is called a **multi-package**.
40-
- A **subpackage** is a package whose parent is another package.
44+
- A **stress package** represents a forcing
45+
- A **basic package** contains only input variables applying statically to the entire simulation
46+
- An **advanced package** contains time-varying input variables
47+
- A **subpackage** is a package whose parent is another package
4148

4249
```mermaid
4350
classDiagram
@@ -51,7 +58,7 @@ classDiagram
5158
Subpackage *-- "1+" Variable
5259
```
5360

54-
Components are specified by **definition files**. A definition file specifies a single component and its fields. A definition file consists of top-level metadata and a collection of **blocks**. A block is a named collection of fields. A component may contain zero or more blocks. Each block must contain at least one variable. Most components will have a block named "Options" &mdash; see the [MODFLOW 6 DFN file specification](https://modflow6.readthedocs.io/en/latest/_dev/dfn.html) for more info.
61+
Components are specified by **definition files**. A definition file specifies a single component and its fields. A definition file consists of top-level metadata and **blocks** (named collections) of variables. A component may contain zero or more blocks. Each block must contain at least one variable. Most components will have a block named "options" &mdash; see the [MODFLOW 6 DFN file specification](https://modflow6.readthedocs.io/en/latest/_dev/dfn.html) for more info.
5562

5663
## Object model
5764

@@ -108,10 +115,103 @@ The product provides an IO framework with which de/serializers can be registered
108115

109116
The product will allow IO to be configured globally, on a per-simulation basis, or at read/write time via method parameters.
110117

111-
IO is implemented in several layers:
118+
### Input
119+
120+
Input file IO is implemented in three layers:
121+
122+
1. **Unified IO layer**: Registry and descriptors implementing `load` and `write` methods on the base `Component` class
123+
2. **Conversion layer**: Uses `cattrs` to map the object model to/from Python primitives and containers (i.e. un/structuring)
124+
3. **Serialization layer**: Format-specific encoders/decoders translating primitives and containers to/from strings or binary data
125+
126+
#### Unified IO
127+
128+
The `flopy4.uio` module provides a pluggable IO framework adapted from [`astropy`](https://github.com/astropy/astropy/tree/main/astropy/io). A global `Registry` maintains mappings from `(component_class, format)` pairs to load and write functions. The `Component` base class implements user-facing `load` and `write` methods via descriptors which dispatch to the functions in the registry.
129+
130+
Loaders and writers can be registered for any component class and format. The registry supports inheritance: a loader/writer registered for a base class is available to all subclasses.
131+
132+
```python
133+
from flopy4.uio import DEFAULT_REGISTRY
134+
from flopy4.mf6.component import Component
135+
136+
DEFAULT_REGISTRY.register_writer(Component, "ascii", write_ascii)
137+
DEFAULT_REGISTRY.register_writer(Component, "netcdf", write_netcdf)
138+
```
139+
140+
The user may then select a format at call time, e.g. `component.write(format="netcdf")`.
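
The inheritance-aware lookup described above can be sketched in plain Python. This is a minimal illustration, not the `flopy4.uio` implementation: the `Registry` methods, the `Package` subclass, and the `write_ascii` function are hypothetical stand-ins, and `write` is an ordinary method here rather than a descriptor.

```python
from typing import Callable, Dict, Tuple


class Registry:
    """Maps (component class, format) pairs to writer functions."""

    def __init__(self) -> None:
        self._writers: Dict[Tuple[type, str], Callable] = {}

    def register_writer(self, cls: type, format: str, function: Callable) -> None:
        self._writers[(cls, format)] = function

    def get_writer(self, cls: type, format: str) -> Callable:
        # Walk the method resolution order so a writer registered for a
        # base class is also found for any of its subclasses.
        for base in cls.__mro__:
            writer = self._writers.get((base, format))
            if writer is not None:
                return writer
        raise KeyError(f"no writer for {cls.__name__!r} in format {format!r}")


REGISTRY = Registry()  # hypothetical stand-in for DEFAULT_REGISTRY


class Component:
    def write(self, format: str = "ascii") -> str:
        # Dispatch to whichever function is registered for this class/format.
        return REGISTRY.get_writer(type(self), format)(self)


class Package(Component):
    pass


def write_ascii(component: Component) -> str:
    # Placeholder writer: returns a tag instead of writing a file.
    return f"ascii:{type(component).__name__}"


REGISTRY.register_writer(Component, "ascii", write_ascii)
result = Package().write(format="ascii")  # found via the Component entry
```

Resolving through `__mro__` means the most specific registration wins, so a subclass can override a base-class writer simply by registering its own.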
141+
142+
#### Conversion
143+
144+
The conversion layer uses `cattrs` to transform between the product's `xarray`/`attrs`-based object model and plain Python data structures suitable for serialization. This layer is format-agnostic and handles structural transformations common across formats.
145+
146+
**Unstructuring (write path)**: A `cattrs` converter with appropriate hooks will convert components to nested dictionaries organized by block, handling tasks like
147+
148+
- Organizing fields into blocks according to their `block` metadata from DFN files
149+
- Converting child components to binding records for parent component name files
150+
- Slicing time-varying (period block) arrays by stress period
151+
- Converting `Path` objects to records (`FILEOUT` etc)
152+
153+
**Structuring (read path)**: The reverse transformation turns dictionaries of primitives into component instances. A `cattrs` converter with appropriate hooks will, among other things,
154+
155+
- Instantiate child components from binding records
156+
- Convert sparse list-input representations to arrays
157+
- Reconstruct time-varying array variables from indexed blocks
158+
- Guarantee `xarray` objects have proper dimensions/coordinates
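
As a rough illustration of the block-organizing step, the sketch below groups fields by `block` metadata and reverses the transformation. It uses standard-library `dataclasses` as a stand-in for the `attrs`/`cattrs` machinery, and the `Npf` class, its fields, and the `unstructure`/`structure` helpers are assumptions made for the example only.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class Npf:
    """Hypothetical component; field names are illustrative only."""

    save_flows: bool = field(default=False, metadata={"block": "options"})
    icelltype: int = field(default=0, metadata={"block": "griddata"})
    k: float = field(default=1.0, metadata={"block": "griddata"})


def unstructure(component: Any) -> Dict[str, Dict[str, Any]]:
    """Write path: group field values into nested dicts keyed by block."""
    blocks: Dict[str, Dict[str, Any]] = {}
    for f in fields(component):
        blocks.setdefault(f.metadata["block"], {})[f.name] = getattr(component, f.name)
    return blocks


def structure(blocks: Dict[str, Dict[str, Any]], cls: type) -> Any:
    """Read path: flatten the blocks and rebuild a component instance."""
    flat = {name: value for block in blocks.values() for name, value in block.items()}
    return cls(**flat)


blocks = unstructure(Npf(save_flows=True, k=2.5))
roundtrip = structure(blocks, Npf)
```

The real converter would do considerably more (binding records, period slicing, `xarray` reconstruction); the point here is only that block membership can be driven entirely by field metadata.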
159+
160+
#### Serialization
161+
162+
The serialization layer implements format-specific encoding and decoding. At a minimum, the product aims to implement serializers for the MODFLOW 6 text-based input format and MODFLOW 6 binary output formats.
163+
164+
##### Writer
165+
166+
The writer in `flopy4.mf6.codec.writer` uses [Jinja2](https://jinja.palletsprojects.com/) templates to render unstructured component dictionaries as MF6 input files.
167+
168+
A top-level template `blocks.jinja` iterates over blocks, calling field macros defined in `macros.jinja`. Macros dispatch on field format (detected via custom Jinja filters) to render:
169+
170+
- **Scalars**: keywords, integers, floats, strings
171+
- **Records**: tuples of values (e.g., file specifications, cell IDs with values)
172+
- **Arrays**: numeric arrays with control records (`CONSTANT`, `INTERNAL`, `OPEN/CLOSE`)
173+
- **Lists**: stress period data, either tabular or keystring format
174+
- **Keystrings**: option records with keyword-value pairs
175+
176+
Custom Jinja filters in `flopy4.mf6.codec.writer.filters` implement field-specific logic.
177+
178+
The writer handles several MF6-specific concerns:
179+
- **Layered arrays**: 3D arrays are chunked by layer for `LAYERED` array input
180+
- **External files**: Large arrays can reference external files via `OPEN/CLOSE`
181+
- **NetCDF output**: Array control records can specify `NETCDF` for array output
182+
- **Fill values**: Sparse data representation elides cells with fill value `DNODATA`
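
To make the layered-array case concrete, here is a small sketch that chunks a 3D array by layer and emits one control record per layer. It uses plain string formatting rather than Jinja2 templates, and the helper names, indentation, and the rule of choosing `CONSTANT` for uniform layers are assumptions for illustration, not the behavior of `flopy4.mf6.codec.writer`.

```python
from typing import List, Sequence

Layer = Sequence[Sequence[float]]  # one 2D layer as rows of values


def control_record(layer: Layer) -> List[str]:
    """Render one (non-empty) layer as CONSTANT if uniform, else INTERNAL plus rows."""
    values = [v for row in layer for v in row]
    if all(v == values[0] for v in values):
        return [f"CONSTANT {values[0]}"]
    return ["INTERNAL"] + [" ".join(str(v) for v in row) for row in layer]


def render_layered(name: str, layers: Sequence[Layer], indent: str = "  ") -> str:
    """Chunk a 3D array by layer under a LAYERED array header."""
    lines = [f"{indent}{name.upper()} LAYERED"]
    for layer in layers:
        lines += [f"{indent * 2}{line}" for line in control_record(layer)]
    return "\n".join(lines)


def render_block(name: str, body: str) -> str:
    return f"BEGIN {name}\n{body}\nEND {name}"


botm = [
    [[-10.0, -10.0], [-10.0, -10.0]],  # uniform layer
    [[-20.0, -21.0], [-22.0, -23.0]],  # varying layer
]
text = render_block("griddata", render_layered("botm", botm))
```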
183+
184+
##### Reader
185+
186+
The reader in `flopy4.mf6.codec.reader` uses [Lark](https://lark-parser.readthedocs.io/) to parse MF6 input files. Parsing is implemented in two stages: a parser generates a parse tree from input text, then a transformer converts the tree to Python data structures.
187+
188+
The reader currently provides two grammar/transformer pairs:
189+
190+
**Basic grammar**: A minimal grammar recognizing only the block structure of MF6 input files. Blocks are delimited by `BEGIN <name>` and `END <name>` markers and contain lines of whitespace-separated tokens (words and numbers). The corresponding transformer simply yields blocks as lists of lines, each a list of tokens.
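
The shape of the basic transformer's output can be sketched without Lark. The function below is a hand-written stand-in, not the actual grammar: `parse_blocks` is a hypothetical name, and it ignores comments and any extra tokens on the `BEGIN` line.

```python
from typing import List, Optional, Tuple

Block = Tuple[str, List[List[str]]]  # (block name, lines of tokens)


def parse_blocks(text: str) -> List[Block]:
    """Split input text into blocks, each a list of whitespace-separated tokens."""
    blocks: List[Block] = []
    current: Optional[str] = None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        keyword = tokens[0].upper()
        if keyword in ("BEGIN", "END") and len(tokens) < 2:
            raise ValueError(f"missing block name: {raw!r}")
        if keyword == "BEGIN":
            if current is not None:
                raise ValueError(f"nested block: {raw!r}")
            current = tokens[1].lower()
            blocks.append((current, []))
        elif keyword == "END":
            if current is None or tokens[1].lower() != current:
                raise ValueError(f"unmatched END: {raw!r}")
            current = None
        elif current is not None:
            blocks[-1][1].append(tokens)
        else:
            raise ValueError(f"tokens outside any block: {raw!r}")
    if current is not None:
        raise ValueError(f"unterminated block: {current!r}")
    return blocks


sample = "BEGIN options\n  SAVE_FLOWS\nEND options\n\nBEGIN griddata\n  k\n    CONSTANT 1.0\nEND griddata\n"
parsed = parse_blocks(sample)
```

Blocks are returned as an ordered list rather than a dict so that repeated block names (such as period blocks) are not overwritten.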
191+
192+
**Typed grammar**: A type-aware grammar with rules for specific MF6 constructs:
193+
- Array control records: `CONSTANT`, `INTERNAL`, `OPEN/CLOSE` with modifiers (`FACTOR`, `IPRN`, `BINARY`)
194+
- Layered arrays: `LAYERED` keyword preceding multiple array control records
195+
- NetCDF arrays: `NETCDF` keyword
196+
- Numeric types: integers and doubles
197+
- Strings: quoted strings and bare words
198+
- Lists and records: whitespace-delimited values
199+
200+
A grammar inheriting from and using the typed base grammar can then be generated for each component.
201+
202+
A typed transformer can use the DFN specification to identify fields by keyword, and can handle data types properly, for instance creating `xarray.DataArray` objects for array fields and handling external file references.
203+
204+
This "push knowledge into the parser" approach
205+
206+
- creates more structured parse trees
207+
- reduces post-parsing transformation complexity
208+
- speeds up validation
209+
- generates better error messages
210+
211+
After parsing and transformation, a `cattrs` converter structures the resulting dicts into components.
212+
213+
### Output
112214

113-
- IO operations, implemented as descriptors, backing `load` and `write` methods on the base component class
114-
- `cattrs` converters to map the object model to/from Python primitives and containers (i.e. un/structuring)
115-
- Encoders/decoders for any number of serialization formats, which translate primitives/containers to strings
215+
Readers are provided for binary head and budget output files.
116216

117-
In particular, the product will implement a conversion layer and a serialization layer for the MODFLOW 6 input file format. The serialization layer implements a file writer via `Jinja2` templates and a file parser via a `lark` parser generated from an EBNF language specification.
217+
These readers parse the binary formats specified in the MODFLOW 6 documentation and return data as `xarray` structures. The approach is largely borrowed from `imod-python`.
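
As a rough sketch of what reading one head-file record might involve, the example below round-trips a record through an in-memory buffer using `struct`. The header layout (two integers, two doubles, a 16-character label, three integers, little-endian, double-precision values) is an assumption that should be checked against the MODFLOW 6 documentation; the function names are hypothetical, and plain lists stand in for `xarray` structures.

```python
import io
import struct
from typing import Any, BinaryIO, Dict, List, Optional

# Assumed header: kstp, kper, pertim, totim, text, ncol, nrow, ilay
HEADER = struct.Struct("<2i2d16s3i")


def write_head_record(f: BinaryIO, kstp: int, kper: int, pertim: float,
                      totim: float, text: str, values: List[List[float]],
                      ilay: int) -> None:
    """Write one layer of heads; used here only to build test input."""
    nrow, ncol = len(values), len(values[0])
    label = text.rjust(16).encode("ascii")
    f.write(HEADER.pack(kstp, kper, pertim, totim, label, ncol, nrow, ilay))
    flat = [v for row in values for v in row]
    f.write(struct.pack(f"<{len(flat)}d", *flat))


def read_head_record(f: BinaryIO) -> Optional[Dict[str, Any]]:
    """Read one record, or return None at end of file."""
    raw = f.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None
    kstp, kper, pertim, totim, label, ncol, nrow, ilay = HEADER.unpack(raw)
    flat = struct.unpack(f"<{ncol * nrow}d", f.read(8 * ncol * nrow))
    values = [list(flat[r * ncol:(r + 1) * ncol]) for r in range(nrow)]
    return {"kstp": kstp, "kper": kper, "pertim": pertim, "totim": totim,
            "text": label.decode("ascii").strip(), "ilay": ilay, "values": values}


buf = io.BytesIO()
write_head_record(buf, 1, 1, 1.0, 1.0, "HEAD", [[1.0, 2.0], [3.0, 4.0]], 1)
buf.seek(0)
record = read_head_record(buf)
end = read_head_record(buf)  # no more records in the buffer
```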
