|
24 | 24 |
|
25 | 25 | This is the Software Design Description (SDD) document for FloPy 4, also called *the product*. |
26 | 26 |
|
27 | | -This document describes a tentative design, focusing on "core" functional requirements. Some attention may be given to architecture, but non-functional requirements are largely out of scope here. |
| 27 | +This document describes a tentative design, focusing on functional requirements. Some attention may be given to architecture, but non-functional requirements are largely out of scope. |
28 | 28 |
|
29 | 29 | ## Conceptual model |
30 | 30 |
|
31 | 31 | This document follows MODFLOW 6 terminology where applicable, with modifications/translations where appropriate. |
32 | 32 |
|
33 | | -MODFLOW 6 is designed as a hierarchy of modular **components**. |
| 33 | +A MODFLOW 6 simulation is structured as a hierarchy of modular **components**. Components encapsulate related data and functionality.
34 | 34 |
|
35 | | -Components encapsulate related functionality and data. Components may have user-specified configuration and/or data **variables**. |
| 35 | +Components may have zero or more user-specified **variables**. We use this term interchangeably with **field** and prefer the latter, since "variable" is too generic. A field might be a model parameter, e.g. a numeric scalar or array value. Fields which configure non-numerical features of the simulation are called **options**. A field can be required or optional.
36 | 36 |
|
37 | | -Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement. |
38 | | - |
39 | | -Component types include: |
| 37 | +Components come in several subtypes: |
40 | 38 |
|
41 | | -- **simulation**: MF6's "unit of work", consisting of 1+ (possibly coupled) hydrologic processes |
| 39 | +- **simulation**: the fundamental "unit of work" in MF6, consisting of 1+ (possibly coupled) hydrologic process(es) |
42 | 40 | - **model**: a simulated hydrological process |
43 | 41 | - **package**: a subcomponent of a model or simulation |
44 | 42 |
|
45 | | -Certain subsets of packages have distinguishing characteristics. A **stress package** represents a forcing. A **basic package** contains only input variables applying statically to the entire simulation. An **advanced package** contains time-variable (i.e. transient) input data. Usually only a single instance of a package is expected — when arbitrarily many are permitted, the package is called a **multi-package**. A **subpackage** is a concept only recognized by the product, not by MODFLOW 6 — a package linked to its parent not by a separate input file, but directly (i.e., subpackage data provided to the parent's initializer method). Subpackages may be attached to packages, models, or simulations. |
| 43 | +The simulation is the root of the tree, with models and packages under it, each of which itself might have other packages. |
| 44 | + |
| 45 | +Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement. |
| 46 | + |
| 47 | +Packages come in several flavors, not necessarily mutually exclusive. |
| 48 | + |
| 49 | +- A **stress package** represents a forcing. |
| 50 | +- A **basic package** contains only input variables applying statically to the entire simulation. |
| 51 | +- An **advanced package** contains time-varying input variables. |
| 52 | +- Most packages are singular — the parent component may have one and only one instance. When arbitrarily many are permitted, the package is called a **multi-package**. |
| 53 | +- A **subpackage** is a package whose parent is another package. |
46 | 54 |
|
47 | 55 | ```mermaid |
48 | 56 | classDiagram |
49 | 57 | Simulation *-- "1+" Package |
50 | 58 | Simulation *-- "1+" Model |
51 | 59 | Simulation *-- "1+" Variable |
52 | | - Simulation *-- "1+" Subpackage |
53 | 60 | Model *-- "1+" Package |
54 | | - Model *-- "1+" Subpackage |
55 | 61 | Model *-- "1+" Variable |
56 | 62 | Package *-- "1+" Subpackage |
57 | 63 | Package *-- "1+" Variable |
58 | 64 | Subpackage *-- "1+" Variable |
59 | 65 | ``` |
60 | 66 |
|
61 | | -Components are specified by **definition files**. A **definition** specifies input variables for a single MF6 component. A **block** is a named collection of input variables. A definition file specifies exactly one component. A component may contain zero or more blocks. Each block must contain at least one variable. |
| 67 | +Components are specified by **definition files**. A definition file specifies a single component and its fields. It consists of top-level metadata and a collection of **blocks**, where a block is a named collection of fields. A component may contain zero or more blocks. Each block must contain at least one field. Most components will have a block named "Options"; see the [MODFLOW 6 DFN file specification](https://modflow6.readthedocs.io/en/latest/_dev/dfn.html) for details.
62 | 68 |
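As a rough illustration of the structure just described, a definition could be modeled along the following lines. This is a sketch only: the class names `Field`, `Block`, and `Definition`, the `type` strings, and the example content are assumptions made for illustration, not the product's API or the DFN file format itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Field:
    """A single input field (hypothetical representation)."""
    name: str
    type: str
    optional: bool = False


@dataclass(frozen=True)
class Block:
    """A named, non-empty collection of fields."""
    name: str
    fields: tuple[Field, ...]

    def __post_init__(self):
        # Each block must contain at least one field.
        if not self.fields:
            raise ValueError(f"block '{self.name}' has no fields")


@dataclass(frozen=True)
class Definition:
    """One component: top-level metadata plus zero or more blocks."""
    component: str
    metadata: dict = field(default_factory=dict)
    blocks: tuple[Block, ...] = ()

    def field_names(self) -> list[str]:
        """Flatten the field names across all blocks, in order."""
        return [f.name for b in self.blocks for f in b.fields]


# Illustrative content only, loosely modeled on an initial conditions package.
ic = Definition(
    component="ic",
    blocks=(
        Block("options", (Field("export_array_ascii", "keyword", optional=True),)),
        Block("griddata", (Field("strt", "double precision"),)),
    ),
)
print(ic.field_names())
```

Enforcing the "at least one field" rule in `__post_init__` keeps an invalid block from ever being constructed, which is one of several places such a check could live.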
|
63 | 69 | ## Object model |
64 | 70 |
|
65 | | -The product's main use cases will include creating, manipulating, running, and inspecting MODFLOW 6 simulations (FUNC-3, FUNC-4). It is natural to provide an object-oriented interface in which every MF6 component module will generally have a corresponding class. |
66 | | - |
67 | | -Component classes will provide access to both **specification** and **data** — that is, to **form** and **content**, respectively. It should be straightforward to read a component's specification off its class definition, or to inspect it programmatically (FUNC-21). Likewise it should be easy to retrieve the value of a variable from an instance of a component (FUNC-4). |
68 | | - |
69 | | -There are many ways to implement an object model in Python: dictionaries, named tuples, plain classes, `dataclasses`, etc. There are fewer options if the object model must be self-describing. |
70 | | - |
71 | | -### `attrs` |
72 | | - |
73 | | -`dataclasses` are derived from an older project called [`attrs`](https://www.attrs.org/en/stable/) which has [extra powers](https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/). |
74 | | - |
75 | | -Our first proof of concept demonstrated a nested hierarchy of hand-rolled classes forming a component tree. Each component stored its data in-house. The specification was attached via metaclass magic. |
76 | | - |
77 | | -We propose to follow the same general pattern, using `attrs` instead for introspection and data access. |
78 | | - |
79 | | -### `xarray` |
80 | | - |
81 | | -XArray provides abstractions for working with multiple datasets related in a hierarchical context, some of which may share the same spatial/temporal indices and/or coordinate systems. |
82 | | - |
83 | | -We propose to follow [imod-python](https://github.com/Deltares/imod-python) in adopting [xarray](https://docs.xarray.dev/en/stable/index.html) as our components' onboard data store. |
84 | | - |
85 | | -### `attrs` + `xarray` |
86 | | - |
87 | | -Combining these patterns naively would result in several challenges, involving duplication, synchronization, and a more general problem reminiscent of [object-relational impedance mismatch](https://en.wikipedia.org/wiki/Object%E2%80%93relational_impedance_mismatch), where the list-oriented and array-oriented paradigms conflict. |
88 | | - |
89 | | -Ultimately, we'd like a mapping between an abstract hierarchy of components and variables, as defined in MF6 definition files, to a Python representation which is self-describing (courtesy of `attrs`) and self-aligning (courtesy of `xarray`). |
90 | | - |
91 | | -The [`DataTree`](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) is a recently developed `xarray` extension, now in the core package, which provides [many of the features we want from a hierarchical data store](https://docs.xarray.dev/en/stable/user-guide/hierarchical-data.html). |
92 | | - |
93 | | -## Data types |
94 | | - |
95 | | -MODFLOW 6 defines a [type system for input variables](https://github.com/MODFLOW-USGS/modflow6/tree/develop/doc/mf6io/mf6ivar#variable-types). We adapt this for Python. |
96 | | - |
97 | | -A variable is either a scalar or a composite. |
98 | | - |
99 | | -Scalars include integer, double precision, boolean, string, and path types. The product can represent scalars as builtin Python primitives. |
| 71 | +The product's main use cases will include creating, manipulating, running, and inspecting MODFLOW 6 simulations. It is natural to provide an object-oriented interface in which every MF6 component module will generally have a corresponding class. |
100 | 72 |
|
101 | | -Composites include array, list, product (record), and sum (union) types. |
| 73 | +There are many ways to implement an object model in Python: dictionaries, named tuples, plain classes, `dataclasses`, etc. |
102 | 74 |
|
103 | | -Translating from MF6: |
| 75 | +Two requirements in particular motivate the design described below: 1) the object model is a tree, and 2) it must be self-describing. |
104 | 76 |
|
105 | | -- A "keystring" is a union. |
106 | | -- A "recarray" is a list. |
| 77 | +Component classes must provide access to both **specification** and **data** — form and content, respectively. A component's specification should be legible from its class definition, to people and programs. |
107 | 78 |
|
108 | | -MF6 places some constraints on composite variables. These are explained below. |
| 79 | +Moreover, MODFLOW 6 components are situated in a hierarchy, with the simulation at the root, a branch for each model, and so on for packages, etc. This is true of both specification and data — the specification tree defines how components may be connected together, while a simulation instantiates some subset of the specification. |
109 | 80 |
|
110 | | -### Records |
| 81 | +A third motivation is consistency with [`imod-python`](https://github.com/Deltares/imod-python), which the product follows in several ways including: |
111 | 82 |
|
112 | | -MODFLOW 6 requires that records contain only scalar fields. A record may not contain another record. |
| 83 | +- Using [`xarray`](https://docs.xarray.dev/en/stable/index.html) for the underlying data model |
| 84 | +- Providing dictionary-style access and modification |
113 | 85 |
|
114 | | -In MF6 input files, a record appears as a whitespace-delimited line of text. While in principle a record's fields are named, MF6 may or may not expect particular values to be "tagged" (as indicated in definition files). "Tagging" is a concern of the product's MF6 IO layer, not the core object model. |
| 86 | +Components in `imod-python` encode parent/child relations in a dictionary, which is filtered as needed for subcomponents of a particular type. The structure of a simulation (or of any component with respect to its children) is thus flexible. "Structural" checks (i.e., what may be attached to what?) run in a separate validation step. |
115 | 87 |
|
116 | | -We expect the product to represent records as full-fledged `attrs` classes. |
| 88 | +The product aims instead for typed components, where children can be read off the class definition. This pulls structural validation from runtime to type-checking time, so invalid arrangements are visible in e.g. IDEs with Intellisense. |
117 | 89 |
|
118 | | -### Unions |
| 90 | +The product adopts the class-definition paradigm of the standard library `dataclasses` module. That module is derived from a project called [`attrs`](https://www.attrs.org/en/stable/), which offers [more power](https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/). `attrs` permits terse class definitions, e.g.
119 | 91 |
|
120 | | -MODFLOW 6 requires that unions contain only records. Unions may not contain scalars directly. A union may not contain another union. |
121 | | - |
122 | | -To represent unions, the product can simply use `typing.Union`. |
123 | | - |
124 | | -### Arrays |
125 | | - |
126 | | -MODFLOW 6 supports N-dimensional arrays of homogeneous (scalar) type, where 1 <= N <= 3. |
127 | | - |
128 | | -We can accept any `numpy.typing.ArrayLike` value, whether a standard `ndarray` or some other flavor (i.e. "duck arrays"). A common case will be lazy (e.g. dask) arrays for larger-than-memory operations. We can [implement custom array-likes](https://numpy.org/doc/stable/user/basics.interoperability.html) if there is a good case for it. |
129 | | - |
130 | | -Arrays can be type hinted in full detail in component classes, e.g. `NDArray[np.floating]`, while methods can generally have more lenient type hints (e.g. `ArrayLike`) and perform any necessary type checks at runtime. |
131 | | - |
132 | | -### Lists |
133 | | - |
134 | | -MODFLOW 6 lists may contain records or unions of records. Lists may not contain raw scalars. Collections of scalars should be provided as arrays. |
135 | | - |
136 | | -A list of records is regular, i.e. tabular. A list of unions can be irregular (i.e. rows can have different element counts) and cannot be treated as tabular data. |
137 | | - |
138 | | -For instance: |
139 | | - |
140 | | -- A `packagedata` block is typically a list of records of a single type (thus regular/tabular). |
141 | | -- A `period` block is typically a list of unions, where each item may be a different record type (thus irregular). |
142 | | - |
143 | | -The product can accept regular list data as Python builtin collections, NumPy arrays of dtype `np.object_`, `xarray.DataArray` or other duck arrays, or tabular data structures, e.g. `np.recarray`, `pd.DataFrame`. |
144 | | - |
145 | | -**Note**: If storing a regular list in a tabular data structure, the product should avoid columns of dtype `np.object_` — e.g. prefer to store grid cell indices as separate columns `i`, `j`, `k`, not a single column. |
146 | | - |
147 | | -The product can accept irregular lists as builtin collections, NumPy arrays of dtype `np.object_`, or `xarray.DataArray` or other duck arrays. |
148 | | - |
149 | | -## Developer workflow |
150 | | - |
151 | | -The product is a core element in day-to-day MODFLOW 6 development. Most critically, the product must be able to generate a MODFLOW 6 interface layer from a specification (FUNC-20). |
152 | | - |
153 | | -Typically, a MODFLOW 6 developer will write a new component specification and module in MODFLOW 6, run the product's code-generation utilities, and use the regenerated MODFLOW 6 interface layer to write integration tests for the new component.
154 | | - |
155 | | - |
156 | | - |
157 | | -```mermaid |
158 | | -C4Container |
159 | | - title [Containers] Code generation workflow |
160 | | -
|
161 | | - Boundary(mf6, "MODFLOW 6"){ |
162 | | - SystemDb(dfn, "Specification") |
163 | | - } |
164 | | -
|
165 | | - Boundary(flopy, "FloPy") { |
166 | | - Boundary(devs, "Developer APIs") { |
167 | | - System(fpycore, "Core framework") |
168 | | - System(fpycodegen, "Code generation") |
169 | | - } |
170 | | - Boundary(users, "User APIs") { |
171 | | - System(fpymf6, "MF6 module") |
172 | | - } |
173 | | - Rel(fpymf6, fpycore, "imports") |
174 | | - |
175 | | - Rel(fpycodegen, dfn, "inspects") |
176 | | - Rel(fpycodegen, fpymf6, "generates") |
177 | | - } |
178 | | -
|
179 | | - Person(dev, "Developer", "") |
180 | | - Person(user, "User", "") |
181 | | -
|
182 | | - Rel(dev, dfn, "develops") |
183 | | - Rel(dev, fpycore, "develops") |
184 | | - Rel(dev, fpycodegen, "develops/uses") |
185 | | - Rel(user, fpymf6, "uses") |
186 | | - UpdateRelStyle(dev, dfn, $lineColor="blue", $offsetX="-20" $offsetY="-30") |
187 | | - UpdateRelStyle(dev, fpycore, $lineColor="blue", $offsetY="90") |
188 | | - UpdateRelStyle(dev, fpycodegen, $lineColor="blue", $offsetY="50") |
189 | | - UpdateRelStyle(user, fpymf6, $lineColor="blue", $offsetY="50") |
190 | | - UpdateRelStyle(user, fpycore, $lineColor="blue", $offsetX="-20" $offsetY="-10") |
| 92 | +```python |
| 93 | +from flopy.mf6.gwf import Ic |
| 94 | +from attrs import define, field |
| 95 | +from numpy.typing import NDArray |
| 96 | +import numpy as np |
191 | 97 |
|
| 98 | +@define |
| 99 | +class Ic(Package): |
| 100 | + """Initial conditions package""" |
| 101 | + strt: NDArray[np.floating] = field(...) |
| 102 | + export_array_ascii: bool = field(...) |
| 103 | + export_array_netcdf: bool = field(...) |
192 | 104 | ``` |
193 | 105 |
|
194 | | -From the MODFLOW 6 developer's perspective, the product's code generation workflow will remain more or less unchanged. |
| 106 | +Minimal class definitions are easier to read and to generate from definition files. The trick is in mapping the MODFLOW 6 input specification to the Python type system. With this transformation defined, the original specification can be derived in reverse from the class definition. |
195 | 107 |
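To make the idea of deriving a specification back from a class definition concrete, here is a minimal sketch. It uses the standard library `dataclasses` in place of `attrs` to stay dependency-free. The `Package` base class, the `block` metadata key, and the `spec` helper are assumptions for illustration, not the product's actual API.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Package:
    """Hypothetical base class for packages."""


@dataclass
class Ic(Package):
    """Initial conditions package (illustrative field set)."""
    strt: list[float] = field(default_factory=list, metadata={"block": "griddata"})
    export_array_ascii: bool = field(default=False, metadata={"block": "options"})


def spec(cls) -> dict[str, dict]:
    """Recover a block -> field name -> type mapping from a dataclass."""
    out: dict[str, dict] = {}
    for f in fields(cls):
        out.setdefault(f.metadata["block"], {})[f.name] = f.type
    return out


print(spec(Ic))
```

Because the block name travels as field metadata, the same class definition serves both as the typed Python interface and as a machine-readable description of the component.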
|
196 | | -We propose a few changes to the underlying implementation: |
| 108 | +The product bolts on dictionary-style behavior by implementing `MutableMapping` in a component base class. |
197 | 109 |
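One possible shape for such a base class is sketched below, again with standard library `dataclasses` standing in for `attrs`. The `Component` class and its behavior (for instance, refusing deletion and unknown keys) are assumptions for illustration; the product's base class may differ.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, fields


class Component(MutableMapping):
    """Hypothetical base class giving dataclass fields dict-style access."""

    def _names(self):
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        if key not in self._names():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, value):
        # Only fields declared on the class may be set.
        if key not in self._names():
            raise KeyError(key)
        setattr(self, key, value)

    def __delitem__(self, key):
        raise TypeError("fields are fixed by the class definition")

    def __iter__(self):
        return iter(self._names())

    def __len__(self):
        return len(self._names())


@dataclass
class Ic(Component):
    strt: float = 0.0
    export_array_ascii: bool = False


ic = Ic()
ic["strt"] = 10.0          # dictionary-style modification
print(ic.strt, dict(ic))   # attribute access sees the same value
```

Implementing the five abstract methods is enough for `MutableMapping` to supply `keys`, `items`, `get`, `update`, and the rest, so attribute-style and dictionary-style access stay in sync without duplicating state.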
|
198 | | -1. Use `Jinja2` for code generation |
199 | | -2. Keep the specification in MODFLOW 6 only |
200 | | -3. Distribute the specification with MODFLOW 6 |
| 110 | +Where `imod-python` components expose their fields via [a `Dataset`](https://github.com/Deltares/imod-python/blob/master/imod/common/interfaces/ipackagebase.py), components in the product expose a `DataTree` node. The [`DataTree`](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) is a recently developed `xarray` feature implementing [a hierarchical data store](https://docs.xarray.dev/en/stable/user-guide/hierarchical-data.html). Components in the product are an [experimental hybrid](https://github.com/modflowpy/xattree) of `attrs` and `xarray` where `attrs` properties, as well as parent/child references, are proxied through the `DataTree`. |
201 | 111 |
|
202 | | -Item 1 is an implementation detail. |
| 112 | +Combining `attrs` and `xarray` in this way presents challenges involving duplication (`xarray` prefers copies to in-place updates) and synchronization. Some careful management of parent/child links is still required, even though `DataTree` does the majority of the work. |
203 | 113 |
|
204 | | -Item 2 will deduplicate the MODFLOW 6 specification and reduce the maintenance/synchronization burden for the product's developers. |
205 | | - |
206 | | -Item 3 will allow the product to generate a corresponding MODFLOW 6 interface layer when a new MF6 executable is installed. |
| 114 | +The sparse, record-based list input format used by MODFLOW 6 is also in some tension with `xarray`, where it is natural to disaggregate tables into an array for each constituent column — this requires a nontrivial mapping between data as read from input files and the values eventually accessible through `xarray` APIs. |
207 | 115 |
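The mapping between row-wise records and per-column data can be sketched as follows. Plain Python lists stand in for arrays here, and the record layout `(k, i, j, head)` and the helper names are assumptions for illustration. A real implementation would also need to deal with sparsity and irregular (union-typed) rows.

```python
def to_columns(records, names):
    """Split a list of equal-length records into one list per column."""
    if any(len(r) != len(names) for r in records):
        raise ValueError("irregular records cannot be treated as a table")
    return {name: [r[n] for r in records] for n, name in enumerate(names)}


def to_records(columns):
    """Inverse mapping: zip the columns back into row-wise records."""
    return list(zip(*columns.values()))


# Two hypothetical records: a cell index (k, i, j) and a head value.
records = [(0, 1, 2, 95.0), (0, 3, 4, 90.5)]
columns = to_columns(records, ["k", "i", "j", "head"])
print(columns["head"])
print(to_records(columns) == records)
```

The round trip only works because the records are regular; a list whose rows differ in length or type has no direct columnar form, which is part of the tension described above.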
|
208 | 116 | ## IO |
209 | 117 |
|
210 | | -TODO |
211 | | - |
212 | | -### Reading input files |
| 118 | +IO is at the boundary of the product. Details of any particular input or output format should not contaminate the product's object model. |
213 | 119 |
|
| 120 | +The product provides an IO framework with which de/serializers can be registered for arbitrary components and formats. |
214 | 121 |
|
215 | | -### Writing input files |
| 122 | +The product will allow IO to be configured globally, on a per-simulation basis, or at read/write time via method parameters. |
216 | 123 |
|
| 124 | +IO is implemented in several layers: |
217 | 125 |
|
218 | | -### Reading output files |
| 126 | +- IO operations, implemented as descriptors, backing `load` and `write` methods on the base component class |
| 127 | +- `cattrs` converters to map the object model to/from Python primitives and containers (i.e. un/structuring) |
| 128 | +- Encoders/decoders for any number of serialization formats, which translate primitives/containers to strings |
219 | 129 |
|
| 130 | +In particular, the product will implement a conversion layer and a serialization layer for the MODFLOW 6 input file format. The serialization layer implements a file writer via `Jinja2` templates and a file parser via a `lark` parser generated from an EBNF language specification. |