Commit 16735fd

update srs and sdd (#159)
update some things that had become outdated, and reorganize a bit
1 parent 86817d3 commit 16735fd

2 files changed

+91
-141
lines changed

docs/dev/sdd.md

Lines changed: 52 additions & 141 deletions
@@ -24,196 +24,107 @@
2424

2525
This is the Software Design Description (SDD) document for FloPy 4, also called *the product*.
2626

27-
This document describes a tentative design, focusing on "core" functional requirements. Some attention may be given to architecture, but non-functional requirements are largely out of scope here.
27+
This document describes a tentative design, focusing on functional requirements. Some attention may be given to architecture, but non-functional requirements are largely out of scope.
2828

2929
## Conceptual model
3030

3131
This document follows MODFLOW 6 terminology where applicable, with modifications/translations where appropriate.
3232

33-
MODFLOW 6 is designed as a hierarchy of modular **components**.
33+
A MODFLOW 6 simulation is a hierarchy of modular **components**. Components encapsulate related data and functionality.
3434

35-
Components encapsulate related functionality and data. Components may have user-specified configuration and/or data **variables**.
35+
Components may have zero or more user-specified **variables**. We use this term interchangeably with **field**, preferring the latter because "variable" is so generic. A field might be a model parameter, e.g. a numeric scalar or array value. Fields which configure non-numerical features of the simulation are called **options**. A field can be required or optional.
3636

37-
Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement.
38-
39-
Component types include:
37+
Components come in several subtypes:
4038

41-
- **simulation**: MF6's "unit of work", consisting of 1+ (possibly coupled) hydrologic processes
39+
- **simulation**: the fundamental "unit of work" in MF6, consisting of 1+ (possibly coupled) hydrologic process(es)
4240
- **model**: a simulated hydrological process
4341
- **package**: a subcomponent of a model or simulation
4442

45-
Certain subsets of packages have distinguishing characteristics. A **stress package** represents a forcing. A **basic package** contains only input variables applying statically to the entire simulation. An **advanced package** contains time-variable (i.e. transient) input data. Usually only a single instance of a package is expected — when arbitrarily many are permitted, the package is called a **multi-package**. A **subpackage** is a concept only recognized by the product, not by MODFLOW 6 — a package linked to its parent not by a separate input file, but directly (i.e., subpackage data provided to the parent's initializer method). Subpackages may be attached to packages, models, or simulations.
43+
The simulation is the root of the tree, with models and packages under it, each of which itself might have other packages.
44+
45+
Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement.
46+
47+
Packages come in several flavors, not necessarily mutually exclusive.
48+
49+
- A **stress package** represents a forcing.
50+
- A **basic package** contains only input variables applying statically to the entire simulation.
51+
- An **advanced package** contains time-varying input variables.
52+
- Most packages are singular — the parent component may have one and only one instance. When arbitrarily many are permitted, the package is called a **multi-package**.
53+
- A **subpackage** is a package whose parent is another package.
4654

4755
```mermaid
4856
classDiagram
4957
Simulation *-- "1+" Package
5058
Simulation *-- "1+" Model
5159
Simulation *-- "1+" Variable
52-
Simulation *-- "1+" Subpackage
5360
Model *-- "1+" Package
54-
Model *-- "1+" Subpackage
5561
Model *-- "1+" Variable
5662
Package *-- "1+" Subpackage
5763
Package *-- "1+" Variable
5864
Subpackage *-- "1+" Variable
5965
```
6066

61-
Components are specified by **definition files**. A **definition** specifies input variables for a single MF6 component. A **block** is a named collection of input variables. A definition file specifies exactly one component. A component may contain zero or more blocks. Each block must contain at least one variable.
67+
Components are specified by **definition files**. A definition file specifies a single component and its fields. A definition file consists of top-level metadata and a collection of **blocks**. A block is a named collection of fields. A component may contain zero or more blocks. Each block must contain at least one variable. Most components will have a block named "Options" — see the [MODFLOW 6 DFN file specification](https://modflow6.readthedocs.io/en/latest/_dev/dfn.html) for more info.
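
As a rough illustration of the containment rules just described (one component per definition, zero or more blocks per component, at least one field per block), the structure can be sketched with plain `dataclasses`. The class names `FieldSpec`, `Block` and `Definition` are hypothetical and not part of the product or of MODFLOW 6, and the `gwf-ic` contents are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One input field (an MF6 'variable'). Hypothetical structure."""

    name: str
    type: str
    optional: bool = False


@dataclass(frozen=True)
class Block:
    """A named collection of fields; an empty block is rejected."""

    name: str
    fields: tuple[FieldSpec, ...]

    def __post_init__(self):
        if not self.fields:
            raise ValueError(f"block '{self.name}' must contain at least one field")


@dataclass(frozen=True)
class Definition:
    """A single component: its name plus zero or more blocks."""

    component: str
    blocks: tuple[Block, ...] = ()

    def field_names(self) -> list[str]:
        # Flatten the blocks into one ordered list of field names
        return [f.name for b in self.blocks for f in b.fields]


# Illustrative definition; the contents are assumptions, not copied from a DFN file
ic = Definition(
    component="gwf-ic",
    blocks=(
        Block("options", (FieldSpec("export_array_ascii", "keyword", optional=True),)),
        Block("griddata", (FieldSpec("strt", "double precision"),)),
    ),
)
print(ic.field_names())
```

Rejecting empty blocks in `__post_init__` keeps the "at least one field" rule next to the data it constrains.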
6268

6369
## Object model
6470

65-
The product's main use cases will include creating, manipulating, running, and inspecting MODFLOW 6 simulations (FUNC-3, FUNC-4). It is natural to provide an object-oriented interface in which every MF6 component module will generally have a corresponding class.
66-
67-
Component classes will provide access to both **specification** and **data** — that is, to **form** and **content**, respectively. It should be straightforward to read a component's specification off its class definition, or to inspect it programmatically (FUNC-21). Likewise it should be easy to retrieve the value of a variable from an instance of a component (FUNC-4).
68-
69-
There are many ways to implement an object model in Python: dictionaries, named tuples, plain classes, `dataclasses`, etc. There are fewer options if the object model must be self-describing.
70-
71-
### `attrs`
72-
73-
`dataclasses` are derived from an older project called [`attrs`](https://www.attrs.org/en/stable/) which has [extra powers](https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/).
74-
75-
Our first proof of concept demonstrated a nested hierarchy of hand-rolled classes forming a component tree. Each component stored its data in-house. The specification was attached via metaclass magic.
76-
77-
We propose to follow the same general pattern, using `attrs` instead for introspection and data access.
78-
79-
### `xarray`
80-
81-
XArray provides abstractions for working with multiple datasets related in a hierarchical context, some of which may share the same spatial/temporal indices and/or coordinate systems.
82-
83-
We propose to follow [imod-python](https://github.com/Deltares/imod-python) in adopting [xarray](https://docs.xarray.dev/en/stable/index.html) as our components' onboard data store.
84-
85-
### `attrs` + `xarray`
86-
87-
Combining these patterns naively would result in several challenges, involving duplication, synchronization, and a more general problem reminiscent of [object-relational impedance mismatch](https://en.wikipedia.org/wiki/Object%E2%80%93relational_impedance_mismatch), where the list-oriented and array-oriented paradigms conflict.
88-
89-
Ultimately, we'd like a mapping between an abstract hierarchy of components and variables, as defined in MF6 definition files, to a Python representation which is self-describing (courtesy of `attrs`) and self-aligning (courtesy of `xarray`).
90-
91-
The [`DataTree`](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) is a recently developed `xarray` extension, now in the core package, which provides [many of the features we want from a hierarchical data store](https://docs.xarray.dev/en/stable/user-guide/hierarchical-data.html).
92-
93-
## Data types
94-
95-
MODFLOW 6 defines a [type system for input variables](https://github.com/MODFLOW-USGS/modflow6/tree/develop/doc/mf6io/mf6ivar#variable-types). We adapt this for Python.
96-
97-
A variable is either a scalar or a composite.
98-
99-
Scalars include integer, double precision, boolean, string, and path types. The product can represent scalars as builtin Python primitives.
71+
The product's main use cases will include creating, manipulating, running, and inspecting MODFLOW 6 simulations. It is natural to provide an object-oriented interface in which every MF6 component module will generally have a corresponding class.
10072

101-
Composites include array, list, product (record), and sum (union) types.
73+
There are many ways to implement an object model in Python: dictionaries, named tuples, plain classes, `dataclasses`, etc.
10274

103-
Translating from MF6:
75+
Two requirements in particular motivate the design described below: 1) the object model is a tree, and 2) it must be self-describing.
10476

105-
- A "keystring" is a union.
106-
- A "recarray" is a list.
77+
Component classes must provide access to both **specification** and **data** — form and content, respectively. A component's specification should be legible from its class definition, to people and programs.
10778

108-
MF6 places some constraints on composite variables. These are explained below.
79+
Moreover, MODFLOW 6 components are situated in a hierarchy, with the simulation at the root, a branch for each model, and so on for packages, etc. This is true of both specification and data — the specification tree defines how components may be connected together, while a simulation instantiates some subset of the specification.
10980

110-
### Records
81+
A third motivation is consistency with [`imod-python`](https://github.com/Deltares/imod-python), which the product follows in several ways including:
11182

112-
MODFLOW 6 requires that records contain only scalar fields. A record may not contain another record.
83+
- Using [`xarray`](https://docs.xarray.dev/en/stable/index.html) for the underlying data model
84+
- Providing dictionary-style access and modification
11385

114-
In MF6 input files, a record appears as a whitespace-delimited line of text. While in principle a record's fields are named, MF6 may or may not expect particular values to be "tagged" (as indicated in definition files). "Tagging" is a concern of the product's MF6 IO layer, not the core object model.
86+
Components in `imod-python` encode parent/child relations in a dictionary, which is filtered as needed for subcomponents of a particular type. The structure of a simulation (or of any component with respect to its children) is thus flexible. "Structural" checks (i.e., what may be attached to what?) run in a separate validation step.
11587

116-
We expect the product to represent records as full-fledged `attrs` classes.
88+
The product aims instead for typed components, where children can be read off the class definition. This pulls structural validation from runtime to type-checking time, so invalid arrangements are visible in e.g. IDEs with Intellisense.
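
A minimal sketch of what "children can be read off the class definition" might look like, using the standard library `dataclasses` as a stand-in for the product's actual class machinery. `Component`, `Ic`, `Gwf` and `child_slots` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional, get_args, get_type_hints


@dataclass
class Component:
    """Hypothetical base class for all components."""


@dataclass
class Ic(Component):
    strt: float = 0.0


@dataclass
class Gwf(Component):
    # The type hint declares which child may be attached here
    ic: Optional[Ic] = None


def child_slots(cls: type) -> dict[str, type]:
    """Map field name to child component type, using only the class definition."""
    hints = get_type_hints(cls)
    slots = {}
    for f in fields(cls):
        hint = hints[f.name]
        # Unwrap Optional[X] and other unions; plain types have no type arguments
        for candidate in get_args(hint) or (hint,):
            if isinstance(candidate, type) and issubclass(candidate, Component):
                slots[f.name] = candidate
    return slots


print(child_slots(Gwf))
```

Because the parent/child relation lives in the type hints, a static type checker can flag `Gwf(ic="not a package")` before any simulation is built, while `child_slots` gives programs the same information at runtime.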
11789

118-
### Unions
90+
The product adopts the paradigm of the standard library's `dataclasses` for class definitions. The `dataclasses` module is derived from an older project, [`attrs`](https://www.attrs.org/en/stable/), which offers [more power](https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/). `attrs` permits terse class definitions, e.g.
11991

120-
MODFLOW 6 requires that unions contain only records. Unions may not contain scalars directly. A union may not contain another union.
121-
122-
To represent unions, the product can simply use `typing.Union`.
123-
124-
### Arrays
125-
126-
MODFLOW 6 supports N-dimensional arrays of homogeneous (scalar) type, where 1 <= N <= 3.
127-
128-
We can accept any `numpy.typing.ArrayLike` value, whether a standard `ndarray` or some other flavor (i.e. "duck arrays"). A common case will be lazy (e.g. dask) arrays for larger-than-memory operations. We can [implement custom array-likes](https://numpy.org/doc/stable/user/basics.interoperability.html) if there is a good case for it.
129-
130-
Arrays can be type hinted in full detail in component classes, e.g. `NDArray[np.floating]`, while methods can generally have more lenient type hints (e.g. `ArrayLike`) and perform any necessary type checks at runtime.
131-
132-
### Lists
133-
134-
MODFLOW 6 lists may contain records or unions of records. Lists may not contain raw scalars. Collections of scalars should be provided as arrays.
135-
136-
A list of records is regular, i.e. tabular. A list of unions can be irregular (i.e. rows can have different element counts) and cannot be treated as tabular data.
137-
138-
For instance:
139-
140-
- A `packagedata` block is typically a list of records of a single type (thus regular/tabular).
141-
- A `period` block is typically a list of unions, where each item may be a different record type (thus irregular).
142-
143-
The product can accept regular list data as Python builtin collections, NumPy arrays of dtype `np.object_`, `xarray.DataArray` or other duck arrays, or tabular data structures, e.g. `np.recarray`, `pd.DataFrame`.
144-
145-
**Note**: If storing a regular list in a tabular data structure, the product should avoid columns of dtype `np.object_` &mdash; e.g. prefer to store grid cell indices as separate columns `i`, `j`, `k`, not a single column.
146-
147-
The product can accept irregular lists as builtin collections, NumPy arrays of dtype `np.object_`, or `xarray.DataArray` or other duck arrays.
148-
149-
## Developer workflow
150-
151-
The product is a core element in day-to-day MODFLOW 6 development. Most critically, the product must be able to generate a MODFLOW 6 interface layer from a specification (FUNC-20).
152-
153-
Typically, a MODFLOW 6 developer will write a new component specification and module in MODFLOW 6, run the product's code-generation utilities, and use the regenerated MODFLOW 6 interface layer to write integration tests for the new component.
154-
155-
156-
157-
```mermaid
158-
C4Container
159-
title [Containers] Code generation workflow
160-
161-
Boundary(mf6, "MODFLOW 6"){
162-
SystemDb(dfn, "Specification")
163-
}
164-
165-
Boundary(flopy, "FloPy") {
166-
Boundary(devs, "Developer APIs") {
167-
System(fpycore, "Core framework")
168-
System(fpycodegen, "Code generation")
169-
}
170-
Boundary(users, "User APIs") {
171-
System(fpymf6, "MF6 module")
172-
}
173-
Rel(fpymf6, fpycore, "imports")
174-
175-
Rel(fpycodegen, dfn, "inspects")
176-
Rel(fpycodegen, fpymf6, "generates")
177-
}
178-
179-
Person(dev, "Developer", "")
180-
Person(user, "User", "")
181-
182-
Rel(dev, dfn, "develops")
183-
Rel(dev, fpycore, "develops")
184-
Rel(dev, fpycodegen, "develops/uses")
185-
Rel(user, fpymf6, "uses")
186-
UpdateRelStyle(dev, dfn, $lineColor="blue", $offsetX="-20" $offsetY="-30")
187-
UpdateRelStyle(dev, fpycore, $lineColor="blue", $offsetY="90")
188-
UpdateRelStyle(dev, fpycodegen, $lineColor="blue", $offsetY="50")
189-
UpdateRelStyle(user, fpymf6, $lineColor="blue", $offsetY="50")
190-
UpdateRelStyle(user, fpycore, $lineColor="blue", $offsetX="-20" $offsetY="-10")
92+
```python
93+
from flopy.mf6.gwf import Ic
94+
from attrs import define, field
95+
from numpy.typing import NDArray
96+
import numpy as np
19197

98+
@define
99+
class Ic(Package):
100+
"""Initial conditions package"""
101+
strt: NDArray[np.floating] = field(...)
102+
export_array_ascii: bool = field(...)
103+
export_array_netcdf: bool = field(...)
192104
```
193105

194-
From the MODFLOW 6 developer's perspective, the product's code generation workflow will remain more or less unchanged.
106+
Minimal class definitions are easier to read and to generate from definition files. The trick is in mapping the MODFLOW 6 input specification to the Python type system. With this transformation defined, the original specification can be derived in reverse from the class definition.
195107

196-
We propose a few changes to the underlying implementation:
108+
The product bolts on dictionary-style behavior by implementing `MutableMapping` in a component base class.
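
One way such a base class could look, sketched here with the standard library's `dataclasses` rather than `attrs` so the example has no third-party dependencies. The `Component` and `Ic` classes below are illustrative assumptions, not the product's actual implementation.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, fields


class Component(MutableMapping):
    """Hypothetical base class exposing declared fields as mapping items."""

    def _names(self):
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        if key not in self._names():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, value):
        # Only declared fields may be set; the class definition stays authoritative
        if key not in self._names():
            raise KeyError(key)
        setattr(self, key, value)

    def __delitem__(self, key):
        raise TypeError("declared fields cannot be removed")

    def __iter__(self):
        return iter(self._names())

    def __len__(self):
        return len(self._names())


@dataclass
class Ic(Component):
    strt: float = 0.0
    export_array_ascii: bool = False


ic = Ic()
ic["strt"] = 1.5          # dictionary-style modification
print(ic.strt, dict(ic))  # attribute access and mapping view agree
```

Implementing the five abstract methods is enough for `MutableMapping` to supply `keys`, `items`, `get`, `update` and membership tests for free.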
197109

198-
1. Use `Jinja2` for code generation
199-
2. Keep the specification in MODFLOW 6 only
200-
3. Distribute the specification with MODFLOW 6
110+
Where `imod-python` components expose their fields via [a `Dataset`](https://github.com/Deltares/imod-python/blob/master/imod/common/interfaces/ipackagebase.py), components in the product expose a `DataTree` node. The [`DataTree`](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) is a recently developed `xarray` feature implementing [a hierarchical data store](https://docs.xarray.dev/en/stable/user-guide/hierarchical-data.html). Components in the product are an [experimental hybrid](https://github.com/modflowpy/xattree) of `attrs` and `xarray` where `attrs` properties, as well as parent/child references, are proxied through the `DataTree`.
201111

202-
Item 1 is an implementation detail.
112+
Combining `attrs` and `xarray` in this way presents challenges involving duplication (`xarray` prefers copies to in-place updates) and synchronization. Some careful management of parent/child links is still required, even though `DataTree` does the majority of the work.
203113

204-
Item 2 will deduplicate the MODFLOW 6 specification and reduce the maintenance/synchronization burden for the product's developers.
205-
206-
Item 3 will allow the product to generate a corresponding MODFLOW 6 interface layer when a new MF6 executable is installed.
114+
The sparse, record-based list input format used by MODFLOW 6 is also in some tension with `xarray`, where it is natural to disaggregate tables into an array for each constituent column &mdash; this requires a nontrivial mapping between data as read from input files and the values eventually accessible through `xarray` APIs.
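
To make the mapping concrete, here is a small stdlib-only sketch that scatters sparse `(k, i, j, value, ...)` records into one dense, flattened array per column and gathers them back. The function names, the zero-based cell indices, and the use of flat Python lists in place of `xarray` arrays are all simplifying assumptions.

```python
from math import prod


def scatter_records(records, columns, shape, fill=None):
    """Turn sparse (k, i, j, *values) records into one dense flat list per column."""
    nlay, nrow, ncol = shape
    dense = {name: [fill] * prod(shape) for name in columns}
    for k, i, j, *values in records:
        flat = (k * nrow + i) * ncol + j  # row-major flat index
        for name, value in zip(columns, values):
            dense[name][flat] = value
    return dense


def gather_records(dense, columns, shape, fill=None):
    """Inverse mapping: recover sparse records from the dense columns."""
    nlay, nrow, ncol = shape
    records = []
    for flat in range(prod(shape)):
        values = [dense[name][flat] for name in columns]
        if all(v is fill for v in values):
            continue  # cell carries no data in any column
        k, rem = divmod(flat, nrow * ncol)
        i, j = divmod(rem, ncol)
        records.append((k, i, j, *values))
    return records


records = [(0, 0, 1, 10.0, 1e-3), (1, 1, 0, 12.0, 2e-3)]
columns = ("head", "cond")
dense = scatter_records(records, columns, shape=(2, 2, 2))
print(dense["head"])
print(gather_records(dense, columns, shape=(2, 2, 2)))
```

The round trip is only lossless if no real value coincides with the fill value, which is one reason the mapping is nontrivial in practice.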
207115

208116
## IO
209117

210-
TODO
211-
212-
### Reading input files
118+
IO is at the boundary of the product. Details of any particular input or output format should not contaminate the product's object model.
213119

120+
The product provides an IO framework with which de/serializers can be registered for arbitrary components and formats.
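
A registry of this kind might be sketched as follows. The names `register_writer` and `write`, and the `"mf6"` format label, are hypothetical and chosen only for illustration; the product's real registration API may differ.

```python
from typing import Callable

# (component type, format name) -> serializer function
_WRITERS: dict[tuple[type, str], Callable[[object], str]] = {}


def register_writer(cls: type, fmt: str):
    """Decorator registering a serializer for a component type and format."""

    def decorator(fn):
        _WRITERS[(cls, fmt)] = fn
        return fn

    return decorator


def write(obj, fmt: str) -> str:
    """Serialize obj with the most specific writer registered for its type."""
    # Walk the MRO so a writer registered for a base class also covers subclasses
    for cls in type(obj).__mro__:
        fn = _WRITERS.get((cls, fmt))
        if fn is not None:
            return fn(obj)
    raise LookupError(f"no '{fmt}' writer registered for {type(obj).__name__}")


class Component:
    """Hypothetical base component."""


class Ic(Component):
    def __init__(self, strt):
        self.strt = strt


@register_writer(Component, "mf6")
def write_generic(component):
    return f"# {type(component).__name__}"


@register_writer(Ic, "mf6")
def write_ic(ic):
    return f"BEGIN griddata\n  strt CONSTANT {ic.strt}\nEND griddata"


print(write(Ic(1.0), "mf6"))
```

Keying the registry on both type and format keeps format details out of the component classes, in line with the goal that IO should not contaminate the object model.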
214121

215-
### Writing input files
122+
The product will allow IO to be configured globally, on a per-simulation basis, or at read/write time via method parameters.
216123

124+
IO is implemented in several layers:
217125

218-
### Reading output files
126+
- IO operations, implemented as descriptors, backing `load` and `write` methods on the base component class
127+
- `cattrs` converters to map the object model to/from Python primitives and containers (i.e. un/structuring)
128+
- Encoders/decoders for any number of serialization formats, which translate primitives/containers to strings
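
The layering above can be sketched with standard-library stand-ins: `dataclasses.asdict` plays the role of the `cattrs` unstructuring step and `json.dumps` the role of an encoder. `WriteOp`, `encode_keyvalue` and the format names are assumptions made for this example, not the product's API.

```python
import json
from dataclasses import asdict, dataclass


class WriteOp:
    """Descriptor backing a `write` method (layer 1)."""

    def __init__(self, encoders):
        self.encoders = encoders

    def __get__(self, instance, owner=None):
        if instance is None:
            return self

        def write(format="json"):
            data = asdict(instance)             # layer 2: unstructure to primitives
            return self.encoders[format](data)  # layer 3: encode primitives as text

        return write


def encode_keyvalue(data):
    """Toy encoder emitting one 'KEY value' line per field."""
    return "\n".join(f"{key.upper()} {value}" for key, value in data.items())


@dataclass
class Component:
    # No annotation, so dataclasses treats this as a class attribute, not a field
    write = WriteOp({"json": json.dumps, "keyvalue": encode_keyvalue})


@dataclass
class Ic(Component):
    strt: float = 0.0
    export_array_ascii: bool = False


ic = Ic(strt=1.0, export_array_ascii=True)
print(ic.write())
print(ic.write("keyvalue"))
```

Since each layer only sees the output of the one before it, a new format needs only a new encoder, while the conversion step is shared.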
219129

130+
In particular, the product will implement a conversion layer and a serialization layer for the MODFLOW 6 input file format. The serialization layer implements a file writer via `Jinja2` templates and a file parser via a `lark` parser generated from an EBNF language specification.

docs/dev/srs.md

Lines changed: 39 additions & 0 deletions
@@ -67,6 +67,8 @@ Other libraries and tools may build upon the product to offer more advanced, dom
6767

6868
- A MODFLOW developer is setting up a worked example to demonstrate how to use a new feature...
6969

70+
- A MODFLOW 6 developer writes a new component specification and corresponding module, generates a compatible Python interface, and uses it to write integration tests for the new component...
71+
7072
```mermaid
7173
C4Context
7274
title [Context] Product use cases
@@ -94,6 +96,43 @@ C4Context
9496
9597
```
9698

99+
```mermaid
100+
C4Context
101+
title [Context] Code generation workflow
102+
103+
Boundary(mf6, "MODFLOW 6"){
104+
SystemDb(dfn, "Specification")
105+
}
106+
107+
Boundary(flopy, "FloPy") {
108+
Boundary(devs, "Developer APIs") {
109+
System(fpycore, "Core framework")
110+
System(fpycodegen, "Code generation")
111+
}
112+
Boundary(users, "User APIs") {
113+
System(fpymf6, "MF6 module")
114+
}
115+
Rel(fpymf6, fpycore, "imports")
116+
117+
Rel(fpycodegen, dfn, "inspects")
118+
Rel(fpycodegen, fpymf6, "generates")
119+
}
120+
121+
Person(dev, "Developer", "")
122+
Person(user, "User", "")
123+
124+
Rel(dev, dfn, "develops")
125+
Rel(dev, fpycore, "develops")
126+
Rel(dev, fpycodegen, "develops/uses")
127+
Rel(user, fpymf6, "uses")
128+
UpdateRelStyle(dev, dfn, $lineColor="blue", $offsetX="-20" $offsetY="-30")
129+
UpdateRelStyle(dev, fpycore, $lineColor="blue", $offsetY="90")
130+
UpdateRelStyle(dev, fpycodegen, $lineColor="blue", $offsetY="50")
131+
UpdateRelStyle(user, fpymf6, $lineColor="blue", $offsetY="50")
132+
UpdateRelStyle(user, fpycore, $lineColor="blue", $offsetX="-20" $offsetY="-10")
133+
134+
```
135+
97136
## Requirements
98137

99138
Broadly, the product should
