EAMXX: Automatic Differentiation support #6809

When I talked with @bartgol at the EESM meeting last week, he suggested making a tracking issue to discuss changes that would be useful or necessary for differentiable modeling, specifically with automatic differentiation (AD). So, for future reference, I'm going to list the needs for forward- and reverse-mode differentiation at different levels of model complexity, followed by a list of applications of AD and the capabilities I think each of those applications would need.

Forward-mode single-column model (SCM):

- **Sufficiently general array/variable types in the physics.** This is already a feature of EAMxx in general (and I have undying gratitude for all you developers who had the foresight to do this), but it hasn't been tested in this particular context. In particular, I'm not sure what changes the field manager might need in order to carry around usable information about derivatives.
- **Sufficiently smooth physics.** I actually think this is a pretty minor issue, because most of the physics is Lipschitz continuous and piecewise $C^1$. Even most limiters and phase changes in the model obey this. In math terms, these conditions guarantee that the derivative exists everywhere except on a set of measure zero, and in practical terms, this is good enough for every neural network with ReLU activation that has ever been trained with backpropagation. Arguably, we should consider a violation of these conditions to be a bug in need of fixing, rather than a feature to be accommodated. (The Bergeron process, deep convective triggering, and any future stochastic physics may be worth keeping an eye on, though, since they may be exceptions.)
- **Linking to and using AD libraries.** There are multiple approaches to AD, but EAMxx is a C++/Kokkos code aiming for performance portability, so Sacado (or a very similar template library compatible with Kokkos) would be the obvious way to make this work.
- **Application-dependent inputs/outputs.** For any given application of AD, we need to actually specify the perturbable variables that the model is sensitive to, and write out whatever information that application needs.
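As a rough illustration of the first and third points, here is a minimal forward-mode sketch. It assumes a physics routine written as a template over its scalar type, which is the pattern that lets a Sacado-style `Fad` type stand in for `double`. The `Dual` struct is a hand-rolled stand-in, not Sacado's API, and `relax_and_clip` is a hypothetical kernel, not an EAMxx routine. The clipping branch also shows the piecewise-$C^1$ case from the second point: the derivative is defined everywhere except at the kink.

```cpp
#include <cassert>
#include <cmath>

// Minimal forward-mode dual number: `val` is the value, `der` the derivative
// with respect to one chosen input. A stand-in for a Sacado-style Fad type.
struct Dual {
  double val;
  double der;
};

inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
inline Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
inline Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.der * b.val + a.val * b.der};  // product rule
}
inline Dual operator/(Dual a, Dual b) {
  return {a.val / b.val, (a.der * b.val - a.val * b.der) / (b.val * b.val)};
}
inline bool operator<(Dual a, Dual b) { return a.val < b.val; }
inline Dual exp(Dual a) {
  const double e = std::exp(a.val);
  return {e, e * a.der};
}

// Hypothetical physics kernel written once for any scalar type `Real`:
// relax q toward q_sat over timescale tau, then clip negative values.
// The clip is Lipschitz and piecewise C^1, like many model limiters.
template <typename Real>
Real relax_and_clip(Real q, Real q_sat, Real dt, Real tau) {
  using std::exp;  // double -> std::exp; Dual -> ::exp found via ADL
  assert(Real{0.0} < tau);
  Real q_new = q_sat + (q - q_sat) * exp(Real{0.0} - dt / tau);
  if (q_new < Real{0.0}) q_new = Real{0.0};
  return q_new;
}
```

Seeding `q` as `Dual{2.0, 1.0}` (with zero derivatives on the other inputs) makes the result carry $\partial q_\text{new}/\partial q = e^{-\Delta t/\tau}$; when the clip is active, that derivative is zero. A library type that carries several derivatives at once would leave the kernel itself unchanged.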

Forward-mode DP-SCREAM or global EAMxx with an idealized surface (e.g. aquaplanet):

- **Dycore support.** This will be much less "automatic" than the physics, because AD for a full 3D system really requires some kind of sparse representation (or approximation) of the Jacobian for each step, as well as additional or modified MPI calls to communicate this additional information.
- **Nudging/piggybacking.** While chaotic behavior is often manageable in short SCM runs, especially with simple or idealized physics, longer LES or cloud-resolving runs will quickly become unpredictable, because they better resolve turbulence and other chaotic phenomena. This causes derivatives to grow exponentially in magnitude. One way of working around this is with "specified dynamics", i.e. by forcing the bulk fluid dynamics of perturbed simulations to follow a particular trajectory. This can be accomplished by various methods, e.g. by nudging the dynamics toward a chosen control run of the model, by "piggybacking" a perturbed scheme on another model, or via data assimilation methods.
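To make the sparse-Jacobian point a little more concrete: for a stencil operator, columns that never share a row can be seeded in the same forward-mode pass (Jacobian compression by graph coloring), so the number of passes scales with the number of colors rather than the number of unknowns. The sketch below assumes a hypothetical 1D periodic three-point diffusion tendency whose cell count is a multiple of 3, so that coloring column `j` by `j % 3` works, and recovers the full Jacobian from 3 passes. It says nothing about how a real dycore's stencils, halo exchanges, or MPI layout would carry the extra derivative components.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Minimal forward-mode dual number (value + one directional derivative).
struct Dual {
  double val;
  double der;
};
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
Dual operator*(double c, Dual a) { return {c * a.val, c * a.der}; }

// Hypothetical periodic 1D diffusion tendency:
// out_i = k * (u_{i-1} - 2 u_i + u_{i+1}), so row i of the Jacobian has
// nonzeros only in columns i-1, i, i+1 (mod n).
template <typename T>
std::vector<T> diffusion(const std::vector<T>& u, double k) {
  const std::size_t n = u.size();
  std::vector<T> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    const T& left = u[(i + n - 1) % n];
    const T& right = u[(i + 1) % n];
    out[i] = k * (left - 2.0 * u[i] + right);
  }
  return out;
}

// Full Jacobian from 3 forward-mode passes instead of n, by seeding all
// columns of the same color (j % 3) at once.
std::vector<std::vector<double>> colored_jacobian(const std::vector<double>& u, double k) {
  const std::size_t n = u.size();
  assert(n >= 3 && n % 3 == 0);  // the three columns in each row get distinct colors
  std::vector<std::vector<double>> J(n, std::vector<double>(n, 0.0));
  for (std::size_t color = 0; color < 3; ++color) {
    std::vector<Dual> ud(n);
    for (std::size_t j = 0; j < n; ++j) ud[j] = {u[j], j % 3 == color ? 1.0 : 0.0};
    const std::vector<Dual> out = diffusion(ud, k);
    // Each row has exactly one nonzero column of this color: scatter it back.
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t off : {n - 1, std::size_t{0}, std::size_t{1}}) {
        const std::size_t j = (i + off) % n;
        if (j % 3 == color) J[i][j] = out[i].der;
      }
    }
  }
  return J;
}
```

For a real 3D operator the coloring comes from the stencil's sparsity pattern, and the number of colors is bounded by the stencil width rather than by the grid size.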
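The effect of nudging on derivatives can be seen in a toy chaotic system. This sketch uses the Lorenz-63 equations (standing in for "the dynamics", with the usual $\sigma = 10$, $\beta = 8/3$, $\rho = 28$), integrated with forward Euler, and propagates $\partial(\text{state})/\partial\rho$ with a dual number. Relaxing the perturbed run toward a control trajectory with timescale $\tau$ adds a $-\delta/\tau$ damping term to the tangent equation, which keeps the derivative bounded once $1/\tau$ exceeds the local growth rate. The function name and parameter values ($\tau = 0.02$, $\Delta t = 10^{-3}$, 20 time units) are illustrative assumptions, not recommendations.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Minimal dual number; `der` carries d(.)/d(rho).
struct Dual {
  double val;
  double der;
};
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.der * b.val + a.val * b.der}; }
Dual operator*(double c, Dual a) { return {c * a.val, c * a.der}; }

template <typename T>
using State = std::array<T, 3>;

// Lorenz-63 tendencies (sigma = 10, beta = 8/3) as a toy chaotic "dynamics".
template <typename T>
State<T> lorenz(const State<T>& s, T rho) {
  return {10.0 * (s[1] - s[0]),
          s[0] * (rho - s[2]) - s[1],
          s[0] * s[1] - (8.0 / 3.0) * s[2]};
}

// Norm of d(state)/d(rho) after n_steps forward-Euler steps from (1, 1, 1).
// If `nudge` is true, the perturbed run is relaxed toward a control run
// (same rho, plain doubles) with timescale tau.
double rho_sensitivity(bool nudge, double tau, double dt, int n_steps) {
  assert(tau > 0.0 && dt > 0.0 && n_steps > 0);
  const double rho = 28.0;
  State<double> ctrl = {1.0, 1.0, 1.0};
  State<Dual> pert = {Dual{1.0, 0.0}, Dual{1.0, 0.0}, Dual{1.0, 0.0}};
  const Dual rho_d{rho, 1.0};  // seed: differentiate with respect to rho
  for (int n = 0; n < n_steps; ++n) {
    const State<Dual> f = lorenz(pert, rho_d);
    const State<double> fc = lorenz(ctrl, rho);
    for (int k = 0; k < 3; ++k) {
      Dual tend = f[k];
      // Nudging adds -(x - x_ctrl)/tau, i.e. a -delta/tau term in the tangent.
      if (nudge) tend = tend - (1.0 / tau) * (pert[k] - Dual{ctrl[k], 0.0});
      pert[k] = pert[k] + dt * tend;
      ctrl[k] += dt * fc[k];
    }
  }
  const double dx = pert[0].der, dy = pert[1].der, dz = pert[2].der;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

With these settings, `rho_sensitivity(false, 0.02, 1e-3, 20000)` grows by many orders of magnitude, while `rho_sensitivity(true, 0.02, 1e-3, 20000)` stays small. The trade-off is that the nudged derivative describes the response with the trajectory held fixed ("specified dynamics"), not the free-running response.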

Forward-mode E3SM with real surface models:

- **Coupler/driver support.** This is probably not too bad, in that the required changes are scientifically and mathematically straightforward, but the software engineering may take some time.
- **Ocean/sea ice/land/land ice support.** OMEGA will be in C++, Icepack may be translated to C++ soon, and MALI already has access to some AD capability through Trilinos/Sacado. However, a C++ implementation does not guarantee that AD is easy, and we will be dealing with Fortran components for some time anyway. Some method will be needed for propagating uncertainties through every part of the code (particularly the land model), either using AD tools for Fortran or by some clever method of working around non-AD-capable code.
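One possible way of working around non-AD-capable code (an illustration only, not something settled in this issue) is to wrap a black-box routine so that it accepts and returns dual numbers, estimating the directional derivative with a central finite difference along the incoming tangent. The `Dual` type and `call_black_box` wrapper below are hypothetical, and the step size `h` is an assumed default subject to the usual truncation/round-off trade-off.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Minimal forward-mode dual number (value + one directional derivative).
struct Dual {
  double val;
  double der;
};

// Hypothetical wrapper letting a double-only routine `f` (e.g. a Fortran
// component behind a C interface) sit inside a forward-mode calculation.
// The tangent is approximated by a central difference along x.der, so the
// result is only as accurate as the step-size trade-off allows.
Dual call_black_box(const std::function<double(double)>& f, Dual x, double h = 1e-6) {
  assert(h > 0.0);
  const double fx = f(x.val);
  if (x.der == 0.0) return {fx, 0.0};  // no tangent to propagate
  const double step = h * std::fmax(1.0, std::fabs(x.val));  // scale step to |x|
  const double slope = (f(x.val + step) - f(x.val - step)) / (2.0 * step);
  return {fx, slope * x.der};  // chain rule: df/dx times the incoming tangent
}
```

Each derivative direction costs two extra calls of the wrapped routine, so this only makes sense for cheap components or a small number of directions.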

Reverse-mode:

- **Extensive support for using "abstract" data types in place of floating-point types.** Reverse-mode AD requires a "forward" pass through the model to be followed by a reverse pass that executes adjoint versions of each operation in the reverse order. Therefore, every operation in the model must produce an abstract type occupying a node of a graph representing the flow of information through the model, rather than a simple floating-point value.
- **Frequent checkpointing/"rewindability".** Keeping the graph from the above process in memory for every time step of a run would require far too much data. Therefore, a tractable implementation requires frequent checkpoints, and the model must be able to rewind to an earlier state and run again from that state.
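Both requirements can be seen in a toy setting. The sketch below rests on illustrative assumptions (none of these names come from EAMxx or any AD library): a scalar "model" $x_{n+1} = x_n - \Delta t\,p\,x_n$, a hand-rolled tape (`Tape`, `Var`), and a cost $J = x_N$. The forward pass runs in plain doubles and keeps only every $k$-th state; the reverse pass then walks the segments backwards, recomputes each one from its checkpoint while recording every operation as a tape node, and sweeps adjoints through that tape in reverse order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tape (Wengert list): node i records up to two parent nodes and the local
// partial derivatives d(node i)/d(parent). Parent index -1 means "none".
struct Tape {
  struct Node { int a; double da; int b; double db; };
  std::vector<Node> nodes;
  int push(int a, double da, int b, double db) {
    nodes.push_back({a, da, b, db});
    return static_cast<int>(nodes.size()) - 1;
  }
  // Reverse sweep: adjoint of every node, given the adjoint `seed` of `out`.
  std::vector<double> sweep(int out, double seed) const {
    std::vector<double> adj(nodes.size(), 0.0);
    adj[out] = seed;
    for (int i = out; i >= 0; --i) {
      const Node& n = nodes[i];
      if (n.a >= 0) adj[n.a] += n.da * adj[i];
      if (n.b >= 0) adj[n.b] += n.db * adj[i];
    }
    return adj;
  }
};

// A value that also occupies a node of the tape.
struct Var { double val; int id; Tape* tape; };
Var input(Tape& t, double v) { return {v, t.push(-1, 0.0, -1, 0.0), &t}; }
Var operator+(Var x, Var y) { return {x.val + y.val, x.tape->push(x.id, 1.0, y.id, 1.0), x.tape}; }
Var operator*(Var x, Var y) { return {x.val * y.val, x.tape->push(x.id, y.val, y.id, x.val), x.tape}; }
Var operator*(double c, Var y) { return {c * y.val, y.tape->push(y.id, c, -1, 0.0), y.tape}; }

// One explicit time step of the toy model, generic in the scalar type.
template <typename T>
T step(T x, T p, double dt) { return x + (-dt) * (p * x); }

// dJ/dp for J = x_N, storing a checkpoint every k steps.
double checkpointed_gradient(double x0, double p, double dt, int n_steps, int k,
                             std::size_t* n_checkpoints = nullptr) {
  assert(n_steps > 0 && k > 0);
  // Forward pass in plain doubles: only every k-th state is kept.
  std::vector<double> ckpt;
  double x = x0;
  for (int n = 0; n < n_steps; ++n) {
    if (n % k == 0) ckpt.push_back(x);
    x = step(x, p, dt);
  }
  if (n_checkpoints) *n_checkpoints = ckpt.size();
  // Reverse pass: recompute each segment on a fresh tape, last segment first.
  double x_adj = 1.0;  // dJ/dx at the end of the current segment
  double p_adj = 0.0;  // accumulated dJ/dp
  for (int s = static_cast<int>(ckpt.size()) - 1; s >= 0; --s) {
    Tape tape;
    Var xs = input(tape, ckpt[s]);
    Var ps = input(tape, p);
    Var y = xs;
    const int end = std::min(n_steps, (s + 1) * k);
    for (int n = s * k; n < end; ++n) y = step(y, ps, dt);
    const std::vector<double> adj = tape.sweep(y.id, x_adj);
    p_adj += adj[ps.id];
    x_adj = adj[xs.id];
  }
  return p_adj;
}
```

Memory use is roughly $N/k$ checkpoints plus one segment's tape, at the cost of recomputing each segment once; binomial checkpointing schemes (e.g. Griewank and Walther's revolve) refine this trade-off. With $x_0 = 1$, $p = 2$, $\Delta t = 0.01$ and $N = 100$, the exact gradient is $-(0.98)^{99}$ for any choice of $k$.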

Applications:

- **Numerics.** Many diagnostics used for estimating numerical error only require derivatives as diagnostic quantities, calculated for one particular snapshot. Implicit methods only require derivatives to be calculated and stored during the integration of the specific process(es) that use an implicit time integration method. Thus, most numerical applications only require the capabilities listed above for SCM, or an even more restricted subset of them.
- **Parameter estimation.** Finding the optimal values of a set of parameters (to conform to some constraining data set) can occur on multiple levels. For some microphysical processes (and maybe some SHOC parameters) the SCM may be sufficient, but a 3D model like the DP configuration would likely be more convincing for other uses.
- **Uncertainty quantification.** Derivative information can be useful for sampling methods such as Hamiltonian Monte Carlo. In practice, such methods are expensive enough that they are only tractable for SCM, or for cheap emulators of more expensive models.
- **Sensitivity analysis.** Various forms of sensitivity analysis can be performed on models as simple as SCM or as complex as the coupled global model. One example comes from studies of the "pattern effect", where the SST is perturbed for a single patch of ocean, and the effect on global climate is examined. Such studies often involve dozens of simulations, perturbing each of dozens or hundreds of patches of ocean one by one, and running long enough to distinguish the effect of the perturbation from the internal variability of the climate system. Careful use of forward-mode AD could drastically cut the resources required, by substituting one run carrying many derivatives for dozens of independent runs. Reverse mode would be more appropriate when the number of ocean patches is very large, since its cost scales with the number of output quantities (e.g. a few global means) rather than the number of perturbed inputs.
- **Data assimilation/reanalysis.** For traditional data assimilation, which mainly seeks to constrain the model state, methods like 4DVar require a tangent linear and adjoint model for effective forecast/hindcast capabilities, and these are enabled by AD. Additionally, many data assimilation methods have uses in constraining parameters and estimating parametric uncertainty.
- **Hybrid physics-based/ML models.** Pure ML models for climate have low utility, because training a model to extrapolate to conditions beyond its training data is an unsolved problem, and we only have a single physical Earth from which to draw observations. This means that a physics-based model must be run for long periods, many times, to provide training data that covers a wide array of possible climates for an ML model. (To editorialize a bit, I not only think that this is an unsolved problem, but also that the last few decades of AI/ML advances have not produced much progress toward solving it!) However, ML may be more effective for representing specific processes within the climate system, particularly when there is reason to believe that such processes are unlikely to vary much in response to climate perturbations, and this has led to more interest in hybrid systems that combine physics-based and ML parameterizations. In-the-loop training of such ML models generally requires reverse-mode AD for backpropagation (at least for ML models with many parameters, like most neural networks).
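For the implicit-method case under **Numerics**, a minimal sketch: one backward-Euler step of a hypothetical stiff scalar process $f(x) = -kx^3$, solved with Newton's method, where the derivative of the residual $R(y) = y - x_\text{old} - \Delta t\, f(y)$ comes from a dual number instead of a hand-coded Jacobian. The function names, iteration cap, and convergence tolerance are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Minimal forward-mode dual number (value + one directional derivative).
struct Dual {
  double val;
  double der;
};
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
Dual operator*(double c, Dual a) { return {c * a.val, c * a.der}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.der * b.val + a.val * b.der}; }

// Hypothetical stiff tendency f(x) = -k x^3, generic in the scalar type.
template <typename T>
T tendency(T x, double k) { return (-k) * (x * x * x); }

// One backward-Euler step x_new = x_old + dt * f(x_new), solved with Newton.
double backward_euler_step(double x_old, double dt, double k, int* iters = nullptr) {
  assert(dt > 0.0);
  double y = x_old;  // initial guess
  for (int it = 1; it <= 50; ++it) {
    const Dual yd{y, 1.0};  // seed dy/dy = 1
    const Dual r = yd - Dual{x_old, 0.0} - dt * tendency(yd, k);
    const double delta = r.val / r.der;  // Newton update, dR/dy from AD
    y -= delta;
    if (std::fabs(delta) < 1e-14 * (1.0 + std::fabs(y))) {
      if (iters) *iters = it;
      return y;
    }
  }
  if (iters) *iters = 50;
  return y;
}
```

For this residual, $R'(y) = 1 + 3k\Delta t\,y^2 > 0$, so the Newton update is always defined; only the AD-provided derivative of the implicitly integrated process is needed, not derivatives of the whole model.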
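For the pattern-effect example under **Sensitivity analysis**, the saving comes from carrying several derivative directions through a single run. The sketch below uses a fixed-size vector dual number (similar in spirit to statically sized `Fad` types, but hand-written here), where each direction corresponds to one hypothetical ocean "patch". The toy response model, its parameters (`lam`, `eps`, weights `w`), and all names are made up for illustration and are not taken from E3SM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A value plus N directional derivatives, all propagated in one pass.
template <std::size_t N>
struct MultiDual {
  double val = 0.0;
  std::array<double, N> der{};
};

template <std::size_t N>
MultiDual<N> operator+(const MultiDual<N>& a, const MultiDual<N>& b) {
  MultiDual<N> r;
  r.val = a.val + b.val;
  for (std::size_t i = 0; i < N; ++i) r.der[i] = a.der[i] + b.der[i];
  return r;
}

template <std::size_t N>
MultiDual<N> operator-(const MultiDual<N>& a, const MultiDual<N>& b) {
  MultiDual<N> r;
  r.val = a.val - b.val;
  for (std::size_t i = 0; i < N; ++i) r.der[i] = a.der[i] - b.der[i];
  return r;
}

template <std::size_t N>
MultiDual<N> operator*(const MultiDual<N>& a, const MultiDual<N>& b) {
  MultiDual<N> r;
  r.val = a.val * b.val;  // product rule, applied to every direction
  for (std::size_t i = 0; i < N; ++i) r.der[i] = a.der[i] * b.val + a.val * b.der[i];
  return r;
}

template <std::size_t N>
MultiDual<N> operator*(double c, const MultiDual<N>& a) {
  MultiDual<N> r;
  r.val = c * a.val;
  for (std::size_t i = 0; i < N; ++i) r.der[i] = c * a.der[i];
  return r;
}

// Hypothetical toy response of a global-mean temperature anomaly to P patch
// SST anomalies: T' = sum_p w_p * sst_p - lam * T - eps * T^3, integrated
// with forward Euler from T = 0. der[p] of the result is dT/dsst_p.
template <std::size_t P>
MultiDual<P> patch_sensitivities(const std::array<double, P>& sst,
                                 const std::array<double, P>& w, double lam,
                                 double eps, double dt, int n_steps) {
  assert(dt > 0.0 && n_steps > 0);
  MultiDual<P> forcing;  // starts at zero value and zero derivatives
  for (std::size_t p = 0; p < P; ++p) {
    MultiDual<P> s;
    s.val = sst[p];
    s.der[p] = 1.0;  // seed: patch p perturbs derivative direction p only
    forcing = forcing + w[p] * s;
  }
  MultiDual<P> temp;
  for (int n = 0; n < n_steps; ++n)
    temp = temp + dt * (forcing - lam * temp - eps * (temp * temp * temp));
  return temp;
}
```

One run carrying $P$ derivative components replaces $P$ perturbed runs, at roughly $P$ times the derivative arithmetic per operation. Once the toy model has reached steady state $T^*$, the result can be checked against $\partial T^*/\partial \mathrm{sst}_p = w_p / (\lambda + 3\epsilon T^{*2})$.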
