enzo-project · mabruzzo · Jun 13, 2023 · Jun 14, 2023 · Jul 3, 2023 · Jul 4, 2023
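
Step 4 of the checkpoint-restart test described in `doc/source/tests/checkpoint.rst` (in the diff below) checks that the outputs of the two runs are bitwise identical. As a rough illustration only, and not the logic in `tools/ckpt_restart_test.py`, the idea can be sketched as a byte-for-byte comparison of files. The function name `find_mismatches` is hypothetical, and comparing raw file bytes is an assumption made for this sketch; the actual script may load the field data and compare values instead.

```python
import filecmp
from pathlib import Path


def find_mismatches(ckpt_dir, restart_dir):
    """Return the relative paths of files under restart_dir that are missing
    from ckpt_dir or are not byte-for-byte identical to their counterparts.

    Hypothetical helper: only files written by the restarted run are checked,
    since the restart begins part-way through the original run.
    """
    ckpt_dir, restart_dir = Path(ckpt_dir), Path(restart_dir)
    mismatches = []
    for path in sorted(restart_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(restart_dir)
        other = ckpt_dir / rel
        # shallow=False compares file contents rather than os.stat() metadata
        if not other.is_file() or not filecmp.cmp(path, other, shallow=False):
            mismatches.append(str(rel))
    return mismatches
```

An empty list would mean that every file written by the restarted run has an identical counterpart in the checkpoint run. The two arguments would need to point at whichever output directories the two runs actually write to, which this sketch does not try to guess.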
diff --git a/doc/source/tests/checkpoint.rst b/doc/source/tests/checkpoint.rst
@@ -1,37 +1,111 @@
+.. _checkpoint-tests:
+
 ----------------
 Checkpoint Tests
 ----------------
 
-Tests for restart checkpoints. Runs 2D implosion using PPM method with slope mesh refinement. 
+At the time of writing, there are two mechanisms for checkpointing:
+
+1. The new approach, which writes checkpoints with the ``"check"`` method and promises a lot more flexibility.
+2. The legacy approach, which uses Charm++'s machinery.
+
+The intention is to transition entirely to the new approach. At this time, however, the new approach has limitations (see `Issue #316 <https://github.com/enzo-project/enzo-e/issues/316>`_). For that reason, both approaches are tested automatically by the pytest machinery.
+
+The checkpoint-restart tests consist of 4 steps:
+
+1. Set up the temporary directories: one where the simulation that creates the checkpoints will be run and one where the restarted simulation will be run.
+   This setup may involve the creation of modified parameter files and the creation of symlinks. As is common with many test cases, the symlinks allow the include-directives in parameter files to function properly, and they facilitate finding the appropriate Grackle data files.
+
+2. Execute the checkpoint-run (this starts from a parameter file and creates checkpoints).
+
+3. Execute the restart-run (this starts a simulation from one of the checkpoints).
+
+4. Compare the outputs of the two different runs.
+   Currently, we check that outputs are bitwise identical (**aside:** this may be a little limiting for functionality involving reductions, like gravity methods).
+
+The testing machinery has two major limitations:
+
+1. It generally requires the usage of incomplete parameter files (the test machinery fills in some missing information as it goes).
+2. It enforces different assumptions about what a parameter file should contain when testing the scalable-checkpoint approach than when testing the older Charm++ approach.
+
+The first limitation can be somewhat mitigated through the use of our ``tools/ckpt_restart_test.py`` script.
+
+* This script implements all of the logic from our checkpoint-restart testing machinery, but can be used outside of the pytest machinery.
+  It primarily exists to help debug cases where checkpoint-restart breaks.
+  It can also be used as a sanity check for whether the checkpoint-restart functionality works properly when using an arbitrary set of methods.
+
+* For example, to run the test for the new-style machinery, you should invoke the following from the base directory of the repository:
+
+  .. code-block:: bash
+
+      python3 tools/ckpt_restart_test.py \
+          --input input/Checkpoint/checkpoint_ppm.in \
+          --enzoe <path/to/enzo-e> \
+          --charm <path/to/charmrun> \
+          --stop_cycle 5 \
+          --symlink input \
+          --grackle-input-data-dir <path/to/grackle/input/data/dir> \
+          --legacy-output \
+          --test-dir my_test
+
+  The command to launch the test of checkpoints that use the Charm++ machinery is very similar:
+
+  .. code-block:: bash
+
+      python3 tools/ckpt_restart_test.py \
+          --input input/Checkpoint/legacy/checkpoint_ppm.in \
+          --enzoe <path/to/enzo-e> \
+          --charm <path/to/charmrun> \
+          --stop_cycle 5 \
+          --symlink input \
+          --grackle-input-data-dir <path/to/grackle/input/data/dir> \
+          --legacy-output \
+          --charm-restart \
+          --test-dir my_test
+
+  There are a few things to note about the flags passed to the command:
+
+    * For the particular simulation considered here, the ``--grackle-input-data-dir`` flag is technically unnecessary.
+      It's ONLY necessary when you consider a case that involves Grackle.
 
+    * ``--enzoe`` and ``--charm`` should be passed the paths to the ``enzo-e`` and ``charmrun`` binaries, respectively.
 
-checkpoint_ppm-1
-================
+    * The value passed to ``--test-dir`` is somewhat arbitrary.
+      It specifies the directory in which the simulations are executed.
+      The script will claim full ownership over this directory.
+      If the directory already exists (and has any contents), the tests won't run unless the ``--clobber`` flag is passed (in which case, the script will clear ALL contents of that directory before doing anything else).
 
-Tests checkpoint/restart for serial run methods
+* In each case the full parameter file used to launch the initial run (that generates checkpoint-dumps) can be found in ``my_test/ckpt_run/parameters.in`` (the aggregated parameter file produced by that simulation should be located in ``my_test/ckpt_run/parameters.out``).
 
+* For the new-style restarts, the parameter file used to launch the restart can be found in ``my_test/restart_run/parameters.in`` (or ``my_test/restart_run/parameters.out``). For charm-based restarts, there won't be a parameter file.
 
-checkpoint_ppm-8
-================
 
-Tests checkpoint/restart for parallel run methods
+List of test cases in the framework
+===================================
 
+The following files are used for testing the legacy checkpoint functionality:
 
-checkpoint_boundary
-===================
+* ``input/Checkpoint/legacy/checkpoint_boundary.in``
+* ``input/Checkpoint/legacy/checkpoint_grackle.in``
+* ``input/Checkpoint/legacy/checkpoint_ppm.in``
+* ``input/Checkpoint/legacy/checkpoint_vlct.in``
 
-Under construction
+The following files are used for testing the new-style checkpoint functionality:
 
+* ``input/Checkpoint/checkpoint_boundary.in``
+* ``input/Checkpoint/checkpoint_grackle.in``
+* ``input/Checkpoint/checkpoint_ppm.in``
 
-checkpoint_grackle
-==================
+.. note::
 
-Under construction
+    A test of the new-style checkpointing for a simulation using the VL+CT method still needs to be introduced.
 
+.. note::
 
-checkpoint_vlct
-===============
+    This list should be updated as more tests are introduced. It may also be nice to add descriptions.
 
-Under construction
 
+Tests outside of the framework
+==============================
 
+The files ``input/Checkpoint/test_cosmo-check.in`` and ``input/Checkpoint/test_cosmo-restart.in`` show a sample cosmology simulation that uses the checkpoint-restart functionality.
diff --git a/doc/source/tests/existing_tests.rst b/doc/source/tests/existing_tests.rst
@@ -62,7 +62,7 @@ Currently, Enzo-e has the following test simulations in the input folder:
    vlct
    others
 
-Existing Answer Tests
+Existing pytest Tests
 =====================
 
 The answer test suite currently covers the following simulations:
@@ -72,3 +72,7 @@ The answer test suite currently covers the following simulations:
 
 .. toctree::
    grackle-pytest
+
+Other pytest-tests cover the following simulations:
+
+* :ref:`checkpoint-tests`
diff --git a/input/Checkpoint/checkpoint_boundary.in b/input/Checkpoint/checkpoint_boundary.in
@@ -1,123 +1,53 @@
-# The basic idea here is to check that the checkpoint capabilities of all
-# boundaries work correctly. Maybe this should be broken into separate tests
-# in the future.
+# Problem: Checkpoint-Restart Boundary-Conditions
+# Author: Matthew Abruzzo
 #
-# The testing tool automatically provides Stopping and Output sections
-
-   Domain {
-      lower = [0.0, 0.0, 0.0];
-      upper = [1.0, 1.0, 1.0];
-   }
-
-   Mesh { 
-      root_rank   = 3;
-      root_size   = [16,16,16];
-      root_blocks = [2,2,2];
+# This is an input file for testing the new-style checkpoint-restart machinery
+#
+# In more detail, this mostly acts like a template-file that is used by the
+# testing tool. In practice, the testing tool will:
+# - provide a Stopping section and an Output section.
+# - use the contents of Field:list to determine which fields to compare before
+#   and after a restart (for new-style checkpoint-restart, the test can be
+#   configured to skip this)
+# Details are provided on the website documentation about how you can see the
+# full parameter files that are generated in this test.
+
+   include "input/Checkpoint/legacy/checkpoint_boundary.in"
+
+   Adapt {
+      max_initial_level = 0;
+      min_level = -1;
+      max_level = 0;
    }
 
+   # we want to check if the time-dependence of a boundary works properly.
+   # it is not supported by the old-style checkpoint-restart infrastructure.
    Boundary {
-      list = ["density_inflow", "total_energy_inflow", "VX_inflow",
-              "VY_inflow", "VZ_inflow", "downwind", "yedge", "zedge"];
-      density_inflow {
-        face = "lower";
-        axis = "x";
-        type = "inflow";
-        field_list = "density";
-        value = 1.0;
-      }
-      total_energy_inflow {
-        face = "lower";
-        axis = "x";
-        type = "inflow";
-        field_list = "total_energy";
-        value = 5.5;
-      }
       VX_inflow {
-        face = "lower";
-        axis = "x";
-        type = "inflow";
-        field_list = "velocity_x";
-        value = 1.0;
-      }
-      VY_inflow {
-        face = "lower";
-        axis = "x";
-        type = "inflow";
-        field_list = "velocity_y";
-        value = -1.0;
-      }
-      VZ_inflow {
-        face = "lower";
-        axis = "x";
-        type = "inflow";
-        field_list = "velocity_z";
-        value = 2.0;
+        value = 1.0 + 0.005 * t;
       }
-
-      downwind {
-         type = "outflow";
-         axis = "x";
-         face = "upper";
-      };
-
-      yedge {
-         type = "reflecting";
-      };
-
-      zedge {
-         type = "periodic";
-      };
-   }
-
-
-   Field {
-
-      ghost_depth = 3;
-
-      list = [
-        "density",
-        "velocity_x",
-        "velocity_y",
-        "velocity_z",
-        "total_energy",
-        "internal_energy",
-	"pressure"
-      ] ;
-
-      gamma = 1.4;
-
    }
 
    Method {
+      list = ["order_morton", "check", "ppm"];
 
-      list = ["ppm"];
-
-      ppm {
-         courant   = 0.8;
-         diffusion   = true;
-         flattening  = 3;
-         steepening  = true;
-         dual_energy = false;
-      }
-   }
-
-   Initial {
-
-       list = ["value"];
+      order_morton {
+          schedule {
+             list = [2, 4];
+             var = "cycle";
+          };
+      };
 
-       value {
-          density       = 1.0;
-          # if pressure = 1.0, then
-          #    specific internal energy = 1.0/((1.4 - 1.0) * 1.0) = 2.5
-          # specific kinetic energy = 0.5*v^2 = 0.5*(6) = 3.0 
-          total_energy  = 5.5;
-          velocity_x    = 1.0;
-          velocity_y    = -1.0;
-          velocity_z    = 2.0;
-          internal_energy = 0.0; 
-       }
-   }
+      check {
+          dir = [ "Check-%02d", "cycle" ];
+          num_files = 2;
+          ordering = "order_morton";
+          include_ghosts = false; # the program encounters an error on restart
+                                  # when this is true
+          schedule {
+             list = [2, 4];
+             var = "cycle";
+          };
+      };
 
-   Stopping {
-      cycle = 10;
    }