New automated Checkpoint-Restart Tests #324
Open
mabruzzo
wants to merge
11
commits into
enzo-project:main
from
mabruzzo:new-ckpt-restart-tests
2dfe9c5
Laying the foundation for automated tests of the new-style checkpoint…
mabruzzo 597b115
introduced new ckpt-restart tests to test suite.
mabruzzo f922517
introduce support for testing the legacy charm-based restarts in the …
mabruzzo 346a7b7
Documented the checkpoint restart tests.
mabruzzo d3d6a2e
comment out the automated answer-tests for the new-style Checkpoint-R…
mabruzzo e09d392
Apply suggestions from code review
mabruzzo 8cab0f5
Merge branch 'main' into new-ckpt-restart-tests
mabruzzo 470d6b4
Merge branch 'main' into new-ckpt-restart-tests
mabruzzo 927def2
renaming ppm-implosion2D-NoTestingSection.incl -> ppm-implosion2D-NoT…
mabruzzo 641246c
Merge branch 'main' into new-ckpt-restart-tests
mabruzzo a090ec5
Fix minor serialization bug I previously introduced (basically, I wou…
mabruzzo
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,37 +1,111 @@ | ||
| .. _checkpoint-tests: | ||
|
|
||
| ---------------- | ||
| Checkpoint Tests | ||
| ---------------- | ||
|
|
||
| Tests for restart checkpoints. Runs 2D implosion using PPM method with slope mesh refinement. | ||
| At the time of writing, there are two mechanisms for checkpointing: | ||
|
|
||
| 1. The new approach that writes checkpoints with the ``"check"`` method, which promises a lot more flexibility. | ||
| 2. The legacy approach that uses Charm++'s machinery. | ||
|
|
||
| The intention is to entirely transition to the new approach. But, at this time the new approach has limitations (see `Issue #316 <https://github.com/enzo-project/enzo-e/issues/316>`_). For that reason, both approaches are automatically tested (by the pytest machinery). | ||
|
|
||
| The checkpoint-restart tests consist of 4 steps: | ||
|
|
||
| 1. Set up the temporary directories in which the checkpoint-creating simulation and the restarted simulation will be run. | ||
| This setup may involve the creation of modified parameter files and the creation of symlinks (as is common with many test cases, the symlinks allow the include-directives in parameter files to function properly; they also facilitate finding the appropriate Grackle data files). | ||
|
|
||
| 2. Execute the checkpoint-run (this starts from a parameter file and creates checkpoints). | ||
|
|
||
| 3. Execute the restart-run (this starts a simulation from one of the checkpoints). | ||
|
|
||
| 4. Compare the outputs of the two different runs. | ||
| Currently, we check that outputs are bitwise identical (**aside:** this may be a little limiting for functionality involving reductions, like gravity methods). | ||
|
|
||
| The testing machinery has two major limitations: | ||
|
|
||
| 1. It generally requires the usage of incomplete parameter files (the test machinery fills in some missing information as it goes). | ||
| 2. It enforces different assumptions about what should be contained inside of a parameter file when testing the scalable-checkpoint approach and when testing the older Charm++-approach. | ||
mabruzzo marked this conversation as resolved.
|
||
|
|
||
| The first limitation can be somewhat mitigated through the use of our `tools/ckpt_restart_test.py` script. | ||
|
|
||
| * This script implements all of the logic from our checkpoint-restart testing machinery, but can be used outside of the pytest machinery. | ||
| It primarily exists to help debug cases where checkpoint-restart breaks. | ||
| It can also be used as a sanity check for whether the checkpoint-restart functionality works properly when using an arbitrary set of methods. | ||
|
|
||
| * For example, to run the test for the new-style machinery, you should invoke the following from the base directory of the repository: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| python3 tools/ckpt_restart_test.py \ | ||
|
Contributor
I get the following when I run this:
|
||
| --input input/Checkpoint/checkpoint_ppm.in \ | ||
| --enzoe <path/to/enzo-e> \ | ||
| --charm <path/to/charmrun> \ | ||
| --stop_cycle 5 \ | ||
| --symlink input \ | ||
| --grackle-input-data-dir <path/to/grackle/input/data/dir> \ | ||
| --legacy-output \ | ||
| --test-dir my_test | ||
|
|
||
| The command to launch the test of checkpoints that use the Charm++ machinery is very similar: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| python3 tools/ckpt_restart_test.py \ | ||
| --input input/Checkpoint/legacy/checkpoint_ppm.in \ | ||
| --enzoe <path/to/enzo-e> \ | ||
| --charm <path/to/charmrun> \ | ||
| --stop_cycle 5 \ | ||
| --symlink input \ | ||
| --grackle-input-data-dir <path/to/grackle/input/data/dir> \ | ||
| --legacy-output \ | ||
| --charm-restart \ | ||
| --test-dir my_test | ||
|
|
||
| There are a few things to note about the flags passed to the command: | ||
|
|
||
| * For the particular simulation considered here, the ``--grackle-input-data-dir`` flag is technically unnecessary. | ||
| It's ONLY necessary when you consider a case that involves Grackle. | ||
|
|
||
| * ``--enzoe`` and ``--charm`` should be passed the paths to the ``enzo-e`` binary and the ``charmrun`` binary, respectively. | ||
|
|
||
| checkpoint_ppm-1 | ||
| ================ | ||
| * the value passed to ``--test-dir`` is somewhat arbitrary. | ||
| It specifies the directory in which the simulations are executed. | ||
| The script will claim full-ownership over this directory. | ||
| If the directory already exists (and has any contents), the tests won't run unless the ``--clobber`` flag is passed (in which case, the script will clear ALL contents of that directory before doing anything else). | ||
|
|
||
| Tests checkpoint/restart for serial run methods | ||
| * In each case the full parameter file used to launch the initial run (that generates checkpoint-dumps) can be found in ``my_test/ckpt_run/parameters.in`` (the aggregated parameter file produced by that simulation should be located in ``my_test/ckpt_run/parameters.out``). | ||
|
|
||
| * For the new-style restarts, the parameter file used to launch the restart can be found in ``my_test/restart_run/parameters.in`` (or ``my_test/restart_run/parameters.out``). For charm-based restarts, there won't be a parameter file. | ||
|
|
||
| checkpoint_ppm-8 | ||
| ================ | ||
|
|
||
| Tests checkpoint/restart for parallel run methods | ||
| List of the test cases in Framework | ||
| =================================== | ||
|
|
||
| The following files are used for testing the legacy Checkpoint functionality | ||
|
|
||
| checkpoint_boundary | ||
| =================== | ||
| * `input/Checkpoint/legacy/checkpoint_boundary.in` | ||
| * `input/Checkpoint/legacy/checkpoint_grackle.in` | ||
| * `input/Checkpoint/legacy/checkpoint_ppm.in` | ||
| * `input/Checkpoint/legacy/checkpoint_vlct.in` | ||
|
|
||
| Under construction | ||
| The following files are used for testing the new-style Checkpoint functionality | ||
|
|
||
| * `input/Checkpoint/checkpoint_boundary.in` | ||
| * `input/Checkpoint/checkpoint_grackle.in` | ||
| * `input/Checkpoint/checkpoint_ppm.in` | ||
|
|
||
| checkpoint_grackle | ||
| ================== | ||
| .. note:: | ||
|
|
||
| Under construction | ||
| Introduce a test of the new-style checkpointing for a simulation using the VL+CT Method. | ||
|
|
||
| .. note:: | ||
|
|
||
| checkpoint_vlct | ||
| =============== | ||
| This list should get updated as more tests get introduced. It may also be nice to add descriptions. | ||
|
|
||
| Under construction | ||
|
|
||
| Tests outside of the framework | ||
| ============================== | ||
|
|
||
| The files `input/Checkpoint/test_cosmo-check.in` and `input/Checkpoint/test_cosmo-restart.in` show a sample-cosmology simulation that uses the checkpoint-restart functionality. | ||
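
Step 4 of the procedure documented above (checking that the outputs of the two runs are bitwise identical) can be illustrated with a minimal Python sketch. This is not the logic of `tools/ckpt_restart_test.py`, which is not shown in this diff; the helper name `find_mismatched_files` and the assumption that both runs write identically named output files into flat directories are hypothetical.

```python
import filecmp
from pathlib import Path


def find_mismatched_files(ckpt_dir, restart_dir, pattern="*"):
    """Return the names of files in ckpt_dir that are missing from, or not
    byte-for-byte identical to, the same-named file in restart_dir.

    Hypothetical helper: it assumes both runs write identically named
    output files directly into the two directories.
    """
    ckpt_dir, restart_dir = Path(ckpt_dir), Path(restart_dir)
    mismatched = []
    for ref in sorted(p for p in ckpt_dir.glob(pattern) if p.is_file()):
        other = restart_dir / ref.name
        # shallow=False forces a comparison of the file contents rather
        # than only the os.stat() signatures (size, mtime)
        if not other.is_file() or not filecmp.cmp(ref, other, shallow=False):
            mismatched.append(ref.name)
    return mismatched
```

An empty return value would mean that every matched file is bitwise identical. The directories to pass in depend on where the generated Output sections place their files, which this sketch does not assume.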
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,123 +1,53 @@ | ||
| # The basic idea here is to check that the checkpoint capabilities of all | ||
| # boundaries work correctly. Maybe this should be broken into separate tests | ||
| # in the future. | ||
| # Problem: Checkpoint-Restart Boundary-Conditions | ||
| # Author: Matthew Abruzzo | ||
| # | ||
| # The testing tool automatically provides Stopping and Output sections | ||
|
|
||
| Domain { | ||
| lower = [0.0, 0.0, 0.0]; | ||
| upper = [1.0, 1.0, 1.0]; | ||
| } | ||
|
|
||
| Mesh { | ||
| root_rank = 3; | ||
| root_size = [16,16,16]; | ||
| root_blocks = [2,2,2]; | ||
| # This is an input file for testing the new-style checkpoint-restart machinery | ||
| # | ||
| # In more detail, this mostly acts like a template-file that is used by the | ||
| # testing tool. In practice, the testing tool will: | ||
| # - provide a Stopping section and an Output section. | ||
| # - use the contents of Field:list to determine which fields to compare before | ||
| # and after a restart (for new-style checkpoint-restart, the test can be | ||
| # configured to skip this) | ||
| # The website documentation explains how you can see the | ||
| # full parameter files that are generated in this test. | ||
|
|
||
| include "input/Checkpoint/legacy/checkpoint_boundary.in" | ||
|
|
||
| Adapt { | ||
| max_initial_level = 0; | ||
| min_level = -1; | ||
| max_level = 0; | ||
| } | ||
|
|
||
| # we want to check if the time-dependence of a boundary works properly. | ||
| # it is not supported by the old-style checkpoint-restart infrastructure. | ||
| Boundary { | ||
| list = ["density_inflow", "total_energy_inflow", "VX_inflow", | ||
| "VY_inflow", "VZ_inflow", "downwind", "yedge", "zedge"]; | ||
| density_inflow { | ||
| face = "lower"; | ||
| axis = "x"; | ||
| type = "inflow"; | ||
| field_list = "density"; | ||
| value = 1.0; | ||
| } | ||
| total_energy_inflow { | ||
| face = "lower"; | ||
| axis = "x"; | ||
| type = "inflow"; | ||
| field_list = "total_energy"; | ||
| value = 5.5; | ||
| } | ||
| VX_inflow { | ||
| face = "lower"; | ||
| axis = "x"; | ||
| type = "inflow"; | ||
| field_list = "velocity_x"; | ||
| value = 1.0; | ||
| } | ||
| VY_inflow { | ||
| face = "lower"; | ||
| axis = "x"; | ||
| type = "inflow"; | ||
| field_list = "velocity_y"; | ||
| value = -1.0; | ||
| } | ||
| VZ_inflow { | ||
| face = "lower"; | ||
| axis = "x"; | ||
| type = "inflow"; | ||
| field_list = "velocity_z"; | ||
| value = 2.0; | ||
| value = 1.0 + 0.005 * t; | ||
| } | ||
|
|
||
| downwind { | ||
| type = "outflow"; | ||
| axis = "x"; | ||
| face = "upper"; | ||
| }; | ||
|
|
||
| yedge { | ||
| type = "reflecting"; | ||
| }; | ||
|
|
||
| zedge { | ||
| type = "periodic"; | ||
| }; | ||
| } | ||
|
|
||
|
|
||
| Field { | ||
|
|
||
| ghost_depth = 3; | ||
|
|
||
| list = [ | ||
| "density", | ||
| "velocity_x", | ||
| "velocity_y", | ||
| "velocity_z", | ||
| "total_energy", | ||
| "internal_energy", | ||
| "pressure" | ||
| ] ; | ||
|
|
||
| gamma = 1.4; | ||
|
|
||
| } | ||
|
|
||
| Method { | ||
| list = ["order_morton", "check", "ppm"]; | ||
|
|
||
| list = ["ppm"]; | ||
|
|
||
| ppm { | ||
| courant = 0.8; | ||
| diffusion = true; | ||
| flattening = 3; | ||
| steepening = true; | ||
| dual_energy = false; | ||
| } | ||
| } | ||
|
|
||
| Initial { | ||
|
|
||
| list = ["value"]; | ||
| order_morton { | ||
| schedule { | ||
| list = [2, 4]; | ||
| var = "cycle"; | ||
| }; | ||
| }; | ||
|
|
||
| value { | ||
| density = 1.0; | ||
| # if pressure = 1.0, then | ||
| # specific internal energy = 1.0/((1.4 - 1.0) * 1.0) = 2.5 | ||
| # specific kinetic energy = 0.5*v^2 = 0.5*(6) = 3.0 | ||
| total_energy = 5.5; | ||
| velocity_x = 1.0; | ||
| velocity_y = -1.0; | ||
| velocity_z = 2.0; | ||
| internal_energy = 0.0; | ||
| } | ||
| } | ||
| check { | ||
| dir = [ "Check-%02d", "cycle" ]; | ||
| num_files = 2; | ||
| ordering = "order_morton"; | ||
| include_ghosts = false; # the program encounters an error on restart | ||
| # when this is true | ||
| schedule { | ||
| list = [2, 4]; | ||
| var = "cycle"; | ||
| }; | ||
| }; | ||
|
|
||
| Stopping { | ||
| cycle = 10; | ||
| } |
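
As a rough illustration of the `check` configuration in the parameter file above, the following Python sketch lists the checkpoint directory names one might expect. It assumes that `dir = [ "Check-%02d", "cycle" ]` is expanded printf-style with the cycle number and that a checkpoint is written at every scheduled cycle up to and including the stopping cycle; both are assumptions about Enzo-E's behavior, and `expected_checkpoint_dirs` is a hypothetical helper, not part of the test tool.

```python
def expected_checkpoint_dirs(dir_format, schedule_cycles, stop_cycle):
    """List the checkpoint directory names implied by a cycle-based schedule.

    Assumptions (not confirmed by the diff above): dir_format is expanded
    printf-style with the cycle number, and scheduled cycles beyond
    stop_cycle never produce a checkpoint.
    """
    return [dir_format % cycle for cycle in schedule_cycles if cycle <= stop_cycle]


# Values taken from the parameter file (list = [2, 4]) and from the
# documented example command (--stop_cycle 5)
print(expected_checkpoint_dirs("Check-%02d", [2, 4], 5))
```

Under these assumptions, the restart-run could start from either of the two listed directories.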
The intention is not to transition to the new approach. Each has its advantages and disadvantages, so both should be kept. Charm++ CR can do lots of things Cello CR cannot.