106 changes: 90 additions & 16 deletions doc/source/tests/checkpoint.rst
@@ -1,37 +1,111 @@
.. _checkpoint-tests:

----------------
Checkpoint Tests
----------------

Tests for restart checkpoints. Runs 2D implosion using PPM method with slope mesh refinement.
At the time of writing, there are two mechanisms for checkpointing:

1. The new approach, which writes checkpoints with the ``"check"`` method and promises considerably more flexibility.
2. The legacy approach, which uses Charm++'s machinery.

The intention is to transition entirely to the new approach. At this time, however, the new approach has limitations (see `Issue #316 <https://github.com/enzo-project/enzo-e/issues/316>`_). For that reason, both approaches are tested automatically by the pytest machinery.
Review comment (Contributor):

The intention is not to transition to the new approach. Each has its advantages and disadvantages, so both should be kept. Charm++ CR can do lots of things Cello CR cannot.


The checkpoint-restart tests consist of 4 steps:

1. Set up a temporary directory in which the simulation that creates the checkpoints will run, and a second temporary directory in which the restarted simulation will run.
This setup may involve creating modified parameter files and symlinks (as is common with many test cases, the symlinks allow the include directives in parameter files to function properly, and they also facilitate finding the appropriate Grackle data files).

2. Execute the checkpoint-run (this starts from a parameter file and creates checkpoints).

3. Execute the restart-run (this starts a simulation from one of the checkpoints).

4. Compare the outputs of the two different runs.
Currently, we check that the outputs are bitwise identical (**aside:** this may be a little limiting for functionality involving reductions, such as gravity methods).
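
As a rough illustration of what "bitwise identical" means in step 4, and of why reductions can be a problem, here is a minimal Python sketch. It is not the comparison code used by the test machinery; the helper names ``float_bits`` and ``bitwise_identical`` are hypothetical. It assumes, as one plausible cause, that a reduction may combine partial sums in a different order after a restart.

```python
import struct


def float_bits(value):
    """Return the raw 8-byte IEEE-754 representation of a Python float."""
    return struct.pack("<d", value)


def bitwise_identical(values_a, values_b):
    """True only if both sequences hold floats with identical bit patterns."""
    if len(values_a) != len(values_b):
        return False
    return all(float_bits(a) == float_bits(b)
               for a, b in zip(values_a, values_b))


# The same three numbers summed in two different orders (hypothetical stand-in
# for a reduction whose partial sums are combined in a different order).
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(bitwise_identical([left_to_right], [left_to_right]))  # True
print(bitwise_identical([left_to_right], [right_to_left]))  # False
```

Because floating-point addition is not associative, the two sums differ in their last bit even though both are valid results, so a strict bitwise check would flag them as a mismatch.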

The testing machinery has two major limitations:

1. It generally requires incomplete parameter files (the test machinery fills in the missing information as it goes).
2. It enforces different assumptions about what a parameter file should contain when testing the scalable-checkpoint approach than when testing the older Charm++ approach.

The first limitation can be somewhat mitigated through the use of our `tools/ckpt_restart_test.py` script.

* This script implements all of the logic from our checkpoint-restart testing machinery, but can be used outside of the pytest machinery.
It primarily exists to help debug cases where checkpoint-restart breaks.
It can also be used as a sanity check for whether the checkpoint-restart functionality works properly when using an arbitrary set of methods.

* For example, to run the test for the new-style machinery, you should invoke the following from the base directory of the repository:

.. code-block:: bash

python3 tools/ckpt_restart_test.py \
--input input/Checkpoint/checkpoint_ppm.in \
--enzoe <path/to/enzo-e> \
--charm <path/to/charmrun> \
--stop_cycle 5 \
--symlink input \
--grackle-input-data-dir <path/to/grackle/input/data/dir> \
--legacy-output \
--test-dir my_test

Review comment (Contributor), attached to the first line of the command above:

I get the following when I run this:

Test failed: NOT cleaning up
Traceback (most recent call last):
  File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/tools/ckpt_restart_test.py", line 165, in <module>
    main(parser.parse_args())
  File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/tools/ckpt_restart_test.py", line 71, in main
    run_ckpt_restart_test(nominal_input = args.input,
  File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/test/answer_tests/test_utils/ckpt_restart_testing.py", line 464, in run_ckpt_restart_test
    restart_h5obj_map = ckpt_block_file_map(
                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/test/answer_tests/test_utils/ckpt_restart_testing.py", line 22, in ckpt_block_file_map
    with open(filelist_path, 'r') as f_filelist:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'my_test/restart_run/Check-02/check.file_list'

The command to launch the test of checkpoints that use the Charm++ machinery is very similar:

.. code-block:: bash

python3 tools/ckpt_restart_test.py \
--input input/Checkpoint/legacy/checkpoint_ppm.in \
--enzoe <path/to/enzo-e> \
--charm <path/to/charmrun> \
--stop_cycle 5 \
--symlink input \
--grackle-input-data-dir <path/to/grackle/input/data/dir> \
--legacy-output \
--charm-restart \
--test-dir my_test
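
The two invocations differ only in the input file and the ``--charm-restart`` flag. The Python sketch below assembles either command line from the flags shown in this section. The function name ``build_ckpt_restart_command`` and the placeholder paths are hypothetical, and it assumes that ``--grackle-input-data-dir`` can simply be left out when Grackle is not involved.

```python
import shlex


def build_ckpt_restart_command(input_file, enzoe, charmrun, test_dir,
                               stop_cycle=5, grackle_data_dir=None,
                               charm_restart=False):
    """Assemble the argument list for tools/ckpt_restart_test.py.

    Only flags that appear in this document are used.
    """
    cmd = ["python3", "tools/ckpt_restart_test.py",
           "--input", input_file,
           "--enzoe", enzoe,
           "--charm", charmrun,
           "--stop_cycle", str(stop_cycle),
           "--symlink", "input"]
    if grackle_data_dir is not None:
        # Assumption: the flag is optional when Grackle is not used.
        cmd += ["--grackle-input-data-dir", grackle_data_dir]
    cmd.append("--legacy-output")
    if charm_restart:
        # Selects the legacy Charm++-based restart test.
        cmd.append("--charm-restart")
    cmd += ["--test-dir", test_dir]
    return cmd


# Placeholder paths; substitute the locations of your own binaries.
new_style = build_ckpt_restart_command(
    "input/Checkpoint/checkpoint_ppm.in",
    "/path/to/enzo-e", "/path/to/charmrun", "my_test")
legacy = build_ckpt_restart_command(
    "input/Checkpoint/legacy/checkpoint_ppm.in",
    "/path/to/enzo-e", "/path/to/charmrun", "my_test",
    charm_restart=True)

print(shlex.join(new_style))
print(shlex.join(legacy))
```

The resulting lists could be handed to ``subprocess.run`` from the base directory of the repository.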

There are a few things to note about the flags passed to the command:

* For the particular simulation considered here, the ``--grackle-input-data-dir`` flag is technically unnecessary.
It's ONLY necessary when you consider a case that involves Grackle.

* ``--enzoe`` and ``--charm`` should be passed the paths to the ``enzo-e`` binary and the ``charmrun`` binary, respectively.

checkpoint_ppm-1
================
* The value passed to ``--test-dir`` is somewhat arbitrary.
It specifies the directory in which the simulations are executed.
The script claims full ownership of this directory.
If the directory already exists (and has any contents), the tests won't run unless the ``--clobber`` flag is passed (in which case, the script clears ALL contents of that directory before doing anything else).

Tests checkpoint/restart for serial run methods
* In each case the full parameter file used to launch the initial run (that generates checkpoint-dumps) can be found in ``my_test/ckpt_run/parameters.in`` (the aggregated parameter file produced by that simulation should be located in ``my_test/ckpt_run/parameters.out``).

* For the new-style restarts, the parameter file used to launch the restart can be found in ``my_test/restart_run/parameters.in`` (or ``my_test/restart_run/parameters.out``). For charm-based restarts, there won't be a parameter file.
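
The ownership rule for ``--test-dir`` described above can be pictured with the following Python sketch. It is not the script's actual implementation. The function name ``prepare_test_dir`` is hypothetical, while the ``ckpt_run`` and ``restart_run`` subdirectory names are taken from the paths mentioned in the notes above.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_test_dir(test_dir, clobber=False):
    """Create an empty test directory with ckpt_run/ and restart_run/ inside.

    Refuses to touch a non-empty directory unless clobber is True, in which
    case ALL existing contents are deleted first.
    """
    path = Path(test_dir)
    if path.is_dir() and any(path.iterdir()):
        if not clobber:
            raise RuntimeError(
                f"{path} is not empty; pass clobber=True to clear it")
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    ckpt_dir = path / "ckpt_run"
    restart_dir = path / "restart_run"
    ckpt_dir.mkdir()
    restart_dir.mkdir()
    return ckpt_dir, restart_dir


# Demonstration inside a scratch directory that is removed afterwards.
with tempfile.TemporaryDirectory() as scratch:
    target = Path(scratch) / "my_test"
    prepare_test_dir(target)              # fresh directory: succeeds
    try:
        prepare_test_dir(target)          # now non-empty: refused
    except RuntimeError as err:
        print("refused:", err)
    prepare_test_dir(target, clobber=True)  # contents cleared, then rebuilt
```

Refusing by default protects any directory that was passed by mistake; clearing is an explicit opt-in.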

checkpoint_ppm-8
================

Tests checkpoint/restart for parallel run methods
List of the test cases in Framework
===================================

The following files are used for testing the legacy Checkpoint functionality:

checkpoint_boundary
===================
* `input/Checkpoint/legacy/checkpoint_boundary.in`
* `input/Checkpoint/legacy/checkpoint_grackle.in`
* `input/Checkpoint/legacy/checkpoint_ppm.in`
* `input/Checkpoint/legacy/checkpoint_vlct.in`

Under construction
The following files are used for testing the new-style Checkpoint functionality:

* `input/Checkpoint/checkpoint_boundary.in`
* `input/Checkpoint/checkpoint_grackle.in`
* `input/Checkpoint/checkpoint_ppm.in`

checkpoint_grackle
==================
.. note::

Under construction
Introduce a test of the new-style checkpointing for a simulation using the VL+CT Method.

.. note::

checkpoint_vlct
===============
This list should be updated as more tests are introduced. It may also be nice to add descriptions.

Under construction

Tests outside of the framework
==============================

The files `input/Checkpoint/test_cosmo-check.in` and `input/Checkpoint/test_cosmo-restart.in` show a sample-cosmology simulation that uses the checkpoint-restart functionality.
6 changes: 5 additions & 1 deletion doc/source/tests/existing_tests.rst
@@ -62,7 +62,7 @@ Currently, Enzo-e has the following test simulations in the input folder:
vlct
others

Existing Answer Tests
Existing pytest Tests
=====================

The answer test suite currently covers the following simulations:
@@ -72,3 +72,7 @@ The answer test suite currently covers the following simulations:

.. toctree::
grackle-pytest

Other pytest-tests cover the following simulations:

* :ref:`checkpoint-tests`
150 changes: 40 additions & 110 deletions input/Checkpoint/checkpoint_boundary.in
@@ -1,123 +1,53 @@
# The basic idea here is to check that the checkpoint capabilities of all
# boundaries work correctly. Maybe this should be broken into separate tests
# in the future.
# Problem: Checkpoint-Restart Boundary-Conditions
# Author: Matthew Abruzzo
#
# The testing tool automatically provides Stopping and Output sections

Domain {
lower = [0.0, 0.0, 0.0];
upper = [1.0, 1.0, 1.0];
}

Mesh {
root_rank = 3;
root_size = [16,16,16];
root_blocks = [2,2,2];
# This is an input file for testing the new-style checkpoint-restart machinery
#
# In more detail, this mostly acts like a template-file that is used by the
# testing tool. In practice, the testing tool will:
# - provide a Stopping section and an Output section.
# - use the contents of Field:list to determine which fields to compare before
# and after a restart (for new-style checkpoint-restart, the test can be
# configured to skip this)
# The website documentation describes how to view the full parameter files
# that are generated in this test.

include "input/Checkpoint/legacy/checkpoint_boundary.in"

Adapt {
max_initial_level = 0;
min_level = -1;
max_level = 0;
}

# we want to check if the time-dependence of a boundary works properly.
# it is not supported by the old-style checkpoint-restart infrastructure.
Boundary {
list = ["density_inflow", "total_energy_inflow", "VX_inflow",
"VY_inflow", "VZ_inflow", "downwind", "yedge", "zedge"];
density_inflow {
face = "lower";
axis = "x";
type = "inflow";
field_list = "density";
value = 1.0;
}
total_energy_inflow {
face = "lower";
axis = "x";
type = "inflow";
field_list = "total_energy";
value = 5.5;
}
VX_inflow {
face = "lower";
axis = "x";
type = "inflow";
field_list = "velocity_x";
value = 1.0;
}
VY_inflow {
face = "lower";
axis = "x";
type = "inflow";
field_list = "velocity_y";
value = -1.0;
}
VZ_inflow {
face = "lower";
axis = "x";
type = "inflow";
field_list = "velocity_z";
value = 2.0;
value = 1.0 + 0.005 * t;
}

downwind {
type = "outflow";
axis = "x";
face = "upper";
};

yedge {
type = "reflecting";
};

zedge {
type = "periodic";
};
}


Field {

ghost_depth = 3;

list = [
"density",
"velocity_x",
"velocity_y",
"velocity_z",
"total_energy",
"internal_energy",
"pressure"
] ;

gamma = 1.4;

}

Method {
list = ["order_morton", "check", "ppm"];

list = ["ppm"];

ppm {
courant = 0.8;
diffusion = true;
flattening = 3;
steepening = true;
dual_energy = false;
}
}

Initial {

list = ["value"];
order_morton {
schedule {
list = [2, 4];
var = "cycle";
};
};

value {
density = 1.0;
# if pressure = 1.0, then
# specific internal energy = 1.0/((1.4 - 1.0) * 1.0) = 2.5
# specific kinetic energy = 0.5*v^2 = 0.5*(6) = 3.0
total_energy = 5.5;
velocity_x = 1.0;
velocity_y = -1.0;
velocity_z = 2.0;
internal_energy = 0.0;
}
}
check {
dir = [ "Check-%02d", "cycle" ];
num_files = 2;
ordering = "order_morton";
include_ghosts = false; # the program encounters an error on restart
# when this is true
schedule {
list = [2, 4];
var = "cycle";
};
};

Stopping {
cycle = 10;
}
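
The ``dir`` entry of the ``check`` block in this file pairs a printf-style format with a variable name. The Python sketch below shows how such a specification would expand for the scheduled cycles, assuming the format is applied the same way as Python's ``%`` operator. The helper ``checkpoint_dir_name`` is hypothetical and is not part of Enzo-E.

```python
def checkpoint_dir_name(dir_spec, cycle):
    """Expand a ["format", "variable", ...] directory specification.

    Only the "cycle" variable is handled in this sketch.
    """
    fmt, *variables = dir_spec
    values = {"cycle": cycle}
    return fmt % tuple(values[name] for name in variables)


dir_spec = ["Check-%02d", "cycle"]   # copied from the check block
schedule = [2, 4]                    # cycles listed in the schedule block

for cycle in schedule:
    print(checkpoint_dir_name(dir_spec, cycle))
# Check-02
# Check-04
```

This matches the ``Check-02`` directory named in the traceback quoted in the review comment on ``checkpoint.rst``.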