CODARcode
diff --git a/README.md b/README.md
Lines changed: 119 additions & 23 deletions
diff --git a/codar/cheetah/model.py b/codar/cheetah/model.py
Lines changed: 1 addition & 2 deletions
@@ -6,31 +6,43 @@ The CODAR Experiment Harness is designed to run Exascale science applications
 using different parameters and components to determine the best combination
 for deployment on different supercomputers.
 
-To use Cheetah, the user first writes a "campaign" specification file.
-Cheetah takes this specification, and generates a set of swift and bash
-scripts to execute the application many times with each of the parameter sets,
-and organize the results of each run in separate subdirectories. Once
-generated, the `run-all.sh` script in the output directory can be used
-to run the campaign.
+To use Cheetah, the user first writes a "campaign" specification file. Cheetah
+currently has four subcommands:
+
+1. `create-campaign` - takes the campaign specification and generates a set of
+  scripts to execute the application many times with each of the parameter
+  sets, and organizes the results of each run in separate subdirectories. Once
+  generated, the `run-all.sh` script in the output directory can be used
+  to run the campaign.
+2. `status` - gets status information about a campaign directory generated by
+  the `create-campaign` command
+3. `generate-report` - generates a report of results from a completed campaign
+4. `help` - gets information about available commands
+
+To get detailed help about options for each subcommand, execute the subcommand
+with the `-h` option, e.g. `cheetah.py create-campaign -h`.
 
 ## Requirements
 
-Cheetah v0.1 requires a modern Linux install with Python 3.4 or greater
-and CODAR Savanna v0.5. See the
-[savanna documentation](https://github.com/CODARcode/savanna)
-for installation instructions.
+Cheetah v0.5 requires a modern Linux install with Python 3.4 or greater. Some
+cheetah functionality is designed to interoperate with the Savanna software
+stack, including ADIOS, Dataspaces, SOSFlow, and TAU, but simple campaigns can
+be run without further dependencies.
+See the [savanna documentation](https://github.com/CODARcode/savanna)
+for more information and installation instructions.
 
 ## Tutorial for Running Heat Transfer example with Cheetah
 
 1. Install Savanna and build the Heat Transfer example (see [savanna
-   instructions](https://github.com/CODARcode/Example-Heat_Transfer/blob/master/README.adoc)). This tutorial will assume spack was used for the
+   instructions](https://github.com/CODARcode/savanna/blob/master/README.md)).
+   This tutorial will assume spack was used for the
    installation, and uses bash for environment setup examples.
 
-2. Download the Cheetah v0.1 release from github and unpack the release
-   [tarball](https://github.com/CODARcode/cheetah/archive/v0.1.tar.gz).
+2. Download the Cheetah v0.5 release from GitHub and unpack the release
+   [tarball](https://github.com/CODARcode/cheetah/archive/v0.5.tar.gz).
 
-3. Set up environment for cheetah (this can be added to your ~/.bashrc
-   file for convenience, after spack environment is loaded):
+3. Set up the environment for cheetah (this can be added to your ~/.bashrc
+   file for convenience, after the spack environment is loaded):
 
 ```
 source <(spack module loads --dependencies adios)
@@ -49,26 +61,45 @@ mkdir -p ~/codar/campaigns
 ```
 path2savanna=`spack find -p savanna | grep savanna | awk '{ print $2 }'`
 cd /path/to/cheetah
-./cheetah.py -e examples/heat_transfer_small.py \
+./cheetah.py create-campaign -e examples/heat_transfer_small.py \
  -a $path2savanna/Heat_Transfer \
  -m local -o ~/codar/campaigns/heat
 ```
 
 6. Run the campaign:
 
 ```
-cd ~/codar/campaigns/heat
+cd ~/codar/campaigns/heat/$USER
 ./run-all.sh
 ```
 
-For results, see `GROUP_NAME/run-NNN`. To debug failures, look at
-`GROUP_NAME/codar.cheetah.submit-output.txt` first, then at the stdout
-and stderr files in each of the run directories.
+For run output, see `GROUP_NAME/run-NNN`. To debug submit failures, look at
+`GROUP_NAME/codar.cheetah.submit-output.txt`. To view progress of the run,
+start with the status command with no options, then use the options to drill
+down and get more details on specific groups and runs. For example:
+
+```
+# show progress of each group in the campaign
+cheetah.py status ~/codar/campaigns/heat
+
+# get a summary of runs within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -s
+
+# show status of each run within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -n
+
+# show stderr and stdout for a specific run within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -r RUN_NAME -o
+```
 
 ## Campaign Directory
 
-Within the output directory, cheetah creates a subdirectory for each group
-in the specification. Group directories contain the following files:
+Within the output directory, cheetah creates a subdirectory based on your
+username. Within the user directory, there is a subdirectory for each group
+in the specification. The `status` subcommand should be used as the first
+method for investigating the progress of a campaign, but it can also be
+useful to understand the structure and examine files directly.
+Group directories contain the following files:
 
 - submit.sh: script that submits the group to the scheduler (or runs in
   the background for local machine). The campaign `run-all.sh` script simply
@@ -130,7 +161,7 @@ directory, unless the `component_subdirs` option is set to True for the group.
 In that case, the working dir  for each code will be a subdirectory of the run
 directory with name equal to the code name.
 
-## SOSFlow Support
+## SOSFlow Support (beta)
 
 Cheetah can automatically configure sosflow daemons to run with an application.
 See [heat transfer example sosflow](examples/heat_transfer_sosflow.py). Note
@@ -187,6 +218,17 @@ supported parameter types. For a complete list, see the examples and the
   taken to generate all the instances to run. For simple campaigns that
   need to do a full cross product of parameter values, only one
   SweepGroup containing one Sweep is needed.
+- node\_layout - an option passed to the Sweep that determines the
+  way to allocate nodes and MPI processes to codes. The default is to allocate
+  an entire node to each code and use the maximum number of cores available.
+  An alternate configuration is specified in a dictionary, with
+  keys giving the machine name that the layout is designed for, and values
+  giving the layout as a list of dictionaries. Each dictionary
+  represents a single node: its keys are code names, and its
+  values are the number of processes to use for each code. Node sharing
+  is not yet supported, so each node dictionary must contain only one code
+  entry, but the format is designed to support sharing. See also the
+  [node layout example](examples/heat_transfer_node_layout.py).
 - ParamX - all parameter types have at least three elements:
   - target - which code the parameter is for. The value must be one of
     the keys in the codes dictionary.
@@ -207,3 +249,57 @@ supported parameter types. For a complete list, see the examples and the
   on the convention used by the code. Note that this is distinct from
   the name, but a good choice for name is the option with the dashes
   removed.
+
+## Changelog
+
+### v0.5
+
+- New subcommand structure (the initial command structure is now under
+  the `create-campaign` subcommand).
+- New `status` subcommand
+- New `generate-report` subcommand
+- Multi-user campaign support
+- Specify different node counts in Sweeps
+- Add experiments to an existing campaign directory
+- Option to override number of nodes
+- Feature to parse and consolidate campaign performance information
+- Have cheetah figure out min/max number of nodes
+- Mark file as ADIOS XML file for an application
+- Support for absolute paths for input files
+- Support for TAU trace directory
+- Support for input files with key-value parameters
+- Get Example-Heat_Transfer + TAU working with Cheetah
+- Support for running per-run setup script during campaign creation
+- Support for sub-directories for FOB components
+- Named group directories in campaign
+- Ordered invocation of FOB components
+- Insert delay between component invocations
+- Derived params based on params for codes
+- Option to kill run if any component fails
+- Support for timeouts
+- Support for named arguments
+- Set custom TAU environment variable for each code
+- Dataspaces integration
+- Improved ADIOS parameter support
+- Add `node_layout` Sweep option for per-machine node configuration
+- Working machine support: local, cori, theta, and titan
+- Add umask campaign option
+- Per code/component input files
+- Improved workflow scheduling (more efficient use of available nodes
+  within a sweep group)
+- Add new parameter types `ParamKeyValue` (for ini, namelist, and other
+  similar name=value formatted config files) and `ParamConfig` (fully
+  generic string replacement, can be used with any format)
+- Add hook in report generation to run user script
+- Support symlinks for input files, useful to avoid copying large files
+- (beta) SOSFlow integration
+- (beta) Resume partially completed campaigns
+
+
+See the v0.5 milestone on GitHub for a complete list including bug fixes:
+https://github.com/CODARcode/cheetah/milestone/1?closed=1
+
+### v0.1
+
+- Initial release
+- Working machine support: local, titan
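
Note on the `node_layout` entry added to the README in this diff (illustration only, not part of the change): the description implies a mapping from machine name to a list of per-node dictionaries, each holding exactly one code entry while node sharing is unsupported. A minimal sketch of that shape follows. The code names `heat` and `stage_write`, the process counts, and the `validate_node_layout` helper are assumptions made for illustration and are not taken from Cheetah itself.

```python
# Hypothetical layout: machine name -> list of per-node dictionaries.
# Code names and process counts below are illustrative assumptions.
node_layout = {
    "titan": [
        {"heat": 16},        # first node runs only "heat", 16 processes
        {"stage_write": 4},  # second node runs only "stage_write", 4 processes
    ],
}


def validate_node_layout(layout):
    """Illustrative helper (not a Cheetah API): check the documented shape."""
    for machine, nodes in layout.items():
        if not isinstance(nodes, list):
            raise ValueError("layout for %r must be a list of dicts" % machine)
        for index, node in enumerate(nodes):
            # Node sharing is not yet supported: exactly one code per node.
            if not isinstance(node, dict) or len(node) != 1:
                raise ValueError(
                    "node %d on %r must contain exactly one code entry"
                    % (index, machine))
            for code, nprocs in node.items():
                if not isinstance(nprocs, int) or nprocs < 1:
                    raise ValueError(
                        "process count for %r must be a positive int" % code)
    return True
```

The README's [node layout example](examples/heat_transfer_node_layout.py) remains the authoritative reference for how the dictionary is passed to a Sweep.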
@@ -166,14 +166,13 @@ def __init__(self, machine_name, app_dir):
             self.run_dir_setup_script = self._experiment_relative_path(
                                                 self.run_dir_setup_script)
 
+        self.machine_app_config_script = None
         if self.app_config_scripts is not None:
             assert isinstance(self.app_config_scripts, dict)
             script = self.app_config_scripts.get(machine_name)
             if script is not None:
                 self.machine_app_config_script = \
                     self._experiment_relative_path(script)
-        else:
-            self.machine_app_config_script = None
 
     def _get_machine(self, machine_name):
         machine = None
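
Note on the `model.py` hunk above (illustration only, not part of the change): before this diff, `machine_app_config_script` was set to `None` only in the `else` branch, so it appears the attribute was left undefined when `app_config_scripts` was a dict with no entry for the current machine. Setting the default first covers every path. A simplified sketch of the new control flow follows. The class names and script file name are hypothetical, and the `_experiment_relative_path` call from the real code is omitted.

```python
class CampaignSketch:
    """Simplified, hypothetical stand-in for the class in model.py."""
    app_config_scripts = None

    def __init__(self, machine_name):
        # Default first, so the attribute exists on every code path.
        self.machine_app_config_script = None
        if self.app_config_scripts is not None:
            assert isinstance(self.app_config_scripts, dict)
            script = self.app_config_scripts.get(machine_name)
            if script is not None:
                self.machine_app_config_script = script


class TitanOnlyCampaign(CampaignSketch):
    # Only one machine has a config script (file name is an assumption).
    app_config_scripts = {"titan": "titan_config.sh"}


# A machine with no entry in the dict now gets None instead of a
# missing attribute.
campaign = TitanOnlyCampaign("cori")
print(campaign.machine_app_config_script)
```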