@@ -6,31 +6,43 @@ The CODAR Experiment Harness is designed to run Exascale science applications
 using different parameters and components to determine the best combination
 for deployment on different supercomputers.

-To use Cheetah, the user first writes a "campaign" specification file.
-Cheetah takes this specification, and generates a set of swift and bash
-scripts to execute the application many times with each of the parameter sets,
-and organize the results of each run in separate subdirectories. Once
-generated, the `run-all.sh` script in the output directory can be used
-to run the campaign.
+To use Cheetah, the user first writes a "campaign" specification file. Cheetah
+currently has four subcommands:
+
+1. `create-campaign` - takes the campaign specification and generates a set of
+   scripts to execute the application many times with each of the parameter
+   sets, and organizes the results of each run in separate subdirectories. Once
+   generated, the `run-all.sh` script in the output directory can be used
+   to run the campaign.
+2. `status` - gets status information about a campaign directory generated by
+   the `create-campaign` subcommand.
+3. `generate-report` - generates a report of results from a completed campaign.
+4. `help` - gets information about available commands.
+
+To get detailed help about options for each subcommand, execute the subcommand
+with the `-h` option, e.g. `cheetah.py create-campaign -h`.
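The subcommands can also be driven from a Python script. The sketch below only assembles argument lists from flags that appear in this README: `-e`, `-a`, `-m`, `-o` for `create-campaign`, and `-g`, `-s`, `-n`, `-r`, `-o` for `status`. Several things here are assumptions:

- The helper names and the `./cheetah.py` path are hypothetical.
- The flag meanings in the comments are inferred from the tutorial's example values.

`cheetah.py <subcommand> -h` remains the authoritative option list.

```python
import subprocess

# Location of cheetah.py; an assumption, adjust to your checkout.
CHEETAH = "./cheetah.py"


def create_campaign_cmd(spec_file, app_dir, machine, output_dir):
    """Build argv for `cheetah.py create-campaign` with the tutorial's flags."""
    return [CHEETAH, "create-campaign",
            "-e", spec_file,   # campaign spec, e.g. examples/heat_transfer_small.py
            "-a", app_dir,     # app directory, e.g. $path2savanna/Heat_Transfer
            "-m", machine,     # machine name, e.g. "local"
            "-o", output_dir]  # where the generated campaign is written


def status_cmd(campaign_dir, group=None, run=None,
               summary=False, runs=False, output=False):
    """Build argv for `cheetah.py status`, mirroring the status examples."""
    argv = [CHEETAH, "status", campaign_dir]
    if group is not None:
        argv += ["-g", group]
    if run is not None:
        argv += ["-r", run]
    if summary:
        argv.append("-s")  # summary of runs within the group
    if runs:
        argv.append("-n")  # status of each run within the group
    if output:
        argv.append("-o")  # stdout/stderr of the selected run
    return argv


def run_cheetah(argv):
    """Run one cheetah command; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(argv)
```

No shell is involved, so `~` is not expanded. Pass paths through `os.path.expanduser` first, e.g. `run_cheetah(status_cmd(os.path.expanduser("~/codar/campaigns/heat"), group="GROUP_NAME", summary=True))`.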

 ## Requirements

-Cheetah v0.1 requires a modern Linux install with Python 3.4 or greater
-and CODAR Savanna v0.5. See the
-[savanna documentation](https://github.com/CODARcode/savanna)
-for installation instructions.
+Cheetah v0.5 requires a modern Linux install with Python 3.4 or greater. Some
+Cheetah functionality is designed to interoperate with the Savanna software
+stack, including ADIOS, Dataspaces, SOSFlow, and TAU, but simple campaigns can
+be run without further dependencies.
+See the [Savanna documentation](https://github.com/CODARcode/savanna)
+for more information and install instructions.
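A minimal preflight check for the stated base requirements (Python 3.4 or greater on Linux) could look like the following. `check_requirements` is a hypothetical helper. It does not look for the optional Savanna components.

```python
import platform
import sys


def check_requirements():
    """Return a list of problems; an empty list means the basics look met."""
    problems = []
    if sys.version_info < (3, 4):
        problems.append("Python 3.4 or greater is required, found {}.{}".format(
            sys.version_info[0], sys.version_info[1]))
    if platform.system() != "Linux":
        problems.append("a Linux install is expected, found {}".format(
            platform.system()))
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("warning:", problem)
```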

 ## Tutorial for Running Heat Transfer example with Cheetah

 1. Install Savanna and build the Heat Transfer example (see [savanna
-   instructions](https://github.com/CODARcode/Example-Heat_Transfer/blob/master/README.adoc)). This tutorial will assume spack was used for the
+   instructions](https://github.com/CODARcode/savanna/blob/master/README.md)).
+   This tutorial will assume spack was used for the
    installation, and uses bash for environment setup examples.

-2. Download the Cheetah v0.1 release from github and unpack the release
-   [tarball](https://github.com/CODARcode/cheetah/archive/v0.1.tar.gz).
+2. Download the Cheetah v0.5 release from github and unpack the release
+   [tarball](https://github.com/CODARcode/cheetah/archive/v0.5.tar.gz).

-3. Set up environment for cheetah (this can be added to your ~/.bashrc
-   file for convenience, after spack environment is loaded):
+3. Set up the environment for cheetah (this can be added to your ~/.bashrc
+   file for convenience, after the spack environment is loaded):

 ```
 source <(spack module loads --dependencies adios)
@@ -49,26 +61,45 @@ mkdir -p ~/codar/campaigns
 ```
 path2savanna=`spack find -p savanna | grep savanna | awk '{ print $2 }'`
 cd /path/to/cheetah
-./cheetah.py -e examples/heat_transfer_small.py \
+./cheetah.py create-campaign -e examples/heat_transfer_small.py \
     -a $path2savanna/Heat_Transfer \
     -m local -o ~/codar/campaigns/heat
 ```

 6. Run the campaign:

 ```
-cd ~/codar/campaigns/heat
+cd ~/codar/campaigns/heat/$USER
 ./run-all.sh
 ```

-For results, see `GROUP_NAME/run-NNN`. To debug failures, look at
-`GROUP_NAME/codar.cheetah.submit-output.txt` first, then at the stdout
-and stderr files in each of the run directories.
+For run output, see `GROUP_NAME/run-NNN`. To debug submit failures, look at
+`GROUP_NAME/codar.cheetah.submit-output.txt`. To view the progress of the run,
+start with the `status` subcommand with no options, then use its options to
+drill down into specific groups and runs. For example:
+
+```
+# show progress of each group in the campaign
+cheetah.py status ~/codar/campaigns/heat
+
+# get a summary of runs within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -s
+
+# show status of each run within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -n
+
+# show stderr and stdout for a specific run within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -r RUN_NAME -o
+```
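When `status` is not enough, for example in a post-processing script, the same information can be read from the directory layout this README describes: `<output dir>/<username>/<group>/run-NNN`, with `codar.cheetah.submit-output.txt` in each group directory. This is a minimal sketch with hypothetical helper names. It treats every non-hidden subdirectory of the user directory as a group, and every subdirectory starting with `run-` as a run. Both are assumptions drawn from the naming shown above.

```python
import getpass
import os


def summarize_campaign(campaign_dir, user=None):
    """Map each group name to the sorted names of its run directories.

    Assumes the layout <campaign_dir>/<user>/<group>/run-NNN.
    """
    user = user or getpass.getuser()
    user_dir = os.path.join(campaign_dir, user)
    summary = {}
    for group in sorted(os.listdir(user_dir)):
        group_dir = os.path.join(user_dir, group)
        if group.startswith(".") or not os.path.isdir(group_dir):
            continue  # skip hidden entries and files such as run-all.sh
        summary[group] = sorted(
            name for name in os.listdir(group_dir)
            if name.startswith("run-")
            and os.path.isdir(os.path.join(group_dir, name)))
    return summary


def submit_output_path(campaign_dir, group, user=None):
    """Return the submit log to check first when a group fails to submit."""
    user = user or getpass.getuser()
    return os.path.join(campaign_dir, user, group,
                        "codar.cheetah.submit-output.txt")
```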

 ## Campaign Directory

-Within the output directory, cheetah creates a subdirectory for each group
-in the specification. Group directories contain the following files:
+Within the output directory, cheetah creates a subdirectory named after your
+username. Within the user directory, there is a subdirectory for each group
+in the specification. The `status` subcommand is the recommended starting
+point for investigating the progress of a campaign, but it can also be useful
+to understand the directory structure and examine files directly.
+Group directories contain the following files:

 - submit.sh: script that submits the group to the scheduler (or runs in
   the background for local machine). The campaign `run-all.sh` script simply
@@ -130,7 +161,7 @@ directory, unless the `component_subdirs` option is set to True for the group.
 In that case, the working dir for each code will be a subdirectory of the run
 directory with name equal to the code name.
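The working-directory rule can be restated as a small function. This is an illustrative sketch of the documented behavior, not Cheetah's own code. `code_working_dir` is a hypothetical name, and the default case (each code running in the run directory itself) is assumed from the surrounding text.

```python
import os


def code_working_dir(run_dir, code_name, component_subdirs=False):
    """Return the working directory for one code within a run.

    Without component_subdirs every code uses the run directory; with it set
    to True for the group, each code gets <run_dir>/<code_name>.
    """
    if component_subdirs:
        return os.path.join(run_dir, code_name)
    return run_dir
```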

-## SOSFlow Support
+## SOSFlow Support (beta)

 Cheetah can automatically configure sosflow daemons to run with an application.
 See [heat transfer example sosflow](examples/heat_transfer_sosflow.py). Note
@@ -187,6 +218,17 @@ supported parameter types. For a complete list, see the examples and the
   taken to generate all the instances to run. For simple campaigns that
   need to do a full cross product of parameter values, only one
   SweepGroup containing one Sweep is needed.
+- node_layout - an option passed to the Sweep that determines how nodes and
+  MPI processes are allocated to codes. The default is to allocate an entire
+  node to each code and use the maximum number of cores available.
+  An alternate configuration is specified as a dictionary whose keys are the
+  names of the machines the layout is designed for, and whose values give
+  the layout as a list of dictionaries. Each dictionary represents a single
+  node: its keys are code names, and its values are the number of processes
+  to use for each code. Node sharing is not yet supported, so each node
+  dictionary must contain only one code entry, but the format is designed
+  to support sharing. See also the
+  [node layout example](examples/heat_transfer_node_layout.py).
 - ParamX - all parameter types have at least three elements:
   - target - which code the parameter is for. The value must be one of
     the keys in the codes dictionary.
@@ -207,3 +249,57 @@ supported parameter types. For a complete list, see the examples and the
     on the convention used by the code. Note that this is distinct from
     the name, but a good choice for name is the option with the dashes
     removed.
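To make the node_layout format concrete, here is a minimal sketch. It uses a hypothetical campaign with codes `heat` and `stage` and a layout for the `titan` machine (one node per code), plus a validator for the constraints stated in the node_layout bullet. `check_node_layout` is a hypothetical helper, not part of Cheetah. Requiring layout code names to be keys of the codes dictionary is an assumption carried over from the rule for parameter targets. Cheetah's own checks may differ; see the node layout example referenced above.

```python
# Hypothetical layout for the "titan" machine: one node per code, since node
# sharing is not yet supported. Each dict maps code name -> process count.
node_layout = {
    "titan": [
        {"heat": 16},
        {"stage": 4},
    ],
}


def check_node_layout(layout, code_names):
    """Return a list of problems found in a node_layout dictionary.

    Checks that each node dict names exactly one code, that process counts
    are positive integers, and (an assumption) that every code name is a
    known code.
    """
    problems = []
    for machine, nodes in layout.items():
        for index, node in enumerate(nodes):
            where = "{} node {}".format(machine, index)
            if len(node) != 1:
                problems.append("{}: expected one code, found {}".format(
                    where, len(node)))
            for code, nprocs in node.items():
                if code not in code_names:
                    problems.append("{}: unknown code {!r}".format(where, code))
                if not isinstance(nprocs, int) or nprocs < 1:
                    problems.append("{}: invalid process count {!r}".format(
                        where, nprocs))
    return problems
```

For the layout above, `check_node_layout(node_layout, {"heat", "stage"})` returns an empty list. A node dict that lists two codes would be reported, matching the no-sharing rule.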
+
+## Changelog
+
+### v0.5
+
+- New subcommand structure (the initial command structure is now under
+  the `create-campaign` subcommand).
+- New `status` subcommand
+- New `generate-report` subcommand
+- Multi-user campaign support
+- Specify different node counts in Sweeps
+- Add experiments to an existing campaign directory
+- Option to override number of nodes
+- Feature to parse and consolidate campaign performance information
+- Have cheetah figure out min/max number of nodes
+- Mark file as ADIOS XML file for an application
+- Support for absolute paths for input files
+- Support for TAU trace directory
+- Support for input files with key-value parameters
+- Get Example-Heat_Transfer + TAU working with Cheetah
+- Support for running per-run setup script during campaign creation
+- Support for sub-directories for FOB components
+- Named group directories in campaign
+- Ordered invocation of FOB components
+- Insert delay between component invocations
+- Derived params based on params for codes
+- Option to kill run if any component fails
+- Support for timeouts
+- Support for named arguments
+- Set custom TAU environment variable for each code
+- Dataspaces integration
+- Improved ADIOS parameter support
+- Add `node_layout` Sweep option for per-machine node configuration
+- Working machine support: local, cori, theta, and titan
+- Add umask campaign option
+- Per code/component input files
+- Improved workflow scheduling (more efficient use of available nodes
+  within a sweep group)
+- Add new parameter types `ParamKeyValue` (for ini, namelist, and other
+  similar name=value formatted config files) and `ParamConfig` (fully
+  generic string replacement, can be used with any format)
+- Add hook in report generation to run user script
+- Support symlinks for input files, useful to avoid copying large files
+- (beta) SOSFlow integration
+- (beta) Resume partially completed campaigns
+
+See the v0.5 milestone on github for a complete list, including bug fixes:
+https://github.com/CODARcode/cheetah/milestone/1?closed=1
+
+### v0.1
+
+- Initial release
+- Working machine support: local, titan