
Commit 006c42e

Merge pull request #133 from CODARcode/doc-05
Doc 05
2 parents 3c0bb96 + 44f7daf commit 006c42e


5 files changed: 336 additions, 26 deletions


README.md

Lines changed: 119 additions & 23 deletions
@@ -6,31 +6,43 @@ The CODAR Experiment Harness is designed to run Exascale science applications
 using different parameters and components to determine the best combination
 for deployment on different supercomputers.

-To use Cheetah, the user first writes a "campaign" specification file.
-Cheetah takes this specification, and generates a set of swift and bash
-scripts to execute the application many times with each of the parameter sets,
-and organize the results of each run in separate subdirectories. Once
-generated, the `run-all.sh` script in the output directory can be used
-to run the campaign.
+To use Cheetah, the user first writes a "campaign" specification file. The
+`cheetah.py` command currently has four subcommands:
+
+1. `create-campaign` - takes the campaign specification and generates a set of
+scripts to execute the application many times with each of the parameter
+sets, and organizes the results of each run in separate subdirectories. Once
+generated, the `run-all.sh` script in the output directory can be used
+to run the campaign.
+2. `status` - gets status information about a campaign directory generated by
+the `create-campaign` command.
+3. `generate-report` - generates a report of results from a completed campaign.
+4. `help` - shows information about the available commands.
+
+To get detailed help about options for each subcommand, execute the subcommand
+with the `-h` option, e.g. `cheetah.py create-campaign -h`.

 ## Requirements

-Cheetah v0.1 requires a modern Linux install with Python 3.4 or greater
-and CODAR Savanna v0.5. See the
-[savanna documentation](https://github.com/CODARcode/savanna)
-for installation instructions.
+Cheetah v0.5 requires a modern Linux install with Python 3.4 or greater. Some
+Cheetah functionality is designed to interoperate with the Savanna software
+stack, including ADIOS, Dataspaces, SOSFlow, and TAU, but simple campaigns can
+be run without further dependencies.
+See the [savanna documentation](https://github.com/CODARcode/savanna)
+for more information and install instructions.

 ## Tutorial for Running Heat Transfer example with Cheetah

 1. Install Savanna and build the Heat Transfer example (see [savanna
-instructions](https://github.com/CODARcode/Example-Heat_Transfer/blob/master/README.adoc)). This tutorial will assume spack was used for the
+instructions](https://github.com/CODARcode/savanna/blob/master/README.md)).
+This tutorial will assume spack was used for the
 installation, and uses bash for environment setup examples.

-2. Download the Cheetah v0.1 release from github and unpack the release
-[tarball](https://github.com/CODARcode/cheetah/archive/v0.1.tar.gz).
+2. Download the Cheetah v0.5 release from github and unpack the release
+[tarball](https://github.com/CODARcode/cheetah/archive/v0.5.tar.gz).

-3. Set up environment for cheetah (this can be added to your ~/.bashrc
-file for convenience, after spack environment is loaded):
+3. Set up the environment for cheetah (this can be added to your ~/.bashrc
+file for convenience, after the spack environment is loaded):

 ```
 source <(spack module loads --dependencies adios)
@@ -49,26 +61,45 @@ mkdir -p ~/codar/campaigns
 ```
 path2savanna=$(spack find -p savanna | grep savanna | awk '{ print $2 }')
 cd /path/to/cheetah
-./cheetah.py -e examples/heat_transfer_small.py \
+./cheetah.py create-campaign -e examples/heat_transfer_small.py \
 -a $path2savanna/Heat_Transfer \
 -m local -o ~/codar/campaigns/heat
 ```

 6. Run the campaign:

 ```
-cd ~/codar/campaigns/heat
+cd ~/codar/campaigns/heat/$USER
 ./run-all.sh
 ```

-For results, see `GROUP_NAME/run-NNN`. To debug failures, look at
-`GROUP_NAME/codar.cheetah.submit-output.txt` first, then at the stdout
-and stderr files in each of the run directories.
+For run output, see `GROUP_NAME/run-NNN`. To debug submit failures, look at
+`GROUP_NAME/codar.cheetah.submit-output.txt`. To view the progress of the
+campaign, start with the `status` subcommand and no options, then use its
+options to drill down into specific groups and runs. For example:
+
+```
+# show progress of each group in the campaign
+cheetah.py status ~/codar/campaigns/heat
+
+# get a summary of runs within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -s
+
+# show status of each run within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -n
+
+# show stderr and stdout for a specific run within a group
+cheetah.py status ~/codar/campaigns/heat -g GROUP_NAME -r RUN_NAME -o
+```

 ## Campaign Directory

-Within the output directory, cheetah creates a subdirectory for each group
-in the specification. Group directories contain the following files:
+Within the output directory, cheetah creates a subdirectory named after your
+username. Within the user directory, there is a subdirectory for each group
+in the specification. The `status` subcommand should be your first tool for
+investigating the progress of a campaign, but it can also be useful to
+understand the directory structure and examine files directly.
+Group directories contain the following files:

 - submit.sh: script that submits the group to the scheduler (or runs in
 the background for local machine). The campaign `run-all.sh` script simply
@@ -130,7 +161,7 @@ directory, unless the `component_subdirs` option is set to True for the group.
 In that case, the working dir for each code will be a subdirectory of the run
 directory with name equal to the code name.

-## SOSFlow Support
+## SOSFlow Support (beta)

 Cheetah can automatically configure sosflow daemons to run with an application.
 See [heat transfer example sosflow](examples/heat_transfer_sosflow.py). Note
@@ -187,6 +218,17 @@ supported parameter types. For a complete list, see the examples and the
 taken to generate all the instances to run. For simple campaigns that
 need to do a full cross product of parameter values, only one
 SweepGroup containing one Sweep is needed.
+- node\_layout - an option passed to the Sweep that determines how nodes
+and MPI processes are allocated to codes. The default is to allocate
+an entire node to each code and use the maximum number of cores available.
+An alternate layout is specified as a dictionary whose keys are the names
+of the machines the layout is designed for and whose values give the
+layout as a list of dictionaries. Each dictionary represents a single
+node: its keys are code names, and its values are the number of
+processes to use for each code. Node sharing
+is not yet supported, so each node dictionary must contain only one code
+entry, but the format is designed to support sharing. See also the
+[node layout example](examples/heat_transfer_node_layout.py).
 - ParamX - all parameter types have at least three elements:
 - target - which code the parameter is for. The value must be one of
 the keys in the codes dictionary.
@@ -207,3 +249,57 @@ supported parameter types. For a complete list, see the examples and the
 on the convention used by the code. Note that this is distinct from
 the name, but a good choice for name is the option with the dashes
 removed.
+
+## Changelog
+
+### v0.5
+
+- New subcommand structure (the initial command structure is now under
+the `create-campaign` subcommand).
+- New `status` subcommand
+- New `generate-report` subcommand
+- Multi-user campaign support
+- Specify different node counts in Sweeps
+- Add experiments to an existing campaign directory
+- Option to override the number of nodes
+- Feature to parse and consolidate campaign performance information
+- Have Cheetah figure out the min/max number of nodes
+- Mark file as ADIOS XML file for an application
+- Support for absolute paths for input files
+- Support for TAU trace directory
+- Support for input files with key-value parameters
+- Get Example-Heat_Transfer + TAU working with Cheetah
+- Support for running a per-run setup script during campaign creation
+- Support for subdirectories for FOB components
+- Named group directories in campaign
+- Ordered invocation of FOB components
+- Insert delay between component invocations
+- Derived params based on params for codes
+- Option to kill run if any component fails
+- Support for timeouts
+- Support for named arguments
+- Set custom TAU environment variable for each code
+- Dataspaces integration
+- Improved ADIOS parameter support
+- Add `node_layout` Sweep option for per-machine node configuration
+- Working machine support: local, cori, theta, and titan
+- Add umask campaign option
+- Per code/component input files
+- Improved workflow scheduling (more efficient use of available nodes
+within a sweep group)
+- Add new parameter types `ParamKeyValue` (for ini, namelist, and other
+similar name=value formatted config files) and `ParamConfig` (fully
+generic string replacement, usable with any format)
+- Add hook in report generation to run user script
+- Support symlinks for input files, useful to avoid copying large files
+- (beta) SOSFlow integration
+- (beta) Resume partially completed campaigns
+
+
+See the v0.5 milestone on github for a complete list including bug fixes:
+https://github.com/CODARcode/cheetah/milestone/1?closed=1
+
+### v0.1
+
+- Initial release
+- Working machine support: local, titan
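The `node_layout` format described in the README diff can be illustrated with a small validator. This is a hypothetical helper, not part of Cheetah's API: it only checks the structure the README describes (a machine name mapped to a list of per-node dictionaries of code name to process count, with exactly one code per node while node sharing is unsupported). The code names `heat` and `stage_write` and the process counts are made up for illustration.

```python
def check_node_layout(node_layout, code_names):
    """Check a node_layout value against the format described in the README.

    Hypothetical helper, not part of Cheetah. node_layout maps a machine
    name to a list of node dicts; each node dict maps a code name to the
    number of processes to use for that code. Raises TypeError or
    ValueError on the first problem found, returns None otherwise.
    """
    if not isinstance(node_layout, dict):
        raise TypeError("node_layout must be a dict keyed by machine name")
    for machine, nodes in node_layout.items():
        if not isinstance(nodes, list):
            raise TypeError("layout for %r must be a list of node dicts"
                            % (machine,))
        for i, node in enumerate(nodes):
            if not isinstance(node, dict):
                raise TypeError("%r node %d must be a dict" % (machine, i))
            # Node sharing is not yet supported: exactly one code per node.
            if len(node) != 1:
                raise ValueError("%r node %d must contain exactly one code, "
                                 "got %d" % (machine, i, len(node)))
            for code, nprocs in node.items():
                # Assumption: code names are keys of the codes dictionary.
                if code not in code_names:
                    raise ValueError("%r node %d: unknown code %r"
                                     % (machine, i, code))
                if not isinstance(nprocs, int) or nprocs < 1:
                    raise ValueError("%r node %d: process count for %r must "
                                     "be a positive int" % (machine, i, code))


# Hypothetical layout for titan: one node running 8 stage_write processes
# and one node running 16 heat processes.
layout = {"titan": [{"stage_write": 8}, {"heat": 16}]}
check_node_layout(layout, {"heat", "stage_write"})  # valid, no exception

# A node shared by two codes is rejected while node sharing is unsupported.
try:
    check_node_layout({"titan": [{"heat": 8, "stage_write": 8}]},
                      {"heat", "stage_write"})
except ValueError as err:
    print("rejected:", err)
```

Raising on the first problem keeps the sketch short; a real validator might instead collect every error so a user can fix a layout in one pass.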

codar/cheetah/model.py

Lines changed: 1 addition & 2 deletions
@@ -166,14 +166,13 @@ def __init__(self, machine_name, app_dir):
 self.run_dir_setup_script = self._experiment_relative_path(
 self.run_dir_setup_script)

+self.machine_app_config_script = None
 if self.app_config_scripts is not None:
 assert isinstance(self.app_config_scripts, dict)
 script = self.app_config_scripts.get(machine_name)
 if script is not None:
 self.machine_app_config_script = \
 self._experiment_relative_path(script)
-else:
-self.machine_app_config_script = None

 def _get_machine(self, machine_name):
 machine = None
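The model.py change replaces an `else` branch with an unconditional default assignment made before the `if`. A minimal sketch of that pattern follows; `CampaignSketch`, `TitanCampaign`, the `experiment_dir` parameter, the script name, and the `os.path.join`-based path helper are hypothetical stand-ins, not Cheetah's actual classes. Assigning `None` first means `machine_app_config_script` is defined on every path (no `app_config_scripts` dict at all, or no entry for the current machine), whichever `if` the removed `else` was attached to.

```python
import os


class CampaignSketch:
    """Hypothetical stand-in for the campaign class in codar/cheetah/model.py.

    Only the machine-specific app config script lookup from the diff is
    modeled here.
    """

    # Optional {machine_name: script_path} dict; subclasses may override it.
    app_config_scripts = None

    def __init__(self, machine_name, experiment_dir="/path/to/experiment"):
        # Hypothetical parameter: base directory for relative script paths.
        self.experiment_dir = experiment_dir
        # Assign the default first, so the attribute is defined on every
        # path: no dict at all, or no entry for this machine.
        self.machine_app_config_script = None
        if self.app_config_scripts is not None:
            assert isinstance(self.app_config_scripts, dict)
            script = self.app_config_scripts.get(machine_name)
            if script is not None:
                self.machine_app_config_script = \
                    self._experiment_relative_path(script)

    def _experiment_relative_path(self, path):
        # Hypothetical helper: os.path.join keeps absolute paths unchanged
        # and resolves relative ones against experiment_dir.
        return os.path.join(self.experiment_dir, path)


class TitanCampaign(CampaignSketch):
    # Hypothetical script name, for illustration only.
    app_config_scripts = {"titan": "env-titan.sh"}


print(CampaignSketch("local").machine_app_config_script)
print(TitanCampaign("local").machine_app_config_script)
print(TitanCampaign("titan").machine_app_config_script)
```

Setting the default up front also removes the need to keep an `else` in sync with every branch that might skip the assignment if more conditions are added later.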
