
Commit 89096ff

committed
resolving conflicts
2 parents e493b44 + 76d015c commit 89096ff

39 files changed

+1026
-709
lines changed

docs/home.adoc

Lines changed: 16 additions & 0 deletions
@@ -43,6 +43,22 @@ Other Theta ESP notes are here: https://collab.cels.anl.gov/display/ESP
 This uses the system-installed Python with ML libs at: +
 +/usr/common/software/python/2.7-anaconda/envs/deeplearning+
 
+[[titan]]
+* https://www.olcf.ornl.gov/titan[Titan]
++
+This is a CANDLE-only installation. It uses the OLCF-provided Python +deeplearning+ module (Python 3.6 plus TensorFlow, Theano, and Keras) and R 3.3.2 .
++
+Add to +PATH+: +/lustre/atlas2/csc249/proj-shared/sfw/swift-t/stc/bin+
++
+Run with:
++
+----
+$ export TITAN=true
+$ export PROJECT=... QUEUE=...
+$ export LD_LIBRARY_PATH=/sw/xk6/deeplearning/1.0/sles11.3_gnu4.9.3/lib:/sw/xk6/deeplearning/1.0/sles11.3_gnu4.9.3/cuda/lib64:/opt/gcc/4.9.3/snos/lib64:/sw/xk6/r/3.3.2/sles11.3_gnu4.9.3x/lib64/R/lib
+$ swift-t -m cray -e LD_LIBRARY_PATH=$LD_LIBRARY_PATH workflow.swift
+----
+
 * http://swift-lang.github.io/swift-t/sites.html#cooley_candle[Cooley]
 +
 This uses the system-installed Python with ML libs at: +

docs/home.html

Lines changed: 17 additions & 1 deletion
@@ -803,6 +803,21 @@ <h2 id="_swift_installations">Swift installations</h2>
 </li>
 <li>
 <p>
+<a href="https://www.olcf.ornl.gov/titan">Titan</a>
+</p>
+<div class="paragraph" id="titan"><p>This is a CANDLE-only installation. It uses the OLCF-provided Python <code>deeplearning</code> module (Python 3.6 plus TensorFlow, Theano, and Keras) and R 3.3.2 .</p></div>
+<div class="paragraph"><p>Add to <code>PATH</code>: <code>/lustre/atlas2/csc249/proj-shared/sfw/swift-t/stc/bin</code></p></div>
+<div class="paragraph"><p>Run with:</p></div>
+<div class="listingblock">
+<div class="content">
+<pre><code>$ export TITAN=true
+$ export PROJECT=... QUEUE=...
+$ export LD_LIBRARY_PATH=/sw/xk6/deeplearning/1.0/sles11.3_gnu4.9.3/lib:/sw/xk6/deeplearning/1.0/sles11.3_gnu4.9.3/cuda/lib64:/opt/gcc/4.9.3/snos/lib64:/sw/xk6/r/3.3.2/sles11.3_gnu4.9.3x/lib64/R/lib
+$ swift-t -m cray -e LD_LIBRARY_PATH=$LD_LIBRARY_PATH workflow.swift</code></pre>
+</div></div>
+</li>
+<li>
+<p>
 <a href="http://swift-lang.github.io/swift-t/sites.html#cooley_candle">Cooley</a>
 </p>
 <div class="paragraph"><p>This uses the system-installed Python with ML libs at:<br />
@@ -834,7 +849,8 @@ <h2 id="_swift_installations">Swift installations</h2>
 <div id="footnotes"><hr /></div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2017-06-07 11:35:47 CDT
+Last updated
+2017-06-21 13:21:24 CDT
 </div>
 </div>
 </body>

workflows/common/python/runner_utils.py

Lines changed: 9 additions & 3 deletions
@@ -1,15 +1,21 @@
 import numpy as np
 import json, os
 
+try:
+    basestring
+except NameError:
+    basestring = str
+
 DATA_TYPES = {type(np.float16): 'f16', type(np.float32): 'f32', type(np.float64): 'f64'}
 
 def write_output(result, instance_directory):
     with open('{}/result.txt'.format(instance_directory), 'w') as f_out:
         f_out.write("{}\n".format(result))
 
-def init(param_file, instance_directory, framework, out_dir_key):
-    with open(param_file) as f_in:
-        hyper_parameter_map = json.load(f_in)
+def init(param_string, instance_directory, framework, out_dir_key):
+    #with open(param_file) as f_in:
+    #    hyper_parameter_map = json.load(f_in)
+    hyper_parameter_map = json.loads(param_string.strip())
 
     if not os.path.exists(instance_directory):
         os.makedirs(instance_directory)
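
The `init` change above swaps a parameter file for a JSON string passed directly on the command line. A minimal sketch of that parsing step follows, assuming the string can carry stray whitespace from the shell. The helper name `parse_params` and the sample keys are illustrative and are not part of `runner_utils.py`.

```python
import json


def parse_params(param_string):
    """Parse a JSON object passed as one command-line string.

    `parse_params` is a hypothetical helper, not a function from the commit.
    """
    if not isinstance(param_string, str):
        raise TypeError("expected a JSON string")
    # strip() removes the trailing newline a shell substitution may leave behind
    params = json.loads(param_string.strip())
    if not isinstance(params, dict):
        raise ValueError("expected a JSON object at the top level")
    return params


# Made-up parameter values, for illustration only.
example = parse_params(' {"epochs": 5, "save": "out"}\n')
print(example["epochs"])
```

Checking that the decoded value is a dictionary is an addition in this sketch; the committed code assigns keys into the result later, so a non-object payload would fail there instead.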

workflows/nt3_mlrMBO/README.md

Lines changed: 15 additions & 5 deletions
@@ -2,7 +2,7 @@
 
 The NT3 mlrMBO workflow evaluates the NT3 benchmark
 using hyperparameters provided by a mlrMBO instance. mlrMBO
-minimizes the TODO. Swift is used to scalably distribute
+minimizes the validation loss. Swift is used to scalably distribute
 work across the system, and EMEWS is used to:
 
 1. Pass the hyperparameters to evaluate from the running mlrMBO algorithm to
@@ -95,7 +95,11 @@ nt3_mlrMBO/
 * `swift/workflow.sh` - generic launch script to set the appropriate environment variables etc. and then launch the swift workflow script
 * `swift/cori_workflow3.sh` - launch script customized for the Cori supercomputer
 * `swift/cori_settings.sh` - settings for running on the Cori supercomputer
-* `swift/ai_workflow3.swift` - app invocation ("ai") version (see below) of the swift workflow
+* `swift/ai_workflow.sh` - launch script for running the app invocation ("ai") workflow (see below).
+* `swift/ai_workflow3.swift` - app invocation version (see below) of the swift workflow
+* `swift/theta_workflow.sh` - launch script for running on Theta. This uses the app invocation workflow.
+* `scripts/theta_run_model.sh` - Theta-specific bash script used to launch nt3_runner.py
+* `scripts/run_model.sh` - generic bash script used to launch nt3_runner.py
 
 ## Running the Workflow ##
 
@@ -104,7 +108,9 @@ There are two different versions of the workflow.
 1. The first runs the benchmark code directly from within swift using swift's
 python integration.
 2. The second, the _ai_-version, runs the benchmark code by invoking the python interpreter using
-a bash script which is in turn invoked using a swift app function.
+a bash script which is in turn invoked using a swift app function. The bash scripts
+`scripts/theta_run_model.sh` and `scripts/run_model.sh` are examples of such a
+script.
 
 The latter of these is necessary on machines like Theta where it is not possible
 to compile swift with an appropriate python.
@@ -139,8 +145,12 @@ the workflow is run, by defining which swift is actually run.
 * Set to `$EMEWS_PROJECT_ROOT/swift/workflow3.swift` to run the benchmarks via swift's integrated python.
 * Set to `$EMEWS_PROJECT_ROOT/swift/ai_workflow3.swift` to run the benchmarks via a swift
 app function.
-* `SCRIPT_FILE` - the bash script used to run benchmark when the benchmark is
-run via a swift app function.
+
+If you need to run the _ai_-version of the workflow, there is an additional shell
+variable to set:
+
+* `SCRIPT_FILE` - the path to the bash script that is used to launch the python
+benchmark runner code (e.g. `scripts/run_model.sh`).
 
 If running on an HPC machine, set `PROCS`, `PPN`, `QUEUE`, `WALLTIME` and `MACHINE`
 as appropriate.
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+import sys
+import exp_logger
+
+def log_start():
+    parameter_map = {}
+    parameter_map['pp'] = sys.argv[2]
+    parameter_map['iterations'] = sys.argv[3]
+    parameter_map['params'] = "\"\"\"{}\"\"\"".format(sys.argv[4])
+    parameter_map['algorithm'] = sys.argv[5]
+    parameter_map['experiment_id'] = sys.argv[6]
+    sys_env = "\"\"\"{}\"\"\"".format(sys.argv[7])
+
+    exp_logger.start(parameter_map, sys_env)
+
+def log_end():
+    exp_id = sys.argv[2]
+    exp_logger.end(exp_id)
+
+def main():
+    print(sys.argv)
+    if sys.argv[1] == 'start':
+        log_start()
+    else:
+        log_end()
+
+if __name__ == '__main__':
+    main()
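
The script above dispatches on `sys.argv[1]` and reads the remaining positions by index, so a wrong argument count only shows up as an `IndexError`. The sketch below is one possible variant that checks the count before logging. `StubLogger` stands in for the `exp_logger` module, whose interface is not shown in this diff beyond the `start` and `end` calls, and `dispatch` is a hypothetical name. The triple-quote wrapping of `params` and `sys_env` in the committed code is left out for brevity.

```python
class StubLogger(object):
    """Stand-in for exp_logger; records calls instead of logging them."""

    def __init__(self):
        self.calls = []

    def start(self, parameter_map, sys_env):
        self.calls.append(("start", parameter_map, sys_env))

    def end(self, exp_id):
        self.calls.append(("end", exp_id))


def dispatch(argv, logger):
    """Route 'start' and 'end' commands after validating the argument count."""
    if len(argv) < 2:
        raise SystemExit("usage: log_runner.py start|end ...")
    command = argv[1]
    if command == "start":
        if len(argv) != 8:
            raise SystemExit("start expects 6 arguments, got {}".format(len(argv) - 2))
        pp, iterations, params, algorithm, exp_id, sys_env = argv[2:8]
        parameter_map = {
            "pp": pp,
            "iterations": iterations,
            "params": params,
            "algorithm": algorithm,
            "experiment_id": exp_id,
        }
        logger.start(parameter_map, sys_env)
    elif command == "end":
        if len(argv) != 3:
            raise SystemExit("end expects 1 argument, got {}".format(len(argv) - 2))
        logger.end(argv[2])
    else:
        raise SystemExit("unknown command: {}".format(command))


# Illustrative call with a made-up experiment id.
demo_logger = StubLogger()
dispatch(["log_runner.py", "end", "exp-1"], demo_logger)
print(demo_logger.calls)
```

Taking `argv` and the logger as parameters keeps the routing testable without touching `sys.argv` or importing the real logger.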

workflows/nt3_mlrMBO/python/nt3_tc1_runner.py

Lines changed: 6 additions & 3 deletions
@@ -55,13 +55,16 @@ def run(hyper_parameter_map):
     return val_loss[-1]
 
 if __name__ == '__main__':
-    param_file = sys.argv[1]
+    param_string = sys.argv[1]
     instance_directory = sys.argv[2]
     model_name = sys.argv[3]
     framework = sys.argv[4]
-    hyper_parameter_map = runner_utils.init(param_file, instance_directory,
-                                            framework, 'save')
+    exp_id = sys.argv[5]
+    run_id = sys.argv[6]
+    hyper_parameter_map = runner_utils.init(param_string, instance_directory, framework, 'save')
     hyper_parameter_map['model_name'] = model_name
+    hyper_parameter_map['experiment_id'] = exp_id
+    hyper_parameter_map['run_id'] = run_id
     # clear sys.argv so that argparse doesn't object
     sys.argv = ['nt3_tc1_runner']
     result = run(hyper_parameter_map)
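
After this hunk the runner expects six positional arguments and copies three of them into the hyperparameter map. The sketch below shows that merge in isolation. `build_hyper_parameter_map` is a hypothetical helper that parses the JSON itself; it does not reproduce whatever else `runner_utils.init` does with the instance directory, the framework, or the `'save'` key.

```python
import json


def build_hyper_parameter_map(argv):
    """Merge JSON parameters with the positional run metadata.

    Expected layout, following the hunk above:
    [script, param_string, instance_directory, model_name, framework, exp_id, run_id]
    """
    if len(argv) != 7:
        raise ValueError("expected 6 arguments, got {}".format(len(argv) - 1))
    _, param_string, _instance_directory, model_name, _framework, exp_id, run_id = argv
    hyper_parameter_map = json.loads(param_string.strip())
    hyper_parameter_map["model_name"] = model_name
    hyper_parameter_map["experiment_id"] = exp_id
    hyper_parameter_map["run_id"] = run_id
    return hyper_parameter_map


# Argument values mirror the placeholders used by the test script in this commit;
# the JSON payload itself is made up.
demo_map = build_hyper_parameter_map(
    ["nt3_tc1_runner.py", '{"epochs": 1}', "./", "nt3", "keras", "foo", "bar"]
)
print(sorted(demo_map))
```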

workflows/nt3_mlrMBO/python/test/run_test_app_invoke.sh

Lines changed: 5 additions & 2 deletions
@@ -2,7 +2,10 @@
 
 NT3_DIR=../../../../../Benchmarks/Pilot1/NT3
 TC1_DIR=../../../../../Benchmarks/Pilot1/TC1
+COMMON_DIR="../../../common/python"
 
-export PYTHONPATH="$PWD/..:$NT3_DIR:$TC1_DIR"
+PARAM_STRING="$(<./params.json)"
 
-python ../nt3_tc1_runner.py ./params.json ./ nt3
+export PYTHONPATH="$PWD/..:$NT3_DIR:$TC1_DIR:$COMMON_DIR"
+
+python ../nt3_tc1_runner.py "$PARAM_STRING" ./ nt3 keras foo bar
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+set -eu
+
+CMD=$1
+EMEWS_PROJECT_ROOT=$2
+
+COMMON_DIR=$EMEWS_PROJECT_ROOT/../../../Benchmarks/common
+export PYTHONPATH="$COMMON_DIR"
+
+# "start" propose_points, max_iterations, ps, algorithm, exp_id, sys_env
+if [ $CMD == "start" ]
+then
+  arg_array=("$EMEWS_PROJECT_ROOT/python/log_runner.py" "$1" "$3" "$4" "$5" "$6" "$7" "$8")
+  python "${arg_array[@]}"
+else
+  arg_array=("$EMEWS_PROJECT_ROOT/python/log_runner.py" "$1" "$3")
+  python "${arg_array[@]}"
+fi

workflows/nt3_mlrMBO/scripts/run_model.sh

Lines changed: 9 additions & 5 deletions
@@ -13,7 +13,7 @@ set -eu
 
 # !!! IF YOU CHANGE THE NUMBER OF ARGUMENTS PASSED TO THIS SCRIPT, YOU MUST
 # CHANGE THE TIMEOUT_ARG_INDEX !!!
-TIMEOUT_ARG_INDEX=6
+TIMEOUT_ARG_INDEX=8
 TIMEOUT=""
 if [[ $# == $TIMEOUT_ARG_INDEX ]]
 then
@@ -27,7 +27,8 @@ fi
 
 # Set param_line from the first argument to this script
 # param_line is the string containing the model parameters for a run.
-param_file=$1
+parameter_string="$1"
+echo $parameter_string
 
 # Set emews_root to the root directory of the project (i.e. the directory
 # that contains the scripts, swift, etc. directories and files)
@@ -40,18 +41,21 @@ cd $instance_directory
 
 model_name=$4
 framework=$5
+exp_id=$6
+run_id=$7
 
-BENCHMARK_DIR=$emews_root/../../../Benchmarks/Pilot1/NT3:$emews_root/../../../Benchmarks/Pilot1/TC1
+BENCHMARK_DIR=$emews_root/../../../Benchmarks/common:$emews_root/../../../Benchmarks/Pilot1/NT3:$emews_root/../../../Benchmarks/Pilot1/TC1
 COMMON_DIR=$emews_root/../common/python
 export PYTHONPATH="$PYTHONPATH:$BENCHMARK_DIR:$COMMON_DIR"
-MODEL_CMD="python $emews_root/python/nt3_tc1_runner.py $param_file $instance_directory $model_name $framework"
 
+arg_array=("$emews_root/python/nt3_tc1_runner.py" "$parameter_string" "$instance_directory" "$model_name" "$framework" "$exp_id" "$run_id")
+MODEL_CMD="python ${arg_array[@]}"
 # Turn bash error checking off. This is
 # required to properly handle the model execution return value
 # the optional timeout.
 set +e
 echo $MODEL_CMD
-$TIMEOUT_CMD $MODEL_CMD
+$TIMEOUT_CMD python "${arg_array[@]}"
 # $? is the exit status of the most recently executed command (i.e the
 # line above)
 RES=$?
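
The switch from a flat `$MODEL_CMD` string to `"${arg_array[@]}"` matters once the first argument is a JSON string rather than a file name, because JSON text usually contains spaces. The sketch below illustrates the difference from Python, using only the standard library. Splitting on whitespace only roughly mimics unquoted shell expansion, and the parameter values are made up.

```python
import json
import subprocess
import sys

# Child program: print how many arguments arrived, then the first one.
CHILD = "import sys; print(len(sys.argv) - 1); print(sys.argv[1])"


def run_child(args):
    """Run the child with each item of `args` as its own argv entry."""
    output = subprocess.check_output([sys.executable, "-c", CHILD] + list(args))
    return output.decode().splitlines()


param_string = json.dumps({"epochs": 5, "activation": "relu"})

# A list keeps the JSON text as a single argument, like "${arg_array[@]}".
intact = run_child([param_string, "run_dir"])

# Whitespace splitting breaks the JSON into fragments, like an unquoted string.
split = run_child(param_string.split() + ["run_dir"])

print(intact[0], split[0])
```

In the first call the child receives two arguments and can decode the first as JSON. In the second it receives five fragments, none of which is valid JSON on its own.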
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+set -eu
+
+CMD=$1
+EMEWS_PROJECT_ROOT=$2
+
+export PYTHONHOME="/home/brettin/anaconda2/envs/vrane"
+PYTHON="$PYTHONHOME/bin/python"
+export LD_LIBRARY_PATH="$PYTHONHOME/lib"
+export PATH="$PYTHONHOME/bin:$PATH"
+
+COMMON=$EMEWS_PROJECT_ROOT/../../../Benchmarks/common
+PYTHONPATH="$PYTHONHOME/lib/python2.7:$COMMON"
+PYTHONPATH+=":$PYTHONHOME/lib/python2.7/site-packages"
+export PYTHONPATH
+
+# "start" propose_points, max_iterations, ps, algorithm, exp_id, sys_env
+if [ $CMD == "start" ]
+then
+  arg_array=("$EMEWS_PROJECT_ROOT/python/log_runner.py" "$1" "$3" "$4" "$5" "$6" "$7" "$8")
+  python "${arg_array[@]}"
+else
+  arg_array=("$EMEWS_PROJECT_ROOT/python/log_runner.py" "$1" "$3")
+  python "${arg_array[@]}"
+fi
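
Search paths such as `PYTHONPATH` are separator-delimited, so every appended entry needs its own separator. Two entries concatenated without one become a single directory name that does not exist. The sketch below builds such a path by joining a list, which avoids counting separators by hand. `join_search_path` and the directories are hypothetical.

```python
import os


def join_search_path(*entries):
    """Join non-empty path entries with the platform's path separator."""
    return os.pathsep.join(entry for entry in entries if entry)


python_home = "/opt/example/python"  # hypothetical install prefix

search_path = join_search_path(
    python_home + "/lib/python2.7",
    "/opt/example/common",
    python_home + "/lib/python2.7/site-packages",
)

# Splitting the result recovers exactly the entries that went in.
print(search_path.split(os.pathsep))
```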
