
Commit d76a9dc

committed
add manual training with entrypoint instruction
Signed-off-by: Sunyanan Choochotkaew <[email protected]>
1 parent 4bdcc9c commit d76a9dc


2 files changed: +88 -0 lines changed


model_training/README.md

Lines changed: 6 additions & 0 deletions
@@ -38,6 +38,7 @@ Please confirm the following requirements:

## 2. Run benchmark and collect metrics

### With benchmark automation and pipeline

There are two options to run the benchmark and collect the metrics: [CPE-operator](https://github.com/IBM/cpe-operator) with a manual script and [Tekton Pipeline](https://github.com/tektoncd/pipeline).

> The adoption of the CPE operator is slated for deprecation. We are transitioning the collection and training automation to the Tekton pipeline. Nevertheless, the CPE operator might still be considered for customized benchmarks that require performance values per sub-workload within the benchmark suite.
@@ -46,6 +47,11 @@ There are two options to run the benchmark and collect the metrics, [CPE-operato

### [CPE Operator Instruction](./cpe_script_instruction.md)

### With manual execution

In addition to the above two automation approaches, you can manually run your own benchmarks, then collect, train, and export the models via the entrypoint `cmd/main.py`.

### [Manual Metric Collection and Training with Entrypoint](./cmd_instruction.md)

## Clean up

### For kind-for-training cluster

model_training/cmd_instruction.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@

# Manual Metric Collection and Training with Entrypoint

## 1. Collect metrics

Without benchmark/pipeline automation, Kepler metrics can be collected with the `query` function using either one of the following options.

### 1.1. by defining start time and end time

```bash
# value setting
BENCHMARK= # name of the benchmark (will generate [BENCHMARK].json to save start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
START_TIME= # format date +%Y-%m-%dT%H:%M:%SZ
END_TIME= # format date +%Y-%m-%dT%H:%M:%SZ
COLLECT_ID= # any unique id e.g., machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --start-time $START_TIME --end-time $END_TIME --id $COLLECT_ID
```
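
For reference, a minimal sketch of one way to fill these values when the run has just finished; the benchmark name `stressng`, the id `machine-a`, and the 20-minute window are only placeholders:

```bash
# hypothetical values for illustration only
BENCHMARK=stressng
PROM_URL=http://localhost:9090
COLLECT_ID=machine-a

# assume the workload ended now and started about 20 minutes ago (UTC, matching the expected format)
# note: `date -d` is GNU date; on BSD/macOS use `date -u -v-20M +%Y-%m-%dT%H:%M:%SZ`
END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
START_TIME=$(date -u -d "20 minutes ago" +%Y-%m-%dT%H:%M:%SZ)

DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --start-time $START_TIME --end-time $END_TIME --id $COLLECT_ID
```
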
### 1.2. by defining last interval from the execution time
```bash
# value setting
BENCHMARK= # name of the benchmark (will generate [BENCHMARK].json to save start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
INTERVAL= # in seconds
COLLECT_ID= # any unique id e.g., machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --interval $INTERVAL --id $COLLECT_ID
```
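
Similarly, a hypothetical invocation covering the last 10 minutes before the execution time (all values are placeholders):

```bash
# hypothetical values for illustration only
BENCHMARK=stressng
PROM_URL=http://localhost:9090
INTERVAL=600   # last 10 minutes, in seconds
COLLECT_ID=machine-a

DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --interval $INTERVAL --id $COLLECT_ID
```
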
### Output:

Three files will be created in `/path/to/workspace`:

- `kepler_query.json`: raw Prometheus query response
- `<COLLECT_ID>.json`: machine system features (spec)
- `<BENCHMARK>.json`: an item containing startTimeUTC and endTimeUTC
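
With the hypothetical values above (`BENCHMARK=stressng`, `COLLECT_ID=machine-a`), the workspace would then contain something like:

```bash
ls /path/to/workspace
# kepler_query.json   machine-a.json   stressng.json
```
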
## 2. Train models
```bash
# value setting
PIPELINE_NAME= # any unique name for the pipeline (one pipeline can be accumulated by multiple COLLECT_ID)

# train execution
# require COLLECT_ID from collect step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID
```
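
Continuing the hypothetical example above, with a placeholder pipeline name:

```bash
# hypothetical values for illustration only
PIPELINE_NAME=std_v0.7   # placeholder pipeline name
COLLECT_ID=machine-a     # same id used in the collect step

DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID
```
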
## 3. Export models

The export function archives the models from the trained pipeline whose error is below the threshold and generates a report in a format that is ready to push to kepler-model-db.

### 3.1. exporting the trained pipeline with BENCHMARK

The benchmark file is created by the CPE operator or by step 1.1. or 1.2.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # github account of publisher

# export execution
# require BENCHMARK from collect step
# require PIPELINE_NAME from train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --benchmark $BENCHMARK --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true
```
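
For instance, continuing the same hypothetical values (the publisher account and paths are placeholders):

```bash
# hypothetical values for illustration only
BENCHMARK=stressng
PIPELINE_NAME=std_v0.7
EXPORT_PATH=/path/to/kepler-model-db/models
PUBLISHER=my-github-account

DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --benchmark $BENCHMARK --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true
```
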
### 3.2. exporting the trained models without BENCHMARK

If the data is collected by Tekton, no benchmark file is created. In that case, manually set the `--collect-date` parameter instead of `--benchmark`.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # github account of publisher
COLLECT_DATE= # collect date

# export execution
# require COLLECT_DATE instead of BENCHMARK
# require PIPELINE_NAME from train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true --collect-date $COLLECT_DATE
```
