# Manual Metric Collection and Training with Entrypoint

## 1. Collect metrics

Without benchmark/pipeline automation, Kepler metrics can be collected with the `query` function using either of the following options.
### 1.1. By defining a start time and an end time

```bash
# value setting
BENCHMARK= # name of the benchmark (a [BENCHMARK].json file will be generated to save the start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
START_TIME= # in the format produced by: date +%Y-%m-%dT%H:%M:%SZ
END_TIME= # in the format produced by: date +%Y-%m-%dT%H:%M:%SZ
COLLECT_ID= # any unique id, e.g., the machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --start-time $START_TIME --end-time $END_TIME --id $COLLECT_ID
```
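
The `START_TIME`/`END_TIME` values must match the `date +%Y-%m-%dT%H:%M:%SZ` format. As a minimal sketch, a one-hour query window ending now can be built directly with `date` (the relative `-d` form assumes GNU date):

```bash
# Build a one-hour query window in the required UTC timestamp format.
END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# GNU date relative syntax; on BSD/macOS use: date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
START_TIME=$(date -u -d "1 hour ago" +%Y-%m-%dT%H:%M:%SZ)
echo "querying from $START_TIME to $END_TIME"
```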

### 1.2. By defining the last interval before the execution time

```bash
# value setting
BENCHMARK= # name of the benchmark (a [BENCHMARK].json file will be generated to save the start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
INTERVAL= # in seconds, e.g., 3600 for the last hour
COLLECT_ID= # any unique id, e.g., the machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --interval $INTERVAL --id $COLLECT_ID
```

### Output

Three files will be created in `/path/to/workspace`:
- `kepler_query.json`: raw Prometheus query response
- `<COLLECT_ID>.json`: machine system features (spec)
- `<BENCHMARK>.json`: an item containing `startTimeUTC` and `endTimeUTC`
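
As a quick sanity check, the presence of the three outputs can be verified after the query step. The sketch below simulates the workspace layout with a temporary directory and empty stand-in files; the benchmark name and collect id are placeholders:

```bash
# Simulate the workspace layout with empty stand-in files, then check for the
# three outputs the collect step is expected to produce.
WORKSPACE=$(mktemp -d)
BENCHMARK=sample_benchmark   # placeholder benchmark name
COLLECT_ID=machine-a         # placeholder collect id
touch "$WORKSPACE/kepler_query.json" "$WORKSPACE/$COLLECT_ID.json" "$WORKSPACE/$BENCHMARK.json"

for f in kepler_query.json "$COLLECT_ID.json" "$BENCHMARK.json"; do
  [ -f "$WORKSPACE/$f" ] && echo "found $f" || echo "missing $f"
done
```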

## 2. Train models

```bash
# value setting
PIPELINE_NAME= # any unique name for the pipeline (one pipeline can accumulate data from multiple COLLECT_IDs)

# train execution
# requires COLLECT_ID from the collect step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID
```
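
Since one pipeline can accumulate data from multiple collect ids, the train command can simply be repeated with the same pipeline name and a different `--id` each time. A sketch that composes (and here only prints) the commands for two hypothetical machines:

```bash
# Compose one train command per COLLECT_ID; the pipeline name is shared so
# the results accumulate under the same pipeline. All names are placeholders.
PIPELINE_NAME=std_pipeline
CMDS=""
for COLLECT_ID in machine-a machine-b; do
  CMDS+="DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID"$'\n'
done
echo "$CMDS"
```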

## 3. Export models

The export function archives the models from the trained pipeline whose error is below the threshold and generates a report in a format that is ready to push to kepler-model-db.

### 3.1. Exporting the trained pipeline with BENCHMARK

The benchmark file is created by the CPE operator or by step 1.1 or 1.2.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # GitHub account of the publisher

# export execution
# requires BENCHMARK from the collect step
# requires PIPELINE_NAME from the train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --benchmark $BENCHMARK --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true
```

### 3.2. Exporting the trained models without BENCHMARK

If the data is collected by Tekton, no benchmark file is created. In that case, manually set the `--collect-date` parameter instead of `--benchmark`.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # GitHub account of the publisher
COLLECT_DATE= # collect date

# export execution
# requires COLLECT_DATE instead of BENCHMARK
# requires PIPELINE_NAME from the train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true --collect-date $COLLECT_DATE
```