# Manual Metric Collection and Training with Entrypoint

## 1. Collect metrics

Without benchmark/pipeline automation, Kepler metrics can be collected with the `query` function using either of the following options.
### 1.1. By defining a start time and an end time

```bash
# value setting
BENCHMARK= # name of the benchmark (a [BENCHMARK].json file will be generated to save the start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
START_TIME= # in the format produced by: date +%Y-%m-%dT%H:%M:%SZ
END_TIME= # in the format produced by: date +%Y-%m-%dT%H:%M:%SZ
COLLECT_ID= # any unique id, e.g., the machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --start-time $START_TIME --end-time $END_TIME --id $COLLECT_ID
```
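
The `START_TIME`/`END_TIME` values must match the `date +%Y-%m-%dT%H:%M:%SZ` format. As a minimal sketch, a one-hour query window ending now can be built directly with `date` (the relative `-d` form assumes GNU date):

```bash
# Build a one-hour query window in the required UTC timestamp format.
END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# GNU date relative syntax; on BSD/macOS use: date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
START_TIME=$(date -u -d "1 hour ago" +%Y-%m-%dT%H:%M:%SZ)
echo "querying from $START_TIME to $END_TIME"
```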

### 1.2. By defining the last interval before the execution time

```bash
# value setting
BENCHMARK= # name of the benchmark (a [BENCHMARK].json file will be generated to save the start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
INTERVAL= # in seconds, e.g., 3600 for the last hour
COLLECT_ID= # any unique id, e.g., the machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --interval $INTERVAL --id $COLLECT_ID
```

### Output

Three files will be created in `/path/to/workspace`:
- `kepler_query.json`: raw Prometheus query response
- `<COLLECT_ID>.json`: machine system features (spec)
- `<BENCHMARK>.json`: an item containing `startTimeUTC` and `endTimeUTC`
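
As a quick sanity check, the presence of the three outputs can be verified after the query step. The sketch below simulates the workspace layout with a temporary directory and empty stand-in files; the benchmark name and collect id are placeholders:

```bash
# Simulate the workspace layout with empty stand-in files, then check for the
# three outputs the collect step is expected to produce.
WORKSPACE=$(mktemp -d)
BENCHMARK=sample_benchmark   # placeholder benchmark name
COLLECT_ID=machine-a         # placeholder collect id
touch "$WORKSPACE/kepler_query.json" "$WORKSPACE/$COLLECT_ID.json" "$WORKSPACE/$BENCHMARK.json"

for f in kepler_query.json "$COLLECT_ID.json" "$BENCHMARK.json"; do
  [ -f "$WORKSPACE/$f" ] && echo "found $f" || echo "missing $f"
done
```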

## 2. Train models

```bash
# value setting
PIPELINE_NAME= # any unique name for the pipeline (one pipeline can accumulate data from multiple COLLECT_IDs)

# train execution
# requires COLLECT_ID from the collect step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID
```
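
Since one pipeline can accumulate data from multiple collect ids, the train command can simply be repeated with the same pipeline name and a different `--id` each time. A sketch that composes (and here only prints) the commands for two hypothetical machines:

```bash
# Compose one train command per COLLECT_ID; the pipeline name is shared so
# the results accumulate under the same pipeline. All names are placeholders.
PIPELINE_NAME=std_pipeline
CMDS=""
for COLLECT_ID in machine-a machine-b; do
  CMDS+="DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID"$'\n'
done
echo "$CMDS"
```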

## 3. Export models

The export function archives the models from the trained pipeline whose error is below the threshold and generates a report in a format that is ready to push to kepler-model-db.

### 3.1. Exporting the trained pipeline with BENCHMARK

The benchmark file is created by the CPE operator or by step 1.1 or 1.2.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # GitHub account of the publisher

# export execution
# requires BENCHMARK from the collect step
# requires PIPELINE_NAME from the train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --benchmark $BENCHMARK --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true
```

### 3.2. Exporting the trained models without BENCHMARK

If the data is collected by Tekton, no benchmark file is created. In that case, manually set the `--collect-date` parameter instead of `--benchmark`.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # GitHub account of the publisher
COLLECT_DATE= # collect date

# export execution
# requires COLLECT_DATE instead of BENCHMARK
# requires PIPELINE_NAME from the train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true --collect-date $COLLECT_DATE
```