README.md
Lines changed: 31 additions & 19 deletions
@@ -1,11 +1,14 @@
# Kepler Power Model
+
[Get started with Kepler Model Server.](https://sustainable-computing.io/kepler_model_server/get_started/)
This repository contains source code related to the Kepler power model. The modules in this repository connect to the [core Kepler project](https://github.com/sustainable-computing-io/kepler) and [kepler-model-db](https://github.com/sustainable-computing-io/kepler-model-db) as shown below.
-
+
+
+
For more details, check [the component diagram](./fig/model-server-components-simplified.png).
The `Benchmark` CR depends on `BenchmarkOperator`. The default `BenchmarkOperator` supports the [batch/v1/Job API](https://github.com/IBM/cpe-operator/blob/main/examples/none/cpe_v1_none_operator.yaml).
### Tekton
+
Create a workload `Task` and provide an example `Pipeline` to run it.
### Add new trained models
+
TBD
## Source improvement
+
Any improvement in `src` and `cmd`.
## Test and CI improvement
+
Any improvement in `tests`, `dockerfiles`, `manifests`, and `.github/workflows`.
## Documentation
-
Detailed documentation should be posted to [kepler-doc](https://github.com/sustainable-computing-io/kepler-doc) repository.
+
Detailed documentation should be posted to [kepler-doc](https://github.com/sustainable-computing-io/kepler-doc) repository.
model_training/README.md
Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
# Contribute to power profiling and model training
<!--toc:start-->
+
- [Contribute to power profiling and model training](#contribute-to-power-profiling-and-model-training)
- [Requirements](#requirements)
- [Pre-step](#pre-step)
@@ -10,8 +11,8 @@
- [For managed cluster](#for-managed-cluster)
- [Run benchmark and collect metrics](#run-benchmark-and-collect-metrics)
- [With manual execution](#with-manual-execution)
-
- [[Manual Metric Collection and Training with Entrypoint](./cmd_instruction.md)](#manual-metric-collection-and-training-with-entrypointcmdinstructionmd)
- [Clean up](#clean-up)
+
<!--toc:end-->
## Requirements
@@ -68,7 +69,7 @@ There are two options to run the benchmark and collect the metrics, [CPE-operato
In addition to the two automation approaches above, you can manually run your own benchmarks, then collect metrics, train, and export the models with the entrypoint `cmd/main.py`.
model_training/cmd_instruction.md
Lines changed: 5 additions & 2 deletions
@@ -1,6 +1,7 @@
# Manual Metric Collection and Training with Entrypoint
## 1. Collect metrics
+
Without benchmark/pipeline automation, Kepler metrics can be collected with the `query` function by setting `BENCHMARK`, `PROM_URL`, `COLLECT_ID`, and one of the following time options.
> It is recommended to set the BENCHMARK name to a part of the pod name, such as `stressng`, to filter the validated results. The BENCHMARK name is also used by the TrainerIsolator to filter the target pods. If BENCHMARK cannot be used to filter the target pods, the validated results will show results from all pods.
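
The filtering behavior described in the note can be illustrated with a minimal Python sketch. `filter_target_pods` is a hypothetical helper, not the TrainerIsolator's actual code; the substring match and the pod names are assumptions made for illustration only.

```python
def filter_target_pods(pod_names, benchmark):
    """Return pods whose name contains `benchmark`; fall back to all pods.

    Hypothetical helper: the real TrainerIsolator logic may differ.
    """
    matched = [name for name in pod_names if benchmark in name]
    # If BENCHMARK matches no pod name, nothing can be isolated,
    # so every pod is kept (mirrors "result from all pods" in the note).
    return matched if matched else list(pod_names)


# Illustrative pod names (assumed, not taken from a real cluster).
pods = ["stressng-cpu-0", "stressng-mem-1", "prometheus-k8s-0"]
print(filter_target_pods(pods, "stressng"))
print(filter_target_pods(pods, "sysbench"))
```

Choosing a BENCHMARK value that appears in the workload pod names keeps unrelated pods out of the validated results.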
Three files will be created in `/path/to/workspace`:
+
- `kepler_query.json`: raw Prometheus query response
- `<COLLECT_ID>.json`: machine system features (spec)
- `<BENCHMARK>.json`: an item containing `startTimeUTC` and `endTimeUTC`
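
As a rough sketch, the three files listed above could be located and loaded as follows. The helper names, the dictionary keys, and the example `COLLECT_ID` and `BENCHMARK` values are assumptions for illustration; only the file naming pattern comes from the list.

```python
import json
from pathlib import Path


def collected_files(workspace, collect_id, benchmark):
    """Map each expected output to its path under the workspace."""
    root = Path(workspace)
    return {
        "query_response": root / "kepler_query.json",
        "machine_spec": root / f"{collect_id}.json",
        "benchmark_window": root / f"{benchmark}.json",
    }


def load_existing(paths):
    """Load every file that exists and list the keys of missing ones."""
    loaded, missing = {}, []
    for key, path in paths.items():
        if path.is_file():
            loaded[key] = json.loads(path.read_text())
        else:
            missing.append(key)
    return loaded, missing


# Hypothetical values; substitute the COLLECT_ID and BENCHMARK used for the query.
paths = collected_files("/path/to/workspace", "my-machine", "stressng")
loaded, missing = load_existing(paths)
print("missing outputs:", missing)
```

Checking for missing files before training gives an early signal that the collection step did not finish.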
@@ -50,6 +53,7 @@ DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py tra
```
## 3. Export models
+
The export function archives models from the trained pipeline whose error is below a threshold and generates a report in a format that is ready to push to kepler-model-db. To use it, set `EXPORTER_PATH`, `PUBLISHER`, and the collect date option.
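
The threshold rule can be pictured with a short Python sketch. This is not the exporter's actual implementation: `select_exportable`, the model names, the error values, and the threshold are all hypothetical.

```python
def select_exportable(model_errors, threshold):
    """Keep models whose error is strictly below the threshold.

    `model_errors` maps a model name to its error value; the names and
    the error metric are placeholders for illustration.
    """
    return {name: err for name, err in model_errors.items() if err < threshold}


# Assumed example values, not real training results.
errors = {"model_a": 0.08, "model_b": 0.35, "model_c": 0.12}
print(select_exportable(errors, threshold=0.2))
```

Only the models that pass this check would be archived and included in the report.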