Commit d1abb4a

committed
Dfaas readme rewritten
1 parent 488d7bc commit d1abb4a

1 file changed

+246
-103
lines changed

lb_plugins/plugins/dfaas/README.md

Lines changed: 246 additions & 103 deletions
Original file line number | Diff line number | Diff line change
@@ -1,93 +1,185 @@
1 1
# DFaaS Plugin
2 2

3-
This plugin implements the legacy DFaaS sampling workflow using k6 on a
4-
dedicated host and OpenFaaS/Prometheus on the target.
5-
6-
## Architecture
7-
- Target host: runner + k3s + OpenFaaS + Prometheus + exporters.
8-
- k6 host: executes load, controlled via SSH from the target host.
9-
- Network paths:
10-
- target -> k6-host over SSH (port 22 by default)
11-
- k6-host -> OpenFaaS gateway (port 31112 by default)
12-
- target -> Prometheus (NodePort 30411 by default)
13-
14-
## Prerequisites
15-
- SSH access from target host to k6 host using the configured key.
16-
- Target host packages: `kubectl`, `helm`, `faas-cli`, `ansible-playbook`.
17-
- Open ports: 22 (SSH), 31112 (OpenFaaS gateway), 30411 (Prometheus).
18-
19-
## Setup steps
20-
1. Install k6 on the k6 host:
21-
- Run `lb_plugins/plugins/dfaas/ansible/setup_k6.yml` against the k6 host.
22-
2. Install k3s/OpenFaaS/Prometheus on the target host:
23-
- Run `lb_plugins/plugins/dfaas/ansible/setup_target.yml` against the target.
24-
3. Verify:
25-
- `kubectl get nodes` shows Ready.
26-
- `faas-cli list --gateway http://<target>:31112` lists functions.
27-
- `curl http://<target>:30411/-/ready` returns 200.
28-
29-
## Run flow
30-
- Generate all function combinations and rate vectors.
31-
- For each config:
32-
- Enforce cooldown (idle CPU/RAM/POWER and replicas < 2).
33-
- Generate a k6 script and run it on the k6 host.
34-
- Parse the k6 summary and query Prometheus.
35-
- Compute overload and skip dominant configs when needed.
36-
37-
## Outputs
38-
In the workload output directory:
39-
- `results.csv`, `skipped.csv`, `index.csv` (legacy-compatible headers).
40-
- `summaries/summary-<config>-iter<iter>-rep<rep>.json`
41-
- `metrics/metrics-<config>-iter<iter>-rep<rep>.csv`
42-
- `k6_scripts/config-<config>.js`
43-
44-
## Config loading
45-
- `config_path` can be passed via workload options to load a YAML/JSON file that
46-
contains `common` and `plugins.dfaas` sections.
47-
- `plugins.dfaas` overrides `common`, and any options passed alongside
48-
`config_path` override both.
49-
50-
## Config schema
51-
Top-level fields (in `plugins.dfaas`):
52-
- `k6_host` (str): k6 host address.
53-
- `k6_user` (str): SSH user for the k6 host.
54-
- `k6_ssh_key` (str): SSH private key path.
55-
- `k6_port` (int): SSH port.
56-
- `k6_workspace_root` (str): workspace root on the k6 host.
57-
- `output_dir` (str): optional output directory override for DFaaS artifacts.
58-
- `run_id` (str): optional run identifier used for k6 workspace.
59-
- `gateway_url` (str): OpenFaaS gateway URL.
60-
- `prometheus_url` (str): Prometheus base URL (default NodePort 30411).
61-
- `functions` (list): list of function objects (name/method/body/headers/max_rate).
62-
- `rates` (object): `min_rate`, `max_rate`, `step`.
63-
- `combinations` (object): `min_functions`, `max_functions` (max is exclusive).
64-
- `duration` (str): k6 duration string (e.g. `30s`).
65-
- `iterations` (int): iterations per configuration.
66-
- `cooldown` (object): `max_wait_seconds`, `sleep_step_seconds`,
67-
`idle_threshold_pct`.
68-
- `overload` (object): `cpu_overload_pct_of_capacity`, `ram_overload_pct`,
69-
`success_rate_node_min`, `success_rate_function_min`,
70-
`replicas_overload_threshold`.
71-
- `queries_path` (str): path to the Prometheus queries file.
72-
- `deploy_functions` (bool): deploy OpenFaaS store functions.
73-
- `scaphandre_enabled` (bool): enable power metrics via Scaphandre.
74-
- `function_pid_regexes` (map): optional PID regex per function when Scaphandre is enabled.
75-
76-
Common base fields:
77-
- `max_retries` (int): retries for the workload (default 0).
78-
- `timeout_buffer` (int): safety buffer added to expected runtime (default 10).
79-
- `tags` (list[str]): workload tags.
80-
81-
Function object fields:
82-
- `name` (str): OpenFaaS function name.
83-
- `method` (str): HTTP method (GET/POST/etc).
84-
- `body` (str): request payload (match legacy payloads in
85-
`legacy_materials/samples_generator/utils.py`).
86-
- `headers` (map): HTTP headers.
87-
- `max_rate` (int, optional): per-function maximum rate (requests/sec).
3+
The DFaaS plugin reproduces the legacy sampling workflow using:
4+
- OpenFaaS functions as the target workload.
5+
- k6 for load generation.
6+
- Prometheus + exporters for metrics.
88 7

89-
## Example config (YAML)
8+
It runs one configuration at a time, applies cooldown and overload rules, and
9+
persists legacy-compatible CSV outputs.
10+
11+
## Components and data flow
12+
Control plane (where the runner executes):
13+
- Runs the DFaaS generator locally.
14+
- Invokes Ansible to provision the target and k6 hosts.
15+
- Pulls k6 summaries via Ansible fetch.
16+
- Queries Prometheus over HTTP.
17+
18+
Target host:
19+
- Runs k3s + OpenFaaS + Prometheus + node-exporter + cAdvisor.
20+
- Exposes the OpenFaaS gateway (NodePort 31112) and Prometheus (NodePort 30411).
21+
22+
k6 host:
23+
- Receives k6 scripts via Ansible.
24+
- Runs k6 and exports a summary.json file.
25+
26+
End-to-end flow:
27+
1. Generate function/rate configurations.
28+
2. Cool down until the node is idle and replica counts are low.
29+
3. Run k6 for the configuration.
30+
4. Query Prometheus for node and function metrics.
31+
5. Apply overload and dominance rules.
32+
6. Emit CSVs and per-config artifacts.
33+
34+
## Repository layout
35+
- `plugin.py`: config schema and CSV export.
36+
- `generator.py`: config generation, k6 orchestration, Prometheus queries.
37+
- `queries.yml`: PromQL queries.
38+
- `ansible/`: setup and run playbooks.
39+
- `setup_target.yml` installs k3s/OpenFaaS/Prometheus stack.
40+
- `setup_k6.yml` installs k6 and prepares workspace.
41+
- `run_k6.yml` runs a single config on the k6 host.
42+
- `legacy_materials/`: reference artifacts from the legacy workflow.
43+
44+
## Network and ports
45+
Required connectivity:
46+
- Controller -> target: SSH (for Ansible), HTTP to Prometheus.
47+
- Controller -> k6 host: SSH (for Ansible).
48+
- k6 host -> OpenFaaS gateway: HTTP.
49+
50+
Default ports:
51+
- OpenFaaS gateway: 31112 (NodePort).
52+
- Prometheus: 30411 (NodePort).
53+
54+
If NodePorts are not reachable from the controller, use SSH tunneling.
55+
56+
## Setup (manual or via controller)
57+
58+
### k6 host
59+
Playbook: `lb_plugins/plugins/dfaas/ansible/setup_k6.yml`
60+
61+
Example:
62+
```bash
63+
ansible-playbook -i k6_inventory.ini lb_plugins/plugins/dfaas/ansible/setup_k6.yml
90 64
```
65+
66+
Key variables:
67+
- `k6_workspace_root` (default `/var/lib/dfaas-k6`).
68+
- `k6_version` (default `0.49.0`).
69+
70+
Verification:
71+
```bash
72+
ssh <k6-host> k6 version
73+
```
74+
75+
### target host (k3s + OpenFaaS + Prometheus)
76+
Playbook: `lb_plugins/plugins/dfaas/ansible/setup_target.yml`
77+
78+
Example:
79+
```bash
80+
ansible-playbook -i target_inventory.ini \
81+
-e '{"openfaas_functions":["figlet","env"]}' \
82+
lb_plugins/plugins/dfaas/ansible/setup_target.yml
83+
```
84+
85+
Notes:
86+
- OpenFaaS is installed via Helm.
87+
- OpenFaaS built-in Prometheus/Alertmanager are disabled; a dedicated Prometheus
88+
deployment is applied from `legacy_materials/infrastructure/` plus custom
89+
manifests.
90+
91+
Key variables:
92+
- `openfaas_gateway_node_port` (default 31112).
93+
- `openfaas_functions` (list of store functions to deploy).
94+
- `prometheus_node_port` (default 30411).
95+
- `scaphandre_enabled` + `scaphandre_repo_url` + `scaphandre_chart` for power metrics.
96+
97+
Verification:
98+
```bash
99+
kubectl get nodes
100+
faas-cli list --gateway http://<target-ip>:31112
101+
curl http://<target-ip>:30411/-/ready
102+
```
103+
104+
## Running the workload
105+
The plugin can be run via the controller or directly via a BenchmarkConfig.
106+
107+
Configuration is typically supplied via:
108+
- `config_path`: YAML/JSON file with `common` and `plugins.dfaas`.
109+
- or `options` in `user_defined` mode.
110+
111+
Config precedence:
112+
1. `common` in the file.
113+
2. `plugins.dfaas` in the file.
114+
3. Options passed alongside `config_path` (highest priority).
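
The precedence order above can be illustrated with a minimal sketch. It assumes a shallow, key-by-key merge of already-parsed dictionaries; `resolve_options` is a hypothetical helper, not a function from the plugin, and the real loader may merge nested sections differently.

```python
def resolve_options(file_config, overrides):
    """Merge settings in precedence order (lowest first). Hypothetical helper."""
    merged = {}
    # 1. `common` section of the file (lowest priority).
    merged.update(file_config.get("common", {}))
    # 2. `plugins.dfaas` section of the file.
    merged.update(file_config.get("plugins", {}).get("dfaas", {}))
    # 3. Options passed alongside `config_path` (highest priority).
    merged.update(overrides)
    return merged


file_config = {
    "common": {"timeout_buffer": 10, "duration": "10s"},
    "plugins": {"dfaas": {"duration": "30s", "iterations": 3}},
}
resolved = resolve_options(file_config, {"iterations": 1})
```

Here `duration` comes from `plugins.dfaas`, `iterations` from the explicit override, and `timeout_buffer` from `common`.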
115+
116+
## Configuration reference
117+
All fields live under `plugins.dfaas` unless noted.
118+
119+
### Core
120+
- `config_path` (Path, optional): YAML/JSON file with `common` + `plugins.dfaas`.
121+
- `output_dir` (Path, optional): override output directory for artifacts.
122+
- `run_id` (str, optional): identifier for the k6 workspace path.
123+
124+
### k6 host
125+
- `k6_host` (str, default `127.0.0.1`): k6 host address.
126+
- `k6_user` (str, default `ubuntu`): SSH user for k6 host.
127+
- `k6_ssh_key` (str, default `~/.ssh/id_rsa`): SSH private key.
128+
- `k6_port` (int, default 22): SSH port.
129+
- `k6_workspace_root` (str, default `/var/lib/dfaas-k6`): workspace root on k6 host.
130+
131+
### OpenFaaS and Prometheus
132+
- `gateway_url` (str, default `http://127.0.0.1:31112`): OpenFaaS gateway URL.
133+
- `prometheus_url` (str, default `http://127.0.0.1:30411`): Prometheus base URL.
134+
135+
### Functions
136+
List of function objects:
137+
- `name` (str, required): OpenFaaS function name.
138+
- `method` (str, default `GET`): HTTP method.
139+
- `body` (str, default empty): request body.
140+
- `headers` (map, default empty): HTTP headers.
141+
- `max_rate` (int, optional): per-function max rate; caps global rates.
142+
143+
Validation:
144+
- Function names must be unique.
145+
- If `max_rate` is set, it must be >= `rates.min_rate`.
146+
147+
### Rates
148+
- `rates.min_rate` (int, default 0): inclusive min requests/sec.
149+
- `rates.max_rate` (int, default 200): inclusive max requests/sec.
150+
- `rates.step` (int, default 10): step size.
151+
152+
### Combinations
153+
- `combinations.min_functions` (int, default 1): minimum functions per config.
154+
- `combinations.max_functions` (int, default 2): exclusive upper bound.
155+
156+
### Timing
157+
- `duration` (str, default `30s`): k6 duration.
158+
- `iterations` (int, default 3): iterations per config.
159+
160+
### Cooldown
161+
- `cooldown.max_wait_seconds` (int, default 180).
162+
- `cooldown.sleep_step_seconds` (int, default 5).
163+
- `cooldown.idle_threshold_pct` (float, default 15).
164+
165+
### Overload thresholds
166+
- `overload.cpu_overload_pct_of_capacity` (float, default 80).
167+
- `overload.ram_overload_pct` (float, default 90).
168+
- `overload.success_rate_node_min` (float, default 0.95).
169+
- `overload.success_rate_function_min` (float, default 0.90).
170+
- `overload.replicas_overload_threshold` (int, default 15).
171+
172+
### Metrics and queries
173+
- `queries_path` (str, default `lb_plugins/plugins/dfaas/queries.yml`).
174+
- `scaphandre_enabled` (bool, default false).
175+
- `function_pid_regexes` (map, default empty): PID regex per function for power.
176+
177+
### Deployment hints
178+
- `deploy_functions` (bool, default true): informational flag.
179+
The setup playbook actually uses the `openfaas_functions` extra var.
180+
181+
## Example config (YAML)
182+
```yaml
91 183
common:
92 184
timeout_buffer: 10
93 185

@@ -108,7 +200,7 @@ plugins:
108 200
headers:
109 201
Content-Type: "text/plain"
110 202
max_rate: 100
111-
- name: "eat-memory"
203+
- name: "env"
112 204
method: "GET"
113 205
body: ""
114 206

@@ -141,21 +233,72 @@ plugins:
141 233
scaphandre_enabled: false
142 234
```
143 235
144-
## Formal rules (legacy)
145-
- Rate list: `rates = [min_rate..max_rate step]` inclusive, ascending.
146-
- Combinations: function sets from `min_functions..max_functions` (max exclusive).
147-
- Dominance: Config B dominates A if for every function `rate_B >= rate_A` and
148-
for at least one function `rate_B > rate_A`. If A is overloaded, skip all
149-
dominant configs.
150-
- Cooldown: wait until CPU/RAM/POWER <= idle + idle * 15% and replicas < 2, with
151-
a max wait of 180s.
152-
- Overload:
153-
- Node overloaded if avg success rate < 0.95 OR CPU > 80% capacity OR
154-
RAM > 90% OR any function overload.
155-
- Function overloaded if success rate < 0.90 OR replicas >= 15.
236+
## Configuration generation logic
237+
1. Build a global rate list from `min_rate..max_rate` inclusive.
238+
2. Apply `functions[].max_rate` (if set) to cap rates per function.
239+
3. Generate function combinations from `min_functions` to `max_functions` (exclusive).
240+
4. Produce all rate permutations for each combination.
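
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's `generate_configurations`: functions are plain dicts with `name` and an optional `max_rate`, "rate permutations" is read as the Cartesian product of each function's capped rate list, and `sketch_configurations` is a hypothetical name.

```python
from itertools import combinations, product


def sketch_configurations(functions, min_rate, max_rate, step,
                          min_functions, max_functions):
    """Enumerate (function, rate) assignments. Hypothetical sketch."""
    # Step 1: global rate list, inclusive of max_rate, ascending.
    rates = list(range(min_rate, max_rate + 1, step))
    configs = []
    # Step 3: combination sizes; max_functions is an exclusive bound.
    for size in range(min_functions, max_functions):
        for combo in combinations(functions, size):
            # Step 2: cap the global list with each function's max_rate.
            per_function = [
                [r for r in rates
                 if fn.get("max_rate") is None or r <= fn["max_rate"]]
                for fn in combo
            ]
            # Step 4: every rate assignment for this function set.
            for assignment in product(*per_function):
                names = [fn["name"] for fn in combo]
                configs.append(tuple(zip(names, assignment)))
    return configs


functions = [{"name": "figlet", "max_rate": 10}, {"name": "env"}]
configs = sketch_configurations(functions, 0, 20, 10, 1, 3)
```

With these inputs the global list is `[0, 10, 20]`, `figlet` is capped to `[0, 10]`, and the result holds 2 + 3 single-function configs plus 2 x 3 two-function configs.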
241+
242+
Dominance:
243+
- Config B dominates A when it has the same function set and every rate is
244+
greater than or equal to the corresponding rate in A, with at least one strictly greater.
245+
- If A is overloaded, all dominant configs are skipped.
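
As a minimal sketch of the dominance rule, assume each config is represented as a dict mapping function name to rate (a representation chosen here for illustration; `dominates` is a hypothetical helper).

```python
def dominates(b, a):
    """Return True if config b dominates config a. Hypothetical helper."""
    # Dominance is only defined for configs over the same function set.
    if b.keys() != a.keys():
        return False
    all_at_least = all(b[name] >= a[name] for name in a)
    one_strictly_greater = any(b[name] > a[name] for name in a)
    return all_at_least and one_strictly_greater


overloaded = {"figlet": 10, "env": 10}
candidates = [
    {"figlet": 20, "env": 10},  # dominates: would be skipped
    {"figlet": 20, "env": 0},   # env is lower: still runs
    {"figlet": 10, "env": 10},  # identical: not strictly greater
]
to_run = [c for c in candidates if not dominates(c, overloaded)]
```

Once a config is found overloaded, filtering the remaining candidates this way drops only those that push every function at least as hard.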
246+
247+
Cooldown:
248+
- Wait until CPU/RAM/POWER are at most `idle_threshold_pct` percent above their
249+
idle values and replicas < 2, or time out at `max_wait_seconds`.
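
The cooldown wait can be sketched as a polling loop. This assumes the legacy interpretation of the threshold (`value <= idle + idle * pct / 100`) and a caller-supplied `sample()` callable returning current metrics and the replica count; both function names are hypothetical.

```python
import time


def is_idle(current, idle, replicas, idle_threshold_pct=15.0):
    """True when every metric is within the threshold of its idle value."""
    limit = 1 + idle_threshold_pct / 100
    return replicas < 2 and all(current[k] <= idle[k] * limit for k in idle)


def wait_for_cooldown(sample, idle, max_wait_seconds=180,
                      sleep_step_seconds=5, idle_threshold_pct=15.0,
                      sleep=time.sleep):
    """Poll sample() until idle or timeout; return (reached_idle, waited)."""
    waited = 0
    while True:
        current, replicas = sample()
        if is_idle(current, idle, replicas, idle_threshold_pct):
            return True, waited
        if waited >= max_wait_seconds:
            return False, waited  # timed out; caller decides what to do
        sleep(sleep_step_seconds)
        waited += sleep_step_seconds
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays; the defaults mirror the `cooldown.*` defaults listed in the configuration reference.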
250+
251+
Overload:
252+
- Node overloaded if average success rate < `overload.success_rate_node_min`, or CPU/RAM exceed their `overload.*` limits,
253+
or any function is overloaded.
254+
- Function overloaded if success rate < `overload.success_rate_function_min` or replicas >= `overload.replicas_overload_threshold`.
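
The overload rules can be sketched as two predicates. The default thresholds are taken from the `overload.*` defaults in the configuration reference; the function names and argument shapes are hypothetical, and the comparison operators follow the legacy rules (strict `<` and `>`, with `>=` for replicas).

```python
def function_overloaded(success_rate, replicas,
                        success_rate_function_min=0.90,
                        replicas_overload_threshold=15):
    """A function is overloaded on low success rate or too many replicas."""
    return (success_rate < success_rate_function_min
            or replicas >= replicas_overload_threshold)


def node_overloaded(avg_success_rate, cpu_pct_of_capacity, ram_pct,
                    any_function_overloaded,
                    success_rate_node_min=0.95,
                    cpu_overload_pct_of_capacity=80.0,
                    ram_overload_pct=90.0):
    """The node is overloaded if any single condition holds."""
    return (avg_success_rate < success_rate_node_min
            or cpu_pct_of_capacity > cpu_overload_pct_of_capacity
            or ram_pct > ram_overload_pct
            or any_function_overloaded)
```

For example, a node at 70% CPU and 50% RAM with a 0.99 average success rate is still flagged if one of its functions has reached 15 replicas.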
255+
256+
## Metrics collected
257+
Prometheus queries are defined in `queries.yml` and include:
258+
- Node CPU usage (from node-exporter).
259+
- Node RAM usage (from node-exporter).
260+
- Function CPU and RAM usage (from cAdvisor).
261+
- Power metrics if Scaphandre is enabled.
262+
263+
If a query fails, the metric is recorded as `nan`.
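
One way to get the `nan` fallback is to parse the Prometheus instant-query response defensively. The sketch below assumes the standard `/api/v1/query` JSON shape for a vector result and takes the first sample; `scalar_from_response` is a hypothetical helper, and how the plugin actually selects samples is not specified here.

```python
import math


def scalar_from_response(payload):
    """Extract the first sample value from an instant query, or nan."""
    try:
        if payload["status"] != "success":
            return math.nan
        # Vector results look like [{"metric": {...}, "value": [ts, "1.5"]}].
        first = payload["data"]["result"][0]
        return float(first["value"][1])
    except (KeyError, IndexError, TypeError, ValueError):
        # Missing fields, empty result sets, or unparsable values.
        return math.nan


ok = {"status": "success",
      "data": {"resultType": "vector",
               "result": [{"metric": {}, "value": [1700000000.0, "0.42"]}]}}
empty = {"status": "success", "data": {"resultType": "vector", "result": []}}
```

An HTTP-level failure (timeout, connection refused) would need the same fallback around the request itself.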
264+
265+
## Outputs and artifact layout
266+
The generator emits results into the DFaaS output directory, resolved as:
267+
- `output_dir` if set in the config.
268+
- otherwise `<benchmark_results>/<workload_name>`, derived from the runner.
269+
270+
Artifacts:
271+
- `results.csv`: one row per config iteration.
272+
- `skipped.csv`: configs that were skipped (dominance or already executed).
273+
- `index.csv`: unique configurations for resume support.
274+
- `summaries/summary-<config>-iter<iter>-rep<rep>.json`: k6 summary output.
275+
- `metrics/metrics-<config>-iter<iter>-rep<rep>.csv`: metrics snapshot.
276+
- `k6_scripts/config-<config>.js`: generated k6 scripts.
277+
278+
Column conventions in `results.csv`:
279+
- Per-function columns: `function_<name>`, `rate_function_<name>`,
280+
`success_rate_function_<name>`, `cpu_usage_function_<name>`,
281+
`ram_usage_function_<name>`, `power_usage_function_<name>`,
282+
`replica_<name>`, `overloaded_function_<name>`, `medium_latency_function_<name>`.
283+
- Node columns: `cpu_usage_idle_node`, `cpu_usage_node`, `ram_usage_idle_node`,
284+
`ram_usage_node`, `ram_usage_idle_node_percentage`, `ram_usage_node_percentage`,
285+
`power_usage_idle_node`, `power_usage_node`, `rest_seconds`, `overloaded_node`.
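
The per-function naming convention can be expanded mechanically. The templates below are copied from the list above; their order in the actual CSV header is an assumption, and `function_columns` is a hypothetical helper.

```python
FUNCTION_COLUMN_TEMPLATES = [
    "function_{name}",
    "rate_function_{name}",
    "success_rate_function_{name}",
    "cpu_usage_function_{name}",
    "ram_usage_function_{name}",
    "power_usage_function_{name}",
    "replica_{name}",
    "overloaded_function_{name}",
    "medium_latency_function_{name}",
]


def function_columns(name):
    """Return the results.csv column names for one function."""
    return [template.format(name=name) for template in FUNCTION_COLUMN_TEMPLATES]


columns = function_columns("env")
```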
156 286

157 287
## Troubleshooting
158-
- OpenFaaS gateway unreachable: confirm NodePort 31112 and `faas-cli login`.
159-
- Prometheus query timeouts: verify `prometheus_url` and NodePort 30411.
160-
- k6 SSH failures: confirm `k6_host`, `k6_user`, and `k6_ssh_key` are correct.
161-
- Cooldown never completes: check for stuck replicas or sustained CPU/RAM load.
288+
- OpenFaaS gateway unreachable: verify NodePort 31112 and `faas-cli login`.
289+
- Prometheus timeouts: verify NodePort 30411 and that node-exporter/cAdvisor pods
290+
are running.
291+
- k6 SSH errors: verify `k6_host`, user, key, and network access from controller.
292+
- Cooldown never finishes: check replicas or sustained CPU/RAM load.
293+
- Missing metrics: confirm exporters are running and that Prometheus targets are up.
294+
295+
## Testing
296+
- Unit tests: `tests/unit/lb_plugins/test_dfaas_*`.
297+
- Docker integration: `tests/integration/lb_plugins/test_dfaas_docker_integration.py`.
298+
- Multipass e2e: `tests/e2e/test_dfaas_multipass_e2e.py` (creates two VMs).
299+
300+
## Extending the plugin
301+
- Add functions by updating `functions` and (optionally) `openfaas_functions`.
302+
- Cap per-function rate with `functions[].max_rate`.
303+
- Add new rate generation strategies by extending
304+
`generate_configurations` and adding new config fields.
