Commit 9939a59

Addressing comments for the mapper tool.
Signed-off-by: L Lakshmanan <s2760012@ed.ac.uk>
1 parent 3a444f3 commit 9939a59

10 files changed: 131 additions and 145 deletions

.github/workflows/tools-tests.yaml

Lines changed: 44 additions & 0 deletions
@@ -35,3 +35,47 @@ jobs:
       - name: Run tests
         working-directory: ${{ matrix.module }}
         run: go test -cover -race
+
+  mapper_e2e:
+    name: Mapper E2E Test
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+      matrix:
+        module: [ tools/mapper, ]
+
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v3
+        with:
+          lfs: 'true'
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.9'
+
+      - uses: actions/cache@v4
+        with:
+          path: ${{ env.pythonLocation }}
+          key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('requirements.txt') }}
+
+      - name: Install requirements
+        run: pip install -r ./requirements.txt
+
+      - name: Profile load check
+        run: |
+          tar -xzvf tools/mapper/profile.tar.gz -C tools/mapper/
+          python3 -c "import json; json.load(open('tools/mapper/profile.json'))"
+
+      - name: Test traces load check
+        run: |
+          tar -xzvf tools/mapper/test_files/trace.tar.gz -C tools/mapper/test_files/
+          python3 tools/mapper/trace_load_test.py -t tools/mapper/test_files/extremes/
+
+      - name: Extreme mapping tests
+        run: |
+          python3 mapper.py -t tools/mapper/test_files/extremes -p tools/mapper/profile.json
+
+      - name: Run mapper tool on sample trace
+        run: |
+          python3 mapper.py -t tools/mapper/test_files/20 -p tools/mapper/profile.json

docs/.gitattributes

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/WD.png

29.5 KB

docs/WD_dropped_data.png

32.9 KB

docs/mapper.md

Lines changed: 15 additions & 12 deletions
@@ -7,7 +7,7 @@ The `profile.json` JSON output file is generated by the [`profiler` tool](https:
 ### Usage

 ```bash
-usage: mapper.py [-h] -t TRACE_DIRECTORYPATH -p PROFILE_FILEPATH [-o OUTPUT_FILEPATH] [-u UNIQUE_ASSIGNMENT]
+usage: mapper.py [-h] -t TRACE_DIRECTORYPATH -p PROFILE_FILEPATH

 Arguments:
 -h, --help show this help message and exit
@@ -60,7 +60,7 @@ The [`sampler`](https://github.com/vhive-serverless/invitro/tree/main/sampler) t

 For every function in the trace, the closest function in the [`vSwarm`](https://github.com/vhive-serverless/vSwarm/tree/main/) benchmark suite is set as its proxy (50-percentile memory and 50-percentile duration are considered to find the highest correlation). The 50th percentile is used to ensure that the mapping is not only corresponding to the peak values of the workload, but is also leading to a representative proxy function. Currently the tool utilizes only _Serving Functions_ that are _NOT Pipelined_ as proxy functions.

-Currently, vSwarm does not have full coverage of the Azure trace functions. To make sure that we do not have a high error in the mapping of functions, we set a hard threshold of 40% as the maximum absolute error (from the actual trace function duration) that a proxy function can have to be mapped to a trace function. If we do not find an eligible function for the mapping from vSwarm, we fall back to using standard InVitro sample functions in their place for those functions alone.
+Currently, vSwarm does not have full coverage of the Azure trace functions. To make sure that we do not have a high error in the mapping of functions, we set a hard threshold of 40% as the maximum absolute error (from the actual trace function duration) that a proxy function can have to be mapped to a trace function. If we do not find an eligible function for the mapping from vSwarm, we fall back to using standard InVitro trace functions in their place for those functions alone.

 This mapping requires the profiles of the benchmark functions for it to be used as a proxy. The tool utilizes the `profile.json` JSON output file generated by the [`profiler` tool](https://github.com/vhive-serverless/vSwarm/tree/load-generator/tools/profiler#profiler) to obtain the profile of the benchmark suite functions. The User can configure the path of the JSON file through the `-p` (or `--profile-filepath`) flag (by default, it is `profile.json`, which needs to be unzipped).

@@ -121,15 +121,18 @@ The dropped functions plot is as shown below:

 Beyond this, we also display the following error metrics whenever the mapper is run with a trace:

-- Average memory error
-- Average duration error
-- Average absolute memory error
-- Average absolute duration error
-- Average relative memory error
-- Average relative duration error
-- Average absolute relative memory error
-- Average absolute relative duration error
-- Functions in the trace with 0 duration
-- Number of mapped functions with higher than 40% duration error (replaced by InVitro sample functions)
+
+| Metric | Value |
+| --- | --- |
+| Average memory error | -7.638341413565764 MB per invocation |
+| Average duration error | 4174.5554028958695 ms per invocation |
+| Average absolute memory error | 24.24782794856284 MB per invocation |
+| Average absolute duration error | 4414.451828203135 ms per invocation |
+| Average relative memory error | -0.8412999109296387 |
+| Average relative duration error | 0.004934168605729668 |
+| Average absolute relative memory error | 1.0028566557523266 |
+| Average absolute relative duration error | 0.20141343497568448 |
+| Functions with 0 duration | 1596 |
+| Number of mapped functions with higher than 40% duration error (replaced by InVitro trace functions) | 5258 |

 ---
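
The four kinds of averages listed in the metrics table can be illustrated with a small, self-contained sketch. This is not the mapper's code: the function name `summarize_errors`, the returned keys, and the choice to average over functions (skipping zero-valued trace entries for the relative metrics) are assumptions made for illustration. The only detail taken from the commit is the sign convention `trace - proxy` and the relative form `(trace - proxy) / trace`.

```python
def summarize_errors(trace_values, proxy_values):
    """Hypothetical helper: mean signed, absolute, relative and
    absolute-relative error between trace and proxy values."""
    if len(trace_values) != len(proxy_values):
        raise ValueError("trace and proxy lists must have the same length")
    if not trace_values:
        raise ValueError("at least one value pair is required")

    pairs = list(zip(trace_values, proxy_values))
    # Signed difference, same convention as the diff: trace minus proxy.
    diffs = [t - p for t, p in pairs]
    # Relative error is undefined when the trace value is 0, so those
    # pairs are skipped here and reported separately (assumption).
    rel = [(t - p) / t for t, p in pairs if t != 0]

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return {
        "error": mean(diffs),
        "abs_error": mean([abs(d) for d in diffs]),
        "rel_error": mean(rel),
        "abs_rel_error": mean([abs(r) for r in rel]),
        "zero_valued": len(pairs) - len(rel),
    }
```

A signed average close to zero alongside a large absolute average, as in the duration rows of the table, indicates that over- and under-estimates largely cancel out rather than that individual mappings are accurate.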

docs/mapper_cdf.png

24.9 KB
Lines changed: 28 additions & 73 deletions
@@ -1,9 +1,5 @@
-import numpy as np
-import scipy.optimize as sp
 import math
-
 from log_config import *
-from typing import Tuple

 def get_error(trace_function, proxy_function) -> float:
     """
@@ -44,7 +40,7 @@ def get_error(trace_function, proxy_function) -> float:

 def get_closest_proxy_function(
     trace_functions: dict, proxy_functions: dict
-) -> Tuple[dict, int]:
+) -> dict:
     """
     Obtains the closest proxy function for every trace function

@@ -57,73 +53,32 @@ def get_closest_proxy_function(
     - `int`: 0 if no error. -1 if error
     """

-    try:
-        proxy_list = []
-        for function_name in proxy_functions:
-            proxy_list.append(proxy_functions[function_name])
-            proxy_functions[function_name]["index"] = len(proxy_list) - 1
-
-        for function_name in trace_functions:
-            min_error = math.inf
-            min_error_index = -1
-            for i in range(0, len(proxy_list)):
-                error = get_error(trace_functions[function_name], proxy_list[i])
-                if error < min_error:
-                    min_error = error
-                    min_error_index = i
-
-            if min_error == math.inf:
-                log.warning(f"Proxy function for function {function_name} not found")
-                continue
-
-            trace_functions[function_name]["proxy-function"] = proxy_list[
-                min_error_index
-            ]["name"]
-            trace_functions[function_name]["proxy-correlation"] = get_error(
-                trace_functions[function_name], proxy_list[min_error_index]
-            )
-            log.debug(
-                f"Found proxy function for {function_name}: {trace_functions[function_name]['proxy-function']} with correlation: {trace_functions[function_name]['proxy-correlation']}"
-            )
-
-        for function_name in proxy_functions:
-            del proxy_functions[function_name]["index"]
-
-        return trace_functions, 0
-
-    except Exception as e:
-        log.error(f"Finding closest proxy function failed. Error: {e}")
-        return trace_functions, -1
-
-
-def get_proxy_function(
-    trace_functions: dict, proxy_functions: dict
-) -> Tuple[dict, int]:
-    """
-    Obtains the closest proxy function for every trace function
-
-    Parameters:
-    - `trace_functions` (dict): Dictionary containing information regarding trace functions
-    - `proxy_functions` (dict): Dictionary containing information regarding proxy functions
+    proxy_list = []
+    for function_name in proxy_functions:
+        proxy_list.append(proxy_functions[function_name])
+        proxy_functions[function_name]["index"] = len(proxy_list) - 1
+
+    for function_name in trace_functions:
+        min_error = math.inf
+        min_error_index = -1
+        for i in range(0, len(proxy_list)):
+            error = get_error(trace_functions[function_name], proxy_list[i])
+            if error < min_error:
+                min_error = error
+                min_error_index = i
+
+        if min_error == math.inf:
+            log.warning(f"Proxy function for function {function_name} not found. Using InVitro trace function.")
+            trace_functions[function_name]["proxy-function"] = "trace-func-go"
+            continue
+
+        trace_functions[function_name]["proxy-function"] = proxy_list[
+            min_error_index
+        ]["name"]
+
+    for function_name in proxy_functions:
+        del proxy_functions[function_name]["index"]

-    Returns:
-    - `dict`: Dictionary containing information regarding trace functions with the associated proxy functions
-    - `int`: 0 if no error. -1 if error
-    """
-
-    log.info(
-        f"Lower the correlation value, the proxy function is a better proxy of the trace function"
-    )
-
-    log.info(
-        f"Getting closest proxy function for every trace function."
-    )
-    trace_functions, err = get_closest_proxy_function(
-        trace_functions=trace_functions, proxy_functions=proxy_functions
-    )
-
-    if err == -1:
-        log.critical(f"Mapping between trace function and proxy function not obtained")
-        return trace_functions, -1
+    log.info("Proxy functions found for all trace functions.")

-    return trace_functions, 0
+    return trace_functions
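
The selection logic in this file (take the proxy with the smallest error, and fall back to `trace-func-go` when every candidate yields an infinite error) can be sketched in isolation. The body of `get_error` is not part of this diff, so `toy_error` below is a hypothetical stand-in: it assumes that a candidate is rejected with `math.inf` when its 50-percentile duration differs from the trace by more than the 40% threshold described in `docs/mapper.md`, and otherwise sums the relative duration and memory differences. The names `map_to_proxies`, `MAX_DURATION_ERROR` and `FALLBACK_PROXY` are likewise illustrative.

```python
import math

MAX_DURATION_ERROR = 0.40         # threshold described in docs/mapper.md
FALLBACK_PROXY = "trace-func-go"  # fallback name used in the commit


def toy_error(trace_fn, proxy_fn):
    """Hypothetical stand-in for get_error (its body is not shown in the diff)."""
    t_dur = trace_fn["duration"]["50-percentile"]
    p_dur = proxy_fn["duration"]["50-percentile"]
    if t_dur == 0:
        return math.inf  # relative error undefined; treat as ineligible
    dur_err = abs(t_dur - p_dur) / t_dur
    if dur_err > MAX_DURATION_ERROR:
        return math.inf  # candidate exceeds the duration threshold
    t_mem = trace_fn["memory"]["50-percentile"]
    p_mem = proxy_fn["memory"]["50-percentile"]
    mem_err = abs(t_mem - p_mem) / t_mem if t_mem else 0.0
    return dur_err + mem_err


def map_to_proxies(trace_functions, proxy_functions):
    """Return {trace name: proxy name}, using the fallback when nothing is eligible."""
    mapping = {}
    for trace_name, trace_fn in trace_functions.items():
        best_name, best_error = FALLBACK_PROXY, math.inf
        for proxy_name, proxy_fn in proxy_functions.items():
            error = toy_error(trace_fn, proxy_fn)
            if error < best_error:
                best_name, best_error = proxy_name, error
        mapping[trace_name] = best_name
    return mapping
```

Unlike the committed function, this sketch returns a separate mapping and leaves both input dictionaries untouched, which avoids the temporary `"index"` key that the original adds and then deletes.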

tools/mapper/mapper.py

Lines changed: 17 additions & 59 deletions
@@ -4,12 +4,12 @@
 import re
 import argparse
 import pandas as pd
-import matplotlib.pyplot as plt
 from find_proxy_function import *

 from log_config import *

 INVOCATION_COLUMN = 4
+VSWARM_MAX_DUR = 27000

 def load_trace(trace_directorypath):
     duration_info = {}
@@ -27,6 +27,9 @@ def load_trace(trace_directorypath):
                 f"Durations file {duration_filepath} cannot be read. Error: {e}"
             )
             return None, -1
+    else:
+        log.critical(f"Durations file {duration_filepath} not found")
+        return None, -1

     # Read the memory file
     memory_filepath = trace_directorypath + "/memory.csv"
@@ -39,6 +42,9 @@ def load_trace(trace_directorypath):
                 f"Memory file {memory_filepath} cannot be read. Error: {e}"
             )
             return None, -1
+    else:
+        log.critical(f"Memory file {memory_filepath} not found")
+        return None, -1

     # Rename all columns in the dataframe with a lambda (for example: percentile_Average_1 -> 1-percentile) if x matches a regex
     duration_info = duration_info.rename(columns=lambda x: re.sub(r'percentile_(\w+)_(\d+)', r'\2-' + "percentile", x))
@@ -59,52 +65,9 @@ def load_trace(trace_directorypath):

     return trace_functions, 0

-def generate_plot(trace_directorypath, profile_filepath, output_filepath, invoke=True):
-    trace_functions, err = load_trace(trace_directorypath)
-    if err == -1:
-        log.critical(f"Load Generation failed")
-        return
-    elif err == 0:
-        log.info(f"Trace loaded")
-
-    ## Check whether the profile file for proxy functions exists or not
-    if os.path.exists(profile_filepath):
-        log.info(
-            f"Profile file for proxy functions {profile_filepath} exists. Accessing information"
-        )
-        try:
-            with open(profile_filepath, "r") as jf:
-                proxy_functions = json.load(jf)
-        except Exception as e:
-            log.critical(
-                f"Profile file for proxy functions {profile_filepath} cannot be read. Error: {e}"
-            )
-            log.critical(f"Load Generation failed")
-            return
-    else:
-        log.critical(f"Profile file for proxy functions {profile_filepath} not found")
-        log.critical(f"Load Generation failed")
-        return
-
-    if os.path.exists(output_filepath):
-        log.info(
-            f"Mapper output file for trace functions {output_filepath} exists. Accessing information"
-        )
-        try:
-            with open(output_filepath, "r") as jf:
-                mapped_traces = json.load(jf)
-        except Exception as e:
-            log.critical(
-                f"Mapper output file for trace functions {output_filepath} cannot be read. Error: {e}"
-            )
-            log.critical(f"Load Generation failed")
-            return
-    else:
-        log.critical(f"Mapper output file for trace functions {output_filepath} not found")
-        log.critical(f"Load Generation failed")
-        return
+def generate_plot(trace_directorypath, trace_functions, proxy_functions, mapped_trace, invocation_statistics=True):

-    if invoke:
+    if invocation_statistics:
         invocations = pd.read_csv(trace_directorypath+"/invocations.csv")
         inv_df = {}
         for i in range(len(invocations)):
@@ -115,12 +78,12 @@ def generate_plot(trace_directorypath, profile_filepath, output_filepath, invoke
         dropped_functions, dropped_invocations, total_invocations = 0, 0, 0
         for trace in trace_functions:
             duration = trace_functions[trace]["duration"]["50-percentile"]
-            if duration > 27000:
+            if duration > VSWARM_MAX_DUR:
                 dropped_functions += 1
                 dropped_invocations += inv_df[trace]
                 continue
             total_invocations += inv_df[trace]
-            proxy_name = mapped_traces[trace]["proxy-function"]
+            proxy_name = mapped_trace[trace]["proxy-function"]
             profile_duration = proxy_functions[proxy_name]["duration"]["50-percentile"]
             trace_durations.append(duration)
             mapped_durations.append(profile_duration)
@@ -130,9 +93,9 @@ def generate_plot(trace_directorypath, profile_filepath, output_filepath, invoke
         mapped_durations = []
         for trace in trace_functions:
             duration = trace_functions[trace]["duration"]["50-percentile"]
-            if duration > 27000:
+            if duration > VSWARM_MAX_DUR:
                 continue
-            proxy_name = mapped_traces[trace]["proxy-function"]
+            proxy_name = mapped_trace[trace]["proxy-function"]
             profile_duration = proxy_functions[proxy_name]["duration"]["50-percentile"]
             trace_durations.append(duration)
             mapped_durations.append(profile_duration)
@@ -161,7 +124,7 @@ def main():
     output_filepath = trace_directorypath + "/mapper_output.json"
     trace_functions, err = load_trace(trace_directorypath)
     if err == -1:
-        log.critical(f"Load Generation failed")
+        log.critical(f"Trace loading failed")
         return
     elif err == 0:
         log.info(f"Trace loaded")
@@ -186,15 +149,12 @@ def main():
         return

     # Getting a proxy function for every trace function
-    trace_functions, err = get_proxy_function(
+    trace_functions = get_closest_proxy_function(
         trace_functions=trace_functions,
         proxy_functions=proxy_functions,
     )
-    if err == -1:
-        log.critical(f"Load Generation failed")
-        return
-    elif err == 0:
-        log.info(f"Proxy functions obtained")
+
+    log.info(f"Proxy functions obtained")

     # Writing the proxy functions to a file

@@ -240,12 +200,10 @@ def main():
         trace_dur = trace_functions[function]["duration"]["50-percentile"]
         proxy_dur = proxy_functions[mapper_output[function]["proxy-function"]]["duration"]["50-percentile"]
         proxy_mem = proxy_functions[mapper_output[function]["proxy-function"]]["memory"]["50-percentile"]
-        log.warning(f"Memory error for function {function} is {abs(trace_mem - proxy_mem)} MB per invocation")
         mem_error += trace_mem - proxy_mem
         rel_mem_error += (trace_mem - proxy_mem)/trace_mem
         abs_mem_error += abs(trace_mem - proxy_mem)
         abs_rel_mem_error += abs((trace_mem - proxy_mem)/trace_mem)
-        log.warning(f"Duration error for function {function} is {abs(trace_dur - proxy_dur)}ms per invocation")
         dur_error += trace_dur - proxy_dur
         abs_dur_error += abs(trace_dur - proxy_dur)
         if trace_dur == 0:
5.38 KB
Binary file not shown.
