Commit a8f3342

Added the mapper tool to map trace to vSwarm proxies.
Added documentation for the mapper tool.

Signed-off-by: KarthikL1729 <karthiklaksh1729@gmail.com>
1 parent b6aad78 commit a8f3342

7 files changed: +544 -1 lines changed

.github/configs/wordlist.txt

Lines changed: 16 additions & 1 deletion
@@ -771,4 +771,19 @@ autoscaler
 FailAt
 FailComponent
 FailNode
-FailureEnabled
+FailureEnabled
+DIRECTORYPATH
+FILEPATH
+directorypath
+filepath
+HashApp
+HashFunction
+HashOwner
+AverageAllocatedMb
+SampleCount
+Jonker
+Pipelined
+SciPy
+Volgenant
+injective
+py

docs/mapper.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# Mapper
+
+The mapper tool maps the functions in a given trace directory (with memory and duration traces) to proxy functions in the [`vSwarm`](https://github.com/vhive-serverless/vSwarm/tree/main/) benchmark suite. The benchmarks in the vSwarm benchmark suite have been profiled, and their memory utilization and duration traces are stored in the `profile.json` file. Each function in the trace is mapped to the benchmark function that is its closest proxy (based on memory and duration correlation).
+
+The `profile.json` JSON output file is generated by the [`profiler` tool](https://github.com/vhive-serverless/vSwarm/tree/load-generator/tools/profiler#profiler) to obtain the profile of the benchmark suite functions.
+
+### Usage
+
+```bash
+usage: mapper.py [-h] -t TRACE_DIRECTORYPATH -p PROFILE_FILEPATH [-o OUTPUT_FILEPATH] [-u UNIQUE_ASSIGNMENT]
+
+Arguments:
+  -h, --help            show this help message and exit
+  -t TRACE_DIRECTORYPATH, --trace-directorypath TRACE_DIRECTORYPATH
+                        Path to the directory containing the trace files (required)
+  -p PROFILE_FILEPATH, --profile-filepath PROFILE_FILEPATH
+                        Path to the profile file containing the proxy functions
+  -u UNIQUE_ASSIGNMENT, --unique-assignment UNIQUE_ASSIGNMENT
+                        Whether to assign unique proxy functions to each trace function
+```
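As a rough illustration of how the usage text above could be expressed with `argparse`, here is a minimal sketch. It is a hypothetical reconstruction, not the tool's actual source: the `build_parser` name, the `output_filepath` destination, the boolean parsing of `-u`, and its `False` default are all assumptions. The usage string marks `-p` as required, so the sketch follows that.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction from the usage text; not the tool's real code.
    parser = argparse.ArgumentParser(prog="mapper.py")
    parser.add_argument("-t", "--trace-directorypath", required=True,
                        help="Path to the directory containing the trace files")
    parser.add_argument("-p", "--profile-filepath", required=True,
                        help="Path to the profile file containing the proxy functions")
    # Only the short form of the output flag appears in the usage text.
    parser.add_argument("-o", dest="output_filepath", default=None,
                        help="Path of the output file (assumed meaning)")
    # Assumption: the flag takes the literal strings "true"/"false".
    parser.add_argument("-u", "--unique-assignment",
                        type=lambda s: s.lower() == "true", default=False,
                        help="Whether to assign unique proxy functions")
    return parser


args = build_parser().parse_args(["-t", "trace/", "-p", "profile.json", "-u", "true"])
print(args.trace_directorypath, args.profile_filepath, args.unique_assignment)
```

Parsing `-u` as a string rather than a `store_true` switch mirrors the usage line, where the flag takes a `UNIQUE_ASSIGNMENT` value.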
+
+The tool reads the trace information (memory and duration details) from the `trace/` directory (configurable with the `-t` or `--trace-directorypath` flag). The `trace/` directory must contain the `memory.csv` and `durations.csv` files, which hold the respective trace information in the format described in [*Azure Functions Dataset 2019*](https://github.com/Azure/AzurePublicDataset/blob/master/AzureFunctionsDataset2019.md).
+
+#### Function Execution Duration `durations.csv` Schema
+
+| Field | Description |
+|--|--|
+| HashOwner | unique id of the application owner |
+| HashApp | unique id for application name |
+| HashFunction | unique id for the function name within the app |
+| Average | Average execution time (ms) across all invocations of the 24-hour period |
+| Count | Number of executions used in computing the average |
+| Minimum | Minimum execution time |
+| Maximum | Maximum execution time |
+| percentile_Average_0 | Weighted 0th-percentile of the execution time *average* |
+| percentile_Average_1 | Weighted 1st-percentile of the execution time *average* |
+| percentile_Average_25 | Weighted 25th-percentile of the execution time *average* |
+| percentile_Average_50 | Weighted 50th-percentile of the execution time *average* |
+| percentile_Average_75 | Weighted 75th-percentile of the execution time *average* |
+| percentile_Average_99 | Weighted 99th-percentile of the execution time *average* |
+| percentile_Average_100 | Weighted 100th-percentile of the execution time *average* |
+Execution time is in milliseconds.
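To make the schema concrete, the following sketch reads a `durations.csv`-shaped input with the standard `csv` module and extracts the 75th-percentile duration per function. The sample rows and the `load_p75_durations` helper are invented for illustration; only the column names come from the table above.

```python
import csv
import io

# Two made-up rows that follow the durations.csv column layout.
SAMPLE = """HashOwner,HashApp,HashFunction,Average,Count,Minimum,Maximum,percentile_Average_0,percentile_Average_1,percentile_Average_25,percentile_Average_50,percentile_Average_75,percentile_Average_99,percentile_Average_100
o1,a1,f1,120,10,80,400,80,82,100,115,140,390,400
o1,a1,f2,15,50,5,60,5,6,10,14,20,55,60
"""


def load_p75_durations(fileobj) -> dict:
    """Return {HashFunction: 75th-percentile duration in ms}."""
    reader = csv.DictReader(fileobj)
    return {row["HashFunction"]: float(row["percentile_Average_75"]) for row in reader}


p75 = load_p75_durations(io.StringIO(SAMPLE))
print(p75)
```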
+
+#### Function Memory Usage `memory.csv` Schema
+
+| Field | Description |
+|--|--|
+| HashOwner | unique id of the application owner |
+| HashApp | unique id for application name |
+| HashFunction | unique id for the function name within the app |
+| SampleCount | Number of samples used for computing the average |
+| AverageAllocatedMb | Average allocated memory across all SampleCount measurements |
+| AverageAllocatedMb_pct1 | 1st percentile of the average allocated memory |
+| AverageAllocatedMb_pct5 | 5th percentile of the average allocated memory |
+| AverageAllocatedMb_pct25 | 25th percentile of the average allocated memory |
+| AverageAllocatedMb_pct50 | 50th percentile of the average allocated memory |
+| AverageAllocatedMb_pct75 | 75th percentile of the average allocated memory |
+| AverageAllocatedMb_pct95 | 95th percentile of the average allocated memory |
+| AverageAllocatedMb_pct99 | 99th percentile of the average allocated memory |
+| AverageAllocatedMb_pct100 | 100th percentile of the average allocated memory |
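The mapping code in this commit looks up `["memory"]["75-percentile"]` and `["duration"]["75-percentile"]` on each function. A possible way to build that shape from the two CSV schemas is sketched below; the sample row, the hard-coded duration value, and the `build_trace_functions` helper are assumptions made for illustration, as is keying the result by `HashFunction`.

```python
import csv
import io

# One made-up row that follows the memory.csv column layout.
MEMORY_CSV = """HashOwner,HashApp,HashFunction,SampleCount,AverageAllocatedMb,AverageAllocatedMb_pct1,AverageAllocatedMb_pct5,AverageAllocatedMb_pct25,AverageAllocatedMb_pct50,AverageAllocatedMb_pct75,AverageAllocatedMb_pct95,AverageAllocatedMb_pct99,AverageAllocatedMb_pct100
o1,a1,f1,30,150,120,125,140,150,160,180,190,200
"""


def build_trace_functions(memory_p75: dict, duration_p75: dict) -> dict:
    """Combine per-function percentiles into the nested dict the mapper reads."""
    functions = {}
    # Keep only functions present in both inputs.
    for name in memory_p75.keys() & duration_p75.keys():
        functions[name] = {
            "memory": {"75-percentile": memory_p75[name]},
            "duration": {"75-percentile": duration_p75[name]},
        }
    return functions


memory_p75 = {
    row["HashFunction"]: float(row["AverageAllocatedMb_pct75"])
    for row in csv.DictReader(io.StringIO(MEMORY_CSV))
}
duration_p75 = {"f1": 140.0}  # hypothetical value taken from a durations.csv row
trace_functions = build_trace_functions(memory_p75, duration_p75)
print(trace_functions)
```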
+
+The [`sampler`](https://github.com/vhive-serverless/invitro/tree/main/sampler) tool in InVitro can be used to generate the sampled traces from the original Azure traces.
+
+For every function in the trace, the closest function in the [`vSwarm`](https://github.com/vhive-serverless/vSwarm/tree/main/) benchmark suite is set as its proxy; closeness is measured on the 75th-percentile memory and the 75th-percentile duration. The 75th percentile is used so that the mapping does not correspond only to the peak values of the workload but also leads to a representative proxy function. If the `-u` (or `--unique-assignment`) flag is set to true, the tool tries to find a one-to-one (injective) mapping between trace functions and proxy functions by modelling it as a *linear sum assignment* problem, which is solved by the SciPy implementation of the *Jonker-Volgenant algorithm*. If the number of trace functions is greater than the number of proxy functions, or if no such mapping is found, the injective constraint is dropped and each trace function is mapped to its closest proxy function. Currently the tool uses only _Serving Functions_ that are _NOT Pipelined_ as proxy functions.
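The difference between the closest-proxy mapping and the one-to-one mapping can be seen on a toy example. The sketch below uses invented `(memory, duration)` pairs and, instead of SciPy's Jonker-Volgenant solver, a brute-force search over permutations, which is only practical for a handful of functions. The distance is the log-space Euclidean distance used by `get_error` in this commit.

```python
import itertools
import math


def log_distance(a, b):
    """Euclidean distance between (memory, duration) pairs in log space."""
    return math.hypot(math.log(a[0]) - math.log(b[0]), math.log(a[1]) - math.log(b[1]))


# Hypothetical 75th-percentile (memory MB, duration ms) pairs.
traces = {"t1": (100.0, 50.0), "t2": (110.0, 55.0)}
proxies = {"p1": (105.0, 52.0), "p2": (400.0, 300.0), "p3": (120.0, 60.0)}

# Closest mapping: every trace function independently picks its nearest proxy.
closest = {
    t: min(proxies, key=lambda p: log_distance(tv, proxies[p]))
    for t, tv in traces.items()
}

# Injective mapping: pick the assignment of distinct proxies with minimum total error.
best = min(
    itertools.permutations(proxies, len(traces)),
    key=lambda perm: sum(log_distance(traces[t], proxies[p]) for t, p in zip(traces, perm)),
)
unique = dict(zip(traces, best))

print(closest)  # both trace functions share p1
print(unique)   # t2 is moved to its second-best proxy, p3
```

Here the two trace functions are nearly identical, so the closest mapping reuses `p1`, while the injective mapping trades a slightly larger error for distinct proxies.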
+
+This mapping requires the profiles of the benchmark functions so that they can be used as proxies. The tool uses the `profile.json` JSON output file generated by the [`profiler` tool](https://github.com/vhive-serverless/vSwarm/tree/load-generator/tools/profiler#profiler) to obtain the profile of the benchmark suite functions. The user can configure the path of the JSON file through the `-p` (or `--profile-filepath`) flag (by default, it is `profile.json`, which needs to be unzipped).
+
+An example of a generated output file is as follows:
+
+```json
+{
+    "c13acdc7567b225971cef2416a3a2b03c8a4d8d154df48afe75834e2f5c59ddf": {
+        "proxy-function": "video-processing-python-10"
+    },
+    "a2faad786b3c813b12ce57d349d5e62f6d0f22ceecfa86cd72a962853383b600": {
+        "proxy-function": "image-rotate-go-11"
+    },
+    "7dc5aeabc131669912e8c793c8925cc9928321f45f13a4af031592b4611630d7": {
+        "proxy-function": "video-processing-python-70"
+    },
+    "ae8a1640fa932024f59b38a0b001808b5c64612bd60c6f3eb80ba9461ba2d091": {
+        "proxy-function": "video-processing-python-20"
+    }
+}
+```
+
+The mapper output file will be stored in the trace directory with the name `mapper_output.json` by default. The output file contains the mapping of the trace functions to the proxy functions in the vSwarm benchmark suite.
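A consumer of the output file might want to check how often each proxy is reused. The sketch below parses a JSON string of the same shape as the example and counts proxy occurrences; the shortened keys and the sample content are invented, and a real run would read `mapper_output.json` from the trace directory instead.

```python
import json
from collections import Counter

# Made-up content with the same structure as the mapper output example.
SAMPLE_OUTPUT = """
{
    "hash-a": {"proxy-function": "video-processing-python-10"},
    "hash-b": {"proxy-function": "image-rotate-go-11"},
    "hash-c": {"proxy-function": "video-processing-python-10"}
}
"""

mapping = json.loads(SAMPLE_OUTPUT)
# Count how many trace functions share each proxy function.
usage = Counter(entry["proxy-function"] for entry in mapping.values())
print(usage.most_common())
```

A count above one for any proxy indicates that the mapping is not one-to-one.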
+
+---

tools/mapper/.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+profile.tar.gz filter=lfs diff=lfs merge=lfs -text
Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
+import numpy as np
+import scipy.optimize as sp
+import math
+
+from collections import OrderedDict
+
+from log_config import *
+from typing import Tuple
+
+def get_error(trace_function, proxy_function) -> float:
+    """
+    Returns a float value indicating how close the trace function is to the proxy function. The lower the value, the better the correlation.
+    The Euclidean distance between log-scaled memory and duration is used.
+
+    Parameters:
+    - `trace_function` (dict): Dictionary containing information regarding trace function
+    - `proxy_function` (dict): Dictionary containing information regarding proxy function
+
+    Returns:
+    - `float`: closeness value
+    """
+
+    try:
+        trace_memory = trace_function["memory"]["75-percentile"]
+        proxy_memory = proxy_function["memory"]["75-percentile"]
+        trace_duration = trace_function["duration"]["75-percentile"]
+        proxy_duration = proxy_function["duration"]["75-percentile"]
+    except KeyError as e:
+        log.warning(f"Correlation cannot be found. Error: {e}")
+        return math.inf
+
+    # NOTE: Better error mechanisms can be considered to improve the correlation.
+    # Currently only the 75th-percentile memory and duration are considered.
+    # The Euclidean distance between log-scaled memory and duration is used.
+    try:
+        if trace_memory == 0: trace_memory += 0.01
+        if trace_duration == 0: trace_duration += 0.01
+        diff_memory = (math.log(trace_memory) - math.log(proxy_memory))
+        diff_duration = (math.log(trace_duration) - math.log(proxy_duration))
+        error = math.sqrt((diff_memory) ** 2 + (diff_duration) ** 2)
+        return error
+    except ValueError as e:
+        log.warning(f"Correlation cannot be found. Error: {e}")
+        return math.inf
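One consequence of taking logarithms before computing the distance is that the error depends on the ratio between trace and proxy values, not on their absolute size. The standalone sketch below reproduces only the arithmetic of `get_error` (no logging, no zero handling) under the hypothetical name `log_error`, with invented numbers.

```python
import math


def log_error(trace_mem, trace_dur, proxy_mem, proxy_dur):
    """Same arithmetic as get_error, without logging or zero handling."""
    diff_mem = math.log(trace_mem) - math.log(proxy_mem)
    diff_dur = math.log(trace_dur) - math.log(proxy_dur)
    return math.sqrt(diff_mem ** 2 + diff_dur ** 2)


# The proxy is 2x the trace in both dimensions, at two very different scales.
small = log_error(10, 20, 20, 40)
large = log_error(1000, 2000, 2000, 4000)
print(small, large)  # both equal sqrt(2) * ln(2), about 0.98

# Identical profiles give an error of exactly zero.
print(log_error(128, 35, 128, 35))
```

A plain Euclidean distance on raw values would instead let the dimension with the larger magnitude dominate the comparison.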
+
+
+def get_proxy_function_using_linear_sum_assignment(
+    trace_functions: dict, proxy_functions: dict
+) -> Tuple[dict, int]:
+    """
+    Obtains the one-to-one mapped proxy function for every trace function
+
+    Parameters:
+    - `trace_functions` (dict): Dictionary containing information regarding trace functions
+    - `proxy_functions` (dict): Dictionary containing information regarding proxy functions
+
+    Returns:
+    - `dict`: Dictionary containing information regarding trace functions with the associated proxy functions
+    - `int`: 0 if no error. -1 if error
+    """
+
+    try:
+
+        trace_functions = OrderedDict(trace_functions)
+        proxy_functions = OrderedDict(proxy_functions)
+
+        trace_list = []
+        for tf in trace_functions:
+            trace_list.append(trace_functions[tf])
+            trace_functions[tf]["index"] = len(trace_list) - 1
+
+        proxy_list = []
+        for pf in proxy_functions:
+            proxy_list.append(proxy_functions[pf])
+            proxy_functions[pf]["index"] = len(proxy_list) - 1
+
+        # Creating error matrix
+        m, n = len(trace_functions.keys()), len(proxy_functions.keys())
+        error_matrix = np.empty((m, n))
+
+        # This uses the Jonker-Volgenant algorithm for linear sum assignment (SciPy package)
+        # to calculate the best possible assignment for the trace functions
+        # Time complexity: O(n^3), where n is the larger of the number of rows and columns
+        for i in range(m):
+            for j in range(n):
+                error_matrix[i, j] = get_error(trace_list[i], proxy_list[j])
+
+        # Solve the linear sum assignment problem
+        row_indices, col_indices = sp.linear_sum_assignment(error_matrix)
+        assignments = list(zip(row_indices, col_indices))
+
+        # Go through the assignment solution
+        for assignment in assignments:
+            row_index = assignment[0]
+            col_index = assignment[1]
+            trace = ""
+            proxy = ""
+            for tf in trace_functions:
+                if row_index == trace_functions[tf]["index"]:
+                    trace = tf
+                    break
+            for pf in proxy_functions:
+                if col_index == proxy_functions[pf]["index"]:
+                    proxy = pf
+                    break
+            trace_functions[trace]["proxy-function"] = proxy
+            trace_functions[trace]["proxy-correlation"] = get_error(
+                trace_functions[trace], proxy_functions[proxy]
+            )
+            log.debug(
+                f"Found proxy function for {trace}: {trace_functions[trace]['proxy-function']} with correlation: {trace_functions[trace]['proxy-correlation']}"
+            )
+
+        # Go through the trace functions to ensure proxy function exists. If not, then report
+        for tf in trace_functions:
+            if "proxy-function" not in trace_functions[tf]:
+                log.warning(f"Mapping for function {tf} not found")
+            elif trace_functions[tf]["proxy-function"] == "":
+                log.warning(f"Mapping for function {tf} not found")
+
+        # Remove the temporary index entries
+        for tf in trace_functions:
+            del trace_functions[tf]["index"]
+        for pf in proxy_functions:
+            del proxy_functions[pf]["index"]
+
+        return trace_functions, 0
+
+    except Exception as e:
+        log.error(f"Mapping through linear sum assignment failed. Error: {e}")
+        return trace_functions, -1
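The function above stores a temporary `"index"` entry in every dictionary and then scans all keys to translate each solver index back to a name. One possible alternative, sketched here with invented names and a hard-coded stand-in for the solver result, is to keep the keys in lists so the translation becomes a direct lookup.

```python
# Hypothetical inputs; only the keys matter for this illustration.
trace_functions = {"t1": {}, "t2": {}}
proxy_functions = {"p1": {}, "p2": {}, "p3": {}}

# Dicts preserve insertion order, so list positions match the matrix rows/columns.
trace_names = list(trace_functions)
proxy_names = list(proxy_functions)

# Stand-in for the (row_indices, col_indices) pair returned by the solver.
row_indices, col_indices = [0, 1], [2, 0]

mapping = {trace_names[r]: proxy_names[c] for r, c in zip(row_indices, col_indices)}
print(mapping)
```

This avoids mutating the input dictionaries and removes the need for the cleanup loop, at the cost of holding two extra lists.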
+
+
+def get_closest_proxy_function(
+    trace_functions: dict, proxy_functions: dict
+) -> Tuple[dict, int]:
+    """
+    Obtains the closest proxy function for every trace function
+
+    Parameters:
+    - `trace_functions` (dict): Dictionary containing information regarding trace functions
+    - `proxy_functions` (dict): Dictionary containing information regarding proxy functions
+
+    Returns:
+    - `dict`: Dictionary containing information regarding trace functions with the associated proxy functions
+    - `int`: 0 if no error. -1 if error
+    """
+
+    try:
+        proxy_list = []
+        for function_name in proxy_functions:
+            proxy_list.append(proxy_functions[function_name])
+            proxy_functions[function_name]["index"] = len(proxy_list) - 1
+
+        for function_name in trace_functions:
+            min_error = math.inf
+            min_error_index = -1
+            for i in range(0, len(proxy_list)):
+                error = get_error(trace_functions[function_name], proxy_list[i])
+                if error < min_error:
+                    min_error = error
+                    min_error_index = i
+
+            if min_error == math.inf:
+                log.warning(f"Proxy function for function {function_name} not found")
+                continue
+
+            trace_functions[function_name]["proxy-function"] = proxy_list[
+                min_error_index
+            ]["name"]
+            trace_functions[function_name]["proxy-correlation"] = get_error(
+                trace_functions[function_name], proxy_list[min_error_index]
+            )
+            log.debug(
+                f"Found proxy function for {function_name}: {trace_functions[function_name]['proxy-function']} with correlation: {trace_functions[function_name]['proxy-correlation']}"
+            )
+
+        for function_name in proxy_functions:
+            del proxy_functions[function_name]["index"]
+
+        return trace_functions, 0
+
+    except Exception as e:
+        log.error(f"Finding closest proxy function failed. Error: {e}")
+        return trace_functions, -1
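The closest-proxy search skips a trace function when every comparison yields an infinite error, for instance when its 75th-percentile data is missing. The simplified sketch below illustrates that behaviour with invented data; the `error` and `closest` helpers are hypothetical stand-ins, and unlike `get_error` they do not nudge zero values by 0.01.

```python
import math


def error(trace, proxy):
    """Log-space distance; infinite when data is missing or not positive."""
    try:
        tm, pm = trace["memory"]["75-percentile"], proxy["memory"]["75-percentile"]
        td, pd = trace["duration"]["75-percentile"], proxy["duration"]["75-percentile"]
        return math.hypot(math.log(tm) - math.log(pm), math.log(td) - math.log(pd))
    except (KeyError, ValueError):
        return math.inf


def closest(trace, proxies):
    """Return the name of the nearest proxy, or None if none is comparable."""
    name = min(proxies, key=lambda p: error(trace, proxies[p]), default=None)
    if name is None or error(trace, proxies[name]) == math.inf:
        return None
    return name


proxies = {
    "p1": {"memory": {"75-percentile": 110}, "duration": {"75-percentile": 55}},
    "p2": {"memory": {"75-percentile": 1000}, "duration": {"75-percentile": 500}},
}
complete = {"memory": {"75-percentile": 100}, "duration": {"75-percentile": 50}}
incomplete = {"memory": {"75-percentile": 100}}  # no duration data

print(closest(complete, proxies))
print(closest(incomplete, proxies))
```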
+
+
+def get_proxy_function(
+    trace_functions: dict, proxy_functions: dict, unique_assignment: bool
+) -> Tuple[dict, int]:
+    """
+    Obtains a proxy function for every trace function, using a one-to-one assignment when requested and possible
+
+    Parameters:
+    - `trace_functions` (dict): Dictionary containing information regarding trace functions
+    - `proxy_functions` (dict): Dictionary containing information regarding proxy functions
+    - `unique_assignment` (bool): If `True`, then trace-proxy function mapping is one-to-one, provided #(proxy functions) >= #(trace functions)
+
+    Returns:
+    - `dict`: Dictionary containing information regarding trace functions with the associated proxy functions
+    - `int`: 0 if no error. -1 if error
+    """
+
+    trace_functions = OrderedDict(trace_functions)
+    proxy_functions = OrderedDict(proxy_functions)
+
+    log.info(
+        "The lower the correlation value, the better the proxy function represents the trace function"
+    )
+
+    if (unique_assignment) and (len(trace_functions) <= len(proxy_functions)):
+        log.info(
+            "Getting One-To-One mapping between trace function and proxy function using Linear-Sum-Assignment"
+        )
+        trace_functions, err = get_proxy_function_using_linear_sum_assignment(
+            trace_functions=trace_functions, proxy_functions=proxy_functions
+        )
+        if err == -1:
+            log.error(
+                "One-To-One mapping between trace function and proxy function not obtained"
+            )
+            log.info(
+                "Getting closest proxy function for every trace function. Note: Mapping may not be unique"
+            )
+            trace_functions, err = get_closest_proxy_function(
+                trace_functions=trace_functions, proxy_functions=proxy_functions
+            )
+
+    elif (unique_assignment) and (len(trace_functions) > len(proxy_functions)):
+        log.warning(
+            "One-To-One mapping between trace function and proxy function not possible since number of trace functions is greater than available proxy functions"
+        )
+        log.info(
+            "Getting closest proxy function for every trace function. Note: Mapping may not be unique"
+        )
+        trace_functions, err = get_closest_proxy_function(
+            trace_functions=trace_functions, proxy_functions=proxy_functions
+        )
+
+    else:
+        log.info(
+            "Getting closest proxy function for every trace function. Note: Mapping may not be unique"
+        )
+        trace_functions, err = get_closest_proxy_function(
+            trace_functions=trace_functions, proxy_functions=proxy_functions
+        )
+
+    if err == -1:
+        log.critical("Mapping between trace function and proxy function not obtained")
+        return trace_functions, -1
+
+    return trace_functions, 0
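The branching in `get_proxy_function` reduces to a single decision about which strategy to try first. The sketch below isolates that decision in a hypothetical `choose_strategy` helper (the name and the returned labels are invented); it does not model the runtime fallback to the closest mapping when the assignment solver fails.

```python
def choose_strategy(unique_assignment: bool, n_trace: int, n_proxy: int) -> str:
    """Pick the first mapping strategy to attempt."""
    # A one-to-one mapping needs at least as many proxies as trace functions.
    if unique_assignment and n_trace <= n_proxy:
        return "linear-sum-assignment"
    return "closest"


print(choose_strategy(True, 4, 10))   # enough proxies for a one-to-one mapping
print(choose_strategy(True, 12, 10))  # too many trace functions
print(choose_strategy(False, 4, 10))  # uniqueness not requested
```

Expressed this way, the two branches that call `get_closest_proxy_function` directly collapse into one.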
