This repository provides additional content for the paper "Bayesian Multi-Level Performance Models for Multi-Factor Variability of Configurable Software Systems"
PDF: will be linked later
Tuning a software system’s configuration is essential to meet performance requirements. However, performance is not only influenced by configuration options, but also by external factors such as the workload. Hence, tuning requires understanding how a specific setting of external factors (e.g., a specific workload) in combination with the system configuration influences performance. However, current performance modeling approaches usually do not incorporate external factors, for good reasons: Training a separate model per setting is costly and is unlikely to generalize, whereas a single model trained on multiple settings fails to capture variations that are specific to a certain setting.
To address this shortcoming, we propose HyPerf, a Bayesian multi-level performance modeling approach that systematically distinguishes between setting-invariant and setting-variant influences, that is, influences that remain consistent across settings versus those that exhibit substantial variation. For this purpose, HyPerf employs a hierarchical structure: The upper level captures general performance trends across multiple settings (e.g., across different workloads), while the lower level refines these estimates with setting-specific deviations (e.g., workload-specific performance variations).
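As a rough illustration of this hierarchical structure (a schematic sketch only, not the exact model from the paper), the predicted performance of a configuration $c = (c_1, \dots, c_n)$ under a setting (e.g., a workload) $w$ can be written as

$$\hat{\pi}_w(c) = (\beta_0 + \delta_{0,w}) + \sum_{i=1}^{n} (\beta_i + \delta_{i,w}) \, c_i ,$$

where the upper level estimates the setting-invariant influences $\beta_i$ shared across all settings, and the lower level estimates the setting-specific deviations $\delta_{i,w}$ of each setting $w$.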
With HyPerf, we aim to balance accuracy and efficiency, achieving robust performance predictions with significantly fewer training samples. Unlike the state of the art, HyPerf is able to identify a minimal set of settings that captures essential performance variations, so that developers can assess whether all setting-variant influences have been accounted for.
Empirical evaluations on ten real-world software systems across up to 35 workloads demonstrate that HyPerf matches or outperforms state-of-the-art approaches while requiring fewer measurements. Notably, HyPerf enables interpretable performance reasoning and can identify minimal workload subsets that capture essential performance variations.
You can view the pMAPE values in the respective sub-folder. We also provide detailed training results for the TuxKConfig dataset in the sub-folder for RQ1.3. In the anonymized repository, please use the file tree on the left to navigate, because links to folders do not work.
Extending Figure 3 in the paper, you can compare all general influences against their workload-specific influences by navigating the respective sub-folder.
In the anonymized repository, please use the file tree on the left to navigate, because links to folders do not work.
Extending Figure 5 in the paper, you can view all representation matrices for all options by navigating the respective "representation-matrices" subfolder for each software system. In the anonymized repository, please use the file tree on the left to navigate, because links to folders do not work.
Extending Figure 6 in the paper, you can view all representative set building protocols and plots by navigating the root subfolders for each software system. In the anonymized repository, please use the file tree on the left to navigate, because links to folders do not work.
Example for Z3: see the corresponding plot in the Z3 subfolder.
For running the experiments in any of the ways explained below, several parameters can be adjusted (an example invocation follows this list):

- --jobs defines how many models are trained in parallel. Increasing it reduces the total run time without altering the results. Each job employs 3 MCMC chains, resulting in 3 required threads per job. E.g., for 6 available threads, choose --jobs 2.
- --store should only be used if insights into posterior distributions are needed, e.g., when replicating the paper's plots through the provided dashboards.
- --reps defines the number of repetitions. While the paper used 30 repetitions, we recommend reducing to 1 to check whether everything works.
- --training-set-size disables the sweep over different training set sizes and, instead, only uses the given size. Passing 0.5 will train on 0.5N training data for all software systems listed in main.py.
- To replicate the RQs in the paper, use the commands for running the Docker container as outlined below.
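For example, a quick sanity check of the setup, using the RQ1 command introduced below with a single repetition and two parallel jobs (the container name is an arbitrary choice), could look like this:

docker run -it -p 8083:8083 --name hyperf-sanity-check hyperf/repl rq1 --reps 1 --jobs 2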
To run the full experiment via Docker, follow these steps:
- Install Docker:
  - Refer to the Docker Documentation for installation instructions.
- Clone the Repository:
  - Navigate to the directory where you want to clone the repository and run:
    git clone https://github.com/anonym458551495/multilvl-models-multi-factor-variab
    cd path-of-repo/multilvl-models-multi-factor-variab/experiment-code
- Build the Docker Image:
  - Open a terminal.
  - Change your directory to path-of-repo/multilvl-models-multi-factor-variab/experiment-code.
  - Run the following command:
    docker build ./ -t hyperf/repl
- Run RQ1 with the Docker Container:
  - After the build is complete, run your Docker container with:
    docker run -it -p 8083:8083 --name hyperf-rq1 hyperf/repl rq1 --reps 5 --jobs 5
  - Adjust the number of jobs to your CPU; five repetitions should suffice to see robust trends, but do choose 30 to replicate the paper's experiment.
  - If you want to detach from the container during or after the experiment, without losing experiment data and the running dashboard, press Ctrl + p, Ctrl + q.
  - Re-attach using your container name:
    docker attach hyperf-rq1
  - When the job is finished, explore the Streamlit dashboard at http://localhost:8083. You can change the port by replacing the port before the colon, i.e., OUTERPORT:8083 (a port-remapping example is shown at the end of this step).
  - To copy the results outside the Docker container and remove the container afterwards, use:
    docker cp hyperf-rq1:/app/wluncert/results /local/path
    docker rm hyperf-rq1
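  - For example, to expose the dashboard on host port 9090 instead of 8083 (the outer port choice here is illustrative), run:
    docker run -it -p 9090:8083 --name hyperf-rq1 hyperf/repl rq1 --reps 5 --jobs 5
    The dashboard is then reachable at http://localhost:9090.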
- Run RQ2 and RQ3 with the Docker Container:
  - After the build is complete, run your Docker container with:
    docker run -it -p 8084:8084 --name hyperf-rq2-and-3 hyperf/repl rq23 --jobs 5
  - Adjust the number of jobs to your hardware; calling the rq23 command automatically runs only 1 repetition.
  - Explore the Streamlit dashboard at http://localhost:8084.
  - To copy the results outside the Docker container and remove the container afterwards, use:
    docker cp hyperf-rq2-and-3:/app/wluncert/results /local/path
    docker rm hyperf-rq2-and-3
- Run custom experiments with the Docker Container:
  - To set your own parameters, use the custom-experiment command or start a bash session in the new container:
    docker run -it -p 8083:8083 -p 8084:8084 --name hyperf-custom-experiment hyperf/repl custom-experiment --jobs 5 --training-set-size 5
    or
    docker run -it -p 8083:8083 -p 8084:8084 --name hyperf-custom-session --entrypoint bash hyperf/repl
To add or change software systems, the easiest way is to bring the data of your new software system into the same format as that of one of the existing software systems. You can find the data in [Training-Data](/experiment-code/wluncert/training-data).
After that, you need to modify main.py in the wluncert directory (a schematic sketch follows this list):
- Add your software system's name to the selected-data list (line 216).
- Add your software system to the get_datasets function (line 511) using the DataAdapter from the software system with the same data format.
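The following is a schematic sketch of these two edits. All identifiers below (my_new_system, ExistingFormatAdapter, the exact shape of selected-data and get_datasets) are placeholders; the actual names and signatures in wluncert/main.py may differ, so treat this as an illustration of the pattern rather than code to paste:

```python
# Schematic illustration only -- placeholder names, not the actual code in wluncert/main.py.

# Step 1: add the new system's name to the list of selected systems (around line 216).
# The name should match the system's folder in training-data/.
selected_data = ["z3", "my_new_system"]

# Placeholder for the DataAdapter class of an existing system whose data uses the same format.
class ExistingFormatAdapter:
    def __init__(self, data_path: str):
        self.data_path = data_path  # path to the measurements in training-data/

# Step 2: register the new system in get_datasets (around line 511), reusing the adapter
# of the software system that shares the new system's data format.
def get_datasets(selected):
    adapters = {
        "z3": ExistingFormatAdapter("training-data/z3"),
        "my_new_system": ExistingFormatAdapter("training-data/my_new_system"),
    }
    return {name: adapters[name] for name in selected}

if __name__ == "__main__":
    print(get_datasets(selected_data))
```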



