|
1 | | -# VEVOS: Ground Truth Extraction |
2 | | -VEVOS is a tool suite for the simulation of the evolution of clone-and-own projects and consists of two main components: The ground truth extraction, called VEVOS/Extraction and the variant simulation called VEVOS/Simulation. |
| 1 | +# VEVOS: Ground Truth Extraction v2.0.0 |
3 | 2 |
|
4 | | -This repository contains VEVOS/Extraction. |
5 | | -Please refer to our paper _Simulating the Evolution of Clone-and-Own Projects with VEVOS_ published at the International Conference on Evaluation and Assessment in Software Engineering (EASE) 2022 ([doi](https://doi.org/10.1145/3530019.3534084)) for more information. |
6 | | -VEVOS/Extraction is a Java project for extracting feature mappings, presence conditions, and feature models for each revision (within a specified range of the commit-history) from an input software product line. |
| 3 | +VEVOS is a tool suite for the simulation of the evolution of clone-and-own projects and consists of two main components: |
| 4 | +The ground truth extraction, called VEVOS/Extraction, and the variant simulation, called VEVOS/Simulation. |
| 5 | + |
| 6 | +This repository contains VEVOS/Extraction. |
| 7 | +Please refer to our paper _Simulating the Evolution of Clone-and-Own Projects with VEVOS_ published at the International |
| 8 | +Conference on Evaluation and Assessment in Software Engineering (EASE) |
| 9 | +2022 ([doi](https://doi.org/10.1145/3530019.3534084)) for more information. |
| 10 | +VEVOS/Extraction is a Java project for extracting feature mappings, presence conditions, and feature models for each |
| 11 | +commit from an input software product line. |
| 12 | + |
| 13 | +## Version 2.x.x Update |
| 14 | + |
| 15 | +### Improvements |
| 16 | + |
| 17 | +Version 2.0.0 of VEVOS Extraction presents a major improvement over version 1.0.0 in terms of extractable product lines, |
| 18 | +commit coverage, and extraction speed. |
| 19 | +VEVOS is now based on [DiffDetective](https://github.com/VariantSync/DiffDetective), a library for analyses of edits to |
| 20 | +preprocessor-based product lines. |
| 21 | +Due to this major change, VEVOS can now extract a ground truth for any C preprocessor-based software product line and is |
| 22 | +no longer bound to the availability of a special adaptor. |
| 23 | +For this reason, VEVOS in its default configuration extracts a ground truth for the 43 product lines listed in |
| 24 | +the [without_linux](docker-resources/without_linux.md) dataset file. |
| 25 | + |
| 26 | +### Shortcomings |
| 27 | + |
| 28 | +However, there is also a drawback of the improved extraction. VEVOS 2.0.0 is not capable of extracting a feature model |
| 29 | +or the presence conditions of entire source code files that are defined by additional build files. If these are still |
| 30 | +required, VEVOS 1.x.x has to be used. |
| 31 | + |
| 32 | +### Extraction Modes |
| 33 | + |
| 34 | +There are two basic extraction modes: `fast` and `full`. |
| 35 | + |
| 36 | +#### Fast Extraction |
| 37 | + |
| 38 | +The fast ground truth extraction only extracts the ground truths of changed files for each commit. This extraction is |
| 39 | +very useful for studies that are only interested in the evolution of a software family. |
| 40 | + |
| 41 | +#### Full Extraction |
| 42 | + |
| 43 | +The full ground truth extraction extracts the ground truth for all code files of all commits in a product line. Due to |
| 44 | +the effort of extracting and saving a ground truth for all files of each commit, this extraction may require a very long |
| 45 | +time and large amounts of free disk space. |
| 46 | + |
| 47 | +Essentially, the full ground truth extraction first performs a fast ground truth extraction and then incrementally |
| 48 | +combines the ground truths of all commits. |
7 | 49 |
|
8 | 50 | ## Quick Start using Docker |
9 | | -In the following, we provide instructions on how to quickly extract the ground truth of Linux or Busybox with the provided Docker setup. |
| 51 | + |
| 52 | +In the following, we provide instructions on how to quickly extract a ground truth for any preprocessor-based product |
| 53 | +line using Docker. |
10 | 54 |
|
11 | 55 | ### Requirements |
| 56 | + |
12 | 57 | The only requirement is Docker. We provide batch and bash scripts that execute the necessary Docker setup and execution. |
13 | | -We tested the Docker setup under Windows and Linux. |
| 58 | +We tested the Docker setup under Windows and Linux. |
14 | 59 | We have not tested the setup on Mac, but you should be able to use the instructions for Linux. |
15 | 60 |
|
16 | 61 | ### Preparation |
| 62 | + |
17 | 63 | #### Docker |
| 64 | + |
18 | 65 | Docker must be installed on your system, and the Docker daemon must be running. |
19 | 66 | For installation, follow the instructions given in the installation guide for your OS which you can find |
20 | 67 | [here](https://docs.docker.com/get-docker/). |
21 | 68 | Under Linux, you should follow the optional |
22 | 69 | [post-installation instructions](https://docs.docker.com/engine/install/linux-postinstall/). |
23 | 70 |
|
24 | 71 | #### Repository |
| 72 | + |
25 | 73 | Clone the repository to a location of your choice |
| 74 | + |
26 | 75 | ``` |
27 | 76 | git clone https://github.com/VariantSync/VEVOS_Extraction.git |
28 | 77 | ``` |
| 78 | + |
29 | 79 | Then, navigate to the repository's root directory in a terminal of your choice. |
30 | 80 |
|
| 81 | +### (Optional) Configure the Extraction |
| 82 | + |
| 83 | +You can customize the extraction by changing one of the three integrated configurations for Docker. |
| 84 | +Each configuration has two files associated with it: a properties file that configures VEVOS itself, and a dataset file |
| 85 | +that specifies the product lines for extraction. |
| 86 | + |
| 87 | +#### Config 1: without_linux |
| 88 | + |
| 89 | +[without_linux](docker-resources/without_linux.properties) is the default configuration for extracting a ground truth |
| 90 | +for 43 product lines specified in the corresponding [dataset file](docker-resources/without_linux.md). |
| 91 | + |
| 92 | +#### Config 2: verification |
| 93 | + |
| 94 | +[verification](docker-resources/verification.properties) is the default configuration for extracting a ground truth for |
| 95 | +BusyBox, as specified in the corresponding [dataset file](docker-resources/verification.md). This configuration can be used for verification purposes. |
| 96 | + |
| 97 | +#### Config 3: custom |
| 98 | + |
| 99 | +[custom](docker-resources/custom.properties) is the default configuration for extracting a ground truth for any other |
| 100 | +preprocessor-based product line. The desired product line should be specified |
| 101 | +in the [dataset file](docker-resources/custom.md). |
| 102 | + |
| 103 | +> Note that configurations are part of the Docker image. Any configuration change only takes effect after rebuilding |
| 104 | +> the image as described below. |
| 105 | +
|
31 | 106 | ### Build the Docker Image |
32 | | -Before the extraction can be executed, we have to build the Docker image. This can be done by executing the corresponding build script in a terminal. |
| 107 | + |
| 108 | +Before the extraction can be executed, we have to build the Docker image. This can be done by executing the |
| 109 | +corresponding build script in a terminal. |
33 | 110 |
|
34 | 111 | - Linux terminal: `./build-docker-image.sh` |
35 | 112 | - Windows CMD: `build-docker-image.bat` |
36 | 113 |
|
37 | | -This process may roughly take half an hour. |
| 114 | +This process may take a couple of minutes. |
38 | 115 |
|
39 | 116 | ### Start the Ground Truth Extraction in a Docker Container |
40 | | -We provide bash and batch scripts that start the ground truth extraction and copy all data to |
41 | | -_Extraction/extraction-results_ once the extraction is complete, or has been stopped. |
| 117 | + |
| 118 | +We provide bash and batch scripts that start the ground truth extraction. The extracted ground truth is written to |
| 119 | +the [ground-truth](ground-truth) directory. |
42 | 120 | Start the extraction by executing the `start-extraction` script (see examples further below). |
43 | | -The basic syntax is `start-extraction.(sh|bat)`: |
| 121 | +The basic syntax is `start-extraction.(sh|bat) [(verification|custom)] (fast|full)`: |
| 122 | + |
| 123 | +#### Example 1: Start a fast ground truth extraction for `without_linux` |
| 124 | + |
| 125 | +- Windows CMD: |
| 126 | + - `start-extraction.bat fast` |
| 127 | +- Linux terminal: |
| 128 | + - `./start-extraction.sh fast` |
| 129 | + |
| 130 | +#### Example 2: Start a full ground truth extraction for `without_linux` |
44 | 131 |
|
45 | 132 | - Windows CMD: |
46 | | - - `start-extraction.bat` |
| 133 | + - `start-extraction.bat full` |
47 | 134 | - Linux terminal: |
48 | | - - `./start-extraction.sh` |
| 135 | + - `./start-extraction.sh full` |
| 136 | + |
| 137 | +#### Example 3: Start a fast ground truth extraction for `verification` |
| 138 | + |
| 139 | +- Windows CMD: |
| 140 | + - `start-extraction.bat verification fast` |
| 141 | +- Linux terminal: |
| 142 | + - `./start-extraction.sh verification fast` |
| 143 | + |
| 144 | +#### Example 4: Start a full ground truth extraction for `custom` |
| 145 | + |
| 146 | +- Windows CMD: |
| 147 | + - `start-extraction.bat custom full` |
| 148 | +- Linux terminal: |
| 149 | + - `./start-extraction.sh custom full` |
49 | 150 |
|
50 | 151 | #### Runtime |
51 | | -TODO |
| 152 | + |
| 153 | +It is difficult to estimate the runtime as it largely depends on the product line's history size and complexity of |
| 154 | +annotations. To provide some examples: |
| 155 | + |
| 156 | +- A fast ground truth extraction of BusyBox requires 2-15 minutes depending on the machine |
| 157 | +- A full ground truth extraction of BusyBox requires 1-3 hours depending on the machine |
| 158 | +- A fast ground truth extraction of the 43 product lines specified in [without_linux](docker-resources/without_linux.md) |
| 159 | + requires 1-7 days depending on the machine |
52 | 160 |
|
53 | 161 | ### Stopping the Ground Truth Extraction |
54 | | -You can stop the Docker container in which the ground truth extraction is running at any time. In this case, all |
55 | | -collected data will be copied to _Extraction/extraction-results/_ as if the extraction finished successfully. |
| 162 | + |
| 163 | +You can stop the Docker container in which the ground truth extraction is running at any time. |
56 | 164 |
|
57 | 165 | - Windows CMD: |
58 | | - - `stop-extraction.bat` |
| 166 | + - `stop-extraction.bat` |
59 | 167 | - Linux terminal: |
60 | | - - `./stop-extraction.sh` |
| 168 | + - `./stop-extraction.sh` |
61 | 169 |
|
62 | 170 | ### Custom Configuration |
63 | | -You can find the properties files used by Docker under Extraction/docker-resources. By changing the properties, you can adjust the ground truth extraction (e.g., change the log level, |
64 | | -number of threads, etc.). For your convenience, we set all properties to default values. __Note that you have to rebuild the Docker image in order for the changes to take effect__. |
65 | 171 |
|
66 | | -### Clean-Up |
67 | | -You can clean up all created images, container, and volumes via `docker system prune -a`. __DISCLAIMER: This will remove ALL docker objects, even the ones not related to ground truth extraction__. If you have other images, containers, or volumes that you do not want to loose, you can run the docker commands that refer to the objects related to the ground truth extraction. |
| 172 | +You can find the properties files used by Docker under Extraction/docker-resources. By changing the properties, you can |
| 173 | +adjust the ground truth extraction (e.g., change the log level, |
| 174 | +number of threads, etc.). For your convenience, we set all properties to default values. __Note that you have to rebuild |
| 175 | +the Docker image in order for the changes to take effect__. |
| 176 | + |
| 177 | +### Clean-Up |
| 178 | + |
| 179 | +You can clean up all created images, containers, and volumes via `docker system prune -a`. __DISCLAIMER: This will remove |
| 180 | +ALL Docker objects, even the ones not related to ground truth extraction__. If you have other images, containers, or |
| 181 | +volumes that you do not want to lose, you can run the Docker commands that refer to the objects related to the ground |
| 182 | +truth extraction. |
| 183 | + |
68 | 184 | - Image: `docker rmi extraction` |
69 | | -- Container: |
70 | | - - `docker container rm extraction` |
71 | | - - `docker container rm extraction` |
| 185 | +- Container: |
| 186 | + - `docker container rm extraction` |
| 187 | + - `docker container rm extraction` |
72 | 188 | - Volume: |
73 | | - - `docker volume rm extraction` |
74 | | - - `docker volume rm extraction` |
| 189 | + - `docker volume rm extraction` |
| 190 | + - `docker volume rm extraction` |
| 191 | + |
| 192 | +## Execution without Docker |
75 | 193 |
|
76 | | -## Custom System Setup |
77 | 194 | If you want to run the ground truth extraction without Docker, you will have to first set up the environment in which |
78 | 195 | the extraction is executed. |
79 | 196 |
|
80 | | -### Limitations |
81 | | -There are some limitations to the ground truth extraction that should be mentioned. |
| 197 | +### Operating System |
82 | 198 |
|
83 | | -#### Operating System |
84 | | -Due to the implementation of the Ground Truth Extraction and KernelHaven, it is only possible to run the ground truth |
85 | | -extraction on Linux (and possibly Mac). However, you can use the provided Docker setup, or your own virtual machine or |
86 | | -Windows Subsystem for Linux, in order to run the extraction on any OS. |
| 199 | +Due to the implementation of the Ground Truth Extraction, it is only possible to run the ground truth extraction on |
| 200 | +Linux (and possibly Mac). However, you can use the provided Docker setup, or your own virtual machine or Windows |
| 201 | +Subsystem for Linux, in order to run the extraction on any OS. |
87 | 202 |
|
88 | | -### Requirements |
89 | | -TODO |
| 203 | +### Setup for Linux |
| 204 | + |
| 205 | +No special setup is required. |
90 | 206 |
|
91 | 207 | ### Setup Guide for Windows Subsystem for Linux (WSL) and Ubuntu |
| 208 | + |
92 | 209 | It is possible to use WSL to run the extraction on a Windows machine. |
93 | 210 |
|
94 | 211 | #### Installing WSL2 with Ubuntu 20 LTS |
| 212 | + |
95 | 213 | - Follow the guide at https://docs.microsoft.com/en-us/windows/wsl/install-win10 |
96 | | -- Using WSL2 is strongly recommended, because the extraction under WSL1 will take a lifetime. You can check which WSL you |
97 | | - have installed by following the instructions here https://askubuntu.com/questions/1177729/wsl-am-i-running-version-1-or-version-2 |
| 214 | +- Using WSL2 is strongly recommended, because the extraction under WSL1 will take a lifetime. You can check which WSL |
| 215 | + you |
| 216 | + have installed by following the instructions |
| 217 | + here https://askubuntu.com/questions/1177729/wsl-am-i-running-version-1-or-version-2 |
98 | 218 | - You can list the installed distributions with `wsl --list --verbose` |
99 | 219 | - Install Ubuntu 20 LTS via the Microsoft store |
100 | 220 |
|
101 | 221 | #### Problems with WSL |
| 222 | + |
102 | 223 | If you encounter problems with the steps below, one of the following hints might help you: |
| 224 | + |
103 | 225 | - Disable compression to avoid problems with starting WSL and/or network connection |
104 | | - -- Starting ubuntu -> solved by disabling compression of %USERPROFILE%\AppData\Local\Packages\CanonicalGroupLimited... directory |
| 226 | + -- Starting ubuntu -> solved by disabling compression of %USERPROFILE%\AppData\Local\Packages\CanonicalGroupLimited... |
| 227 | + directory |
105 | 228 | --Network connection -> solved by disabling the compression of the AppData/Local/Temp |
106 | | -- (otherwise some suggestions to solve network connectivity errors which seem to occur quite often: https://github.com/microsoft/WSL/issues/5336) |
107 | | -- Colliding paths when cloning linux repo (possibly only a problem when cloning into subdirectories of mnt) |
108 | | - -> enable case sensitivity (https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/) (https://devblogs.microsoft.com/commandline/per-directory-case-sensitivity-and-wsl/) |
109 | | - |
110 | | -#### Install required packages |
111 | | -Tip: You can enable copy-paste with Ctrl+Shift+C/V in WSL by |
112 | | -` |
113 | | -right-clicking on the top of the linux terminal window > properties > setting the corrensponding property under edit options (look at the center of the window) |
114 | | -` |
115 | | - |
116 | | -### Configuration |
117 | | -TODO |
| 229 | +- (otherwise some suggestions to solve network connectivity errors which seem to occur quite |
| 230 | + often: https://github.com/microsoft/WSL/issues/5336) |
| 231 | +- Colliding paths when cloning linux repo (possibly only a problem when cloning into subdirectories of mnt) |
| 232 | + -> enable case |
| 233 | + sensitivity (https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/) (https://devblogs.microsoft.com/commandline/per-directory-case-sensitivity-and-wsl/) |
| 234 | + |
| 235 | +### Build |
| 236 | +You can build a jar file for executing the extraction by calling Maven in the project's root directory: |
| 237 | +```shell |
| 238 | +mvn package |
| 239 | +``` |
| 240 | +Maven will build the jar file and save it to the [target](target) directory. The jar which you should use for execution is `Extraction-jar-with-dependencies.jar`. |
118 | 241 |
|
119 | | -### Validation |
120 | | -TODO |
| 242 | +### Execution |
| 243 | +The execution requires a configuration given by a properties file. An example of such a properties file can be found under [extraction.properties](src/main/resources/extraction.properties). You may customize these properties. |
121 | 244 |
|
122 | | -### Ground Truth Extraction |
123 | | -TODO |
| 245 | +After building the jar file, you can execute it using the following syntax: |
| 246 | +```shell |
| 247 | +# Choose either a 'fast' or a 'full' extraction |
| 248 | +java -jar Extraction-jar-with-dependencies.jar PATH_TO_YOUR_PROPERTIES (fast|full) |
| 249 | +``` |
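
The argument syntax documented in the README above, `[(verification|custom)] (fast|full)`, can be checked before a long-running extraction is started. The following is a minimal sketch of such a check, not the logic of the actual `start-extraction` scripts. The function name `resolve_extraction_args` is hypothetical, and the sketch assumes that omitting the configuration selects `without_linux`, as the examples in the README suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VEVOS): validates arguments against the
# documented syntax "[(verification|custom)] (fast|full)".

usage() {
  echo "usage: [(verification|custom)] (fast|full)" >&2
}

# Prints "<config> <mode>" on success and returns 1 on invalid arguments.
resolve_extraction_args() {
  # Assumption: without_linux is used when no configuration is given.
  local config="without_linux" mode
  case "$#" in
    1) mode="$1" ;;
    2)
      config="$1"
      mode="$2"
      # An explicit configuration must be one of the two documented names.
      case "$config" in
        verification|custom) ;;
        *) usage; return 1 ;;
      esac
      ;;
    *) usage; return 1 ;;
  esac
  # The extraction mode is mandatory and must be fast or full.
  case "$mode" in
    fast|full) ;;
    *) usage; return 1 ;;
  esac
  printf '%s %s\n' "$config" "$mode"
}
```

For example, `resolve_extraction_args custom full` prints `custom full`, while `resolve_extraction_args custom slow` prints the usage line to stderr and returns a non-zero status, so a wrapper could stop before invoking `./start-extraction.sh`.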