Commit 9f78688

docs: update README
1 parent ca20e76 commit 9f78688

3 files changed, 184 additions and 58 deletions

README.md

Lines changed: 182 additions & 56 deletions
@@ -1,123 +1,249 @@
1-
# VEVOS: Ground Truth Extraction
2-
VEVOS is a tool suite for the simulation of the evolution of clone-and-own projects and consists of two main components: The ground truth extraction, called VEVOS/Extraction and the variant simulation called VEVOS/Simulation.
1+
# VEVOS: Ground Truth Extraction v2.0.0
32

4-
This repository contains VEVOS/Extraction.
5-
Please refer to our paper _Simulating the Evolution of Clone-and-Own Projects with VEVOS_ published at the International Conference on Evaluation and Assessment in Software Engineering (EASE) 2022 ([doi](https://doi.org/10.1145/3530019.3534084)) for more information.
6-
VEVOS/Extraction is a Java project for extracting feature mappings, presence conditions, and feature models for each revision (within a specified range of the commit-history) from an input software product line.
3+
VEVOS is a tool suite for the simulation of the evolution of clone-and-own projects and consists of two main components:
4+
The ground truth extraction, called VEVOS/Extraction and the variant simulation called VEVOS/Simulation.
5+
6+
This repository contains VEVOS/Extraction.
7+
Please refer to our paper _Simulating the Evolution of Clone-and-Own Projects with VEVOS_ published at the International
8+
Conference on Evaluation and Assessment in Software Engineering (EASE)
9+
2022 ([doi](https://doi.org/10.1145/3530019.3534084)) for more information.
10+
VEVOS/Extraction is a Java project for extracting feature mappings, presence conditions, and feature models for each
11+
commit from an input software product line.
12+
13+
## Version 2.x.x Update
14+
15+
### Improvements
16+
17+
Version 2.0.0 of VEVOS Extraction presents a major improvement over version 1.0.0 in terms of extractable product lines,
18+
commit coverage, and extraction speed.
19+
VEVOS is now based on [DiffDetective](https://github.com/VariantSync/DiffDetective), a library for analyses of edits to
20+
preprocessor-based product lines.
21+
Due to this major change, VEVOS can now extract a ground truth for any C preprocessor-based software product line and is
22+
no longer bound to the availability of a special adaptor.
23+
For this reason, VEVOS in its default configuration extracts a ground truth for the 43 product lines listed in
24+
the [without_linux](docker-resources/without_linux.md) dataset file.
25+
26+
### Shortcomings
27+
28+
However, there is also a drawback of the improved extraction. VEVOS 2.0.0 is not capable of extracting a feature model
29+
or the presence conditions of entire source code files that are defined by additional build files. If these are still
30+
required, VEVOS 1.x.x has to be used.
31+
32+
### Extraction Modes
33+
34+
There are two basic extraction modes: `fast` and `full`.
35+
36+
#### Fast Extraction
37+
38+
The fast ground truth extraction only extracts the ground truths of changed files for each commit. This extraction is
39+
very useful for studies that are only interested in the evolution of a software family.
40+
41+
#### Full Extraction
42+
43+
The full ground truth extraction extracts the ground truth for all code files of all commits in a product line. Due to
44+
the effort of extracting and saving a ground truth for all files of each commit, this extraction may require a very long
45+
time and large amounts of free disk space.
46+
47+
Essentially, the full ground truth extraction first performs a fast ground truth extraction and then incrementally
48+
combines the ground truths of all commits.
749

850
## Quick Start using Docker
9-
In the following, we provide instructions on how to quickly extract the ground truth of Linux or Busybox with the provided Docker setup.
51+
52+
In the following, we provide instructions on how to quickly extract a ground truth for any preprocessor-based product
53+
line using Docker.
1054

1155
### Requirements
56+
1257
The only requirement is Docker. We provide batch and bash scripts that execute the necessary Docker setup and execution.
13-
We tested the Docker setup under Windows and Linux.
58+
We tested the Docker setup under Windows and Linux.
1459
We have not tested the setup on Mac, but you should be able to use the instructions for Linux.
1560

1661
### Preparation
62+
1763
#### Docker
64+
1865
Docker must be installed on your system, and the Docker daemon must be running.
1966
For installation, follow the instructions given in the installation guide for your OS which you can find
2067
[here](https://docs.docker.com/get-docker/).
2168
Under Linux, you should follow the optional
2269
[post-installation instructions](https://docs.docker.com/engine/install/linux-postinstall/).
2370

2471
#### Repository
72+
2573
Clone the repository to a location of your choice
74+
2675
```
2776
git clone https://github.com/VariantSync/VEVOS_Extraction.git
2877
```
78+
2979
Then, navigate to the repository's root directory in a terminal of your choice.
3080

81+
### (Optional) Configure the Extraction
82+
83+
You can customize the extraction by changing one of the three integrated configurations for Docker.
84+
Each configuration has two files associated with it: a properties file that configures VEVOS itself, and a dataset file
85+
that specifies the product lines for extraction.
86+
87+
#### Config 1: without_linux
88+
89+
[without_linux](docker-resources/without_linux.properties) is the default configuration for extracting a ground truth
90+
for 43 product lines specified in the corresponding [dataset file](docker-resources/without_linux.md).
91+
92+
#### Config 2: verification
93+
94+
[verification](docker-resources/verification.properties) is the default configuration for extracting a ground truth for
95+
BusyBox, as specified in the corresponding [dataset file](docker-resources/verification.md). This configuration can be used for verification purposes.
96+
97+
#### Config 3: custom
98+
99+
[custom](docker-resources/custom.properties) is the default configuration for extracting a ground truth for any other
100+
preprocessor-based product line. The desired product line should be specified
101+
in the corresponding [dataset file](docker-resources/custom.md).
102+
103+
> Note that configurations are part of the Docker image. Any configuration change only takes effect after rebuilding
104+
> the image as described below.
105+
31106
### Build the Docker Image
32-
Before the extraction can be executed, we have to build the Docker image. This can be done by executing the corresponding build script in a terminal.
107+
108+
Before the extraction can be executed, we have to build the Docker image. This can be done by executing the
109+
corresponding build script in a terminal.
33110

34111
- Linux terminal: `./build-docker-image.sh`
35112
- Windows CMD: `build-docker-image.bat`
36113

37-
This process may roughly take half an hour.
114+
This process may take a couple of minutes.
38115

39116
### Start the Ground Truth Extraction in a Docker Container
40-
We provide bash and batch scripts that start the ground truth extraction and copy all data to
41-
_Extraction/extraction-results_ once the extraction is complete, or has been stopped.
117+
118+
We provide bash and batch scripts that start the ground truth extraction. The extracted ground truth is written to
119+
the [ground-truth](ground-truth) directory.
42120
Start the extraction by executing the `start-extraction` script (see examples further below).
43-
The basic syntax is `start-extraction.(sh|bat)`:
121+
The basic syntax is `start-extraction.(sh|bat) [(verification|custom)] (fast|full)`:
122+
123+
#### Example 1: Start a fast ground truth extraction for `without_linux`
124+
125+
- Windows CMD:
126+
- `start-extraction.bat fast`
127+
- Linux terminal:
128+
- `./start-extraction.sh fast`
129+
130+
#### Example 2: Start a full ground truth extraction for `without_linux`
44131

45132
- Windows CMD:
46-
- `start-extraction.bat`
133+
- `start-extraction.bat full`
47134
- Linux terminal:
48-
- `./start-extraction.sh`
135+
- `./start-extraction.sh full`
136+
137+
#### Example 3: Start a fast ground truth extraction for `verification`
138+
139+
- Windows CMD:
140+
- `start-extraction.bat verification fast`
141+
- Linux terminal:
142+
- `./start-extraction.sh verification fast`
143+
144+
#### Example 4: Start a full ground truth extraction for `custom`
145+
146+
- Windows CMD:
147+
- `start-extraction.bat custom full`
148+
- Linux terminal:
149+
- `./start-extraction.sh custom full`
49150

50151
#### Runtime
51-
TODO
152+
153+
It is difficult to estimate the runtime as it largely depends on the product line's history size and complexity of
154+
annotations. To provide some examples:
155+
156+
- A fast ground truth extraction of BusyBox requires 2-15 minutes depending on the machine
157+
- A full ground truth extraction of BusyBox requires 1-3 hours depending on the machine
158+
- A fast ground truth extraction of the 43 product lines specified in [without_linux](docker-resources/without_linux.md)
159+
requires 1-7 days depending on the machine
52160

53161
### Stopping the Ground Truth Extraction
54-
You can stop the Docker container in which the ground truth extraction is running at any time. In this case, all
55-
collected data will be copied to _Extraction/extraction-results/_ as if the extraction finished successfully.
162+
163+
You can stop the Docker container in which the ground truth extraction is running at any time.
56164

57165
- Windows CMD:
58-
- `stop-extraction.bat`
166+
- `stop-extraction.bat`
59167
- Linux terminal:
60-
- `./stop-extraction.sh`
168+
- `./stop-extraction.sh`
61169

62170
### Custom Configuration
63-
You can find the properties files used by Docker under Extraction/docker-resources. By changing the properties, you can adjust the ground truth extraction (e.g., change the log level,
64-
number of threads, etc.). For your convenience, we set all properties to default values. __Note that you have to rebuild the Docker image in order for the changes to take effect__.
65171

66-
### Clean-Up
67-
You can clean up all created images, container, and volumes via `docker system prune -a`. __DISCLAIMER: This will remove ALL docker objects, even the ones not related to ground truth extraction__. If you have other images, containers, or volumes that you do not want to loose, you can run the docker commands that refer to the objects related to the ground truth extraction.
172+
You can find the properties files used by Docker under Extraction/docker-resources. By changing the properties, you can
173+
adjust the ground truth extraction (e.g., change the log level,
174+
number of threads, etc.). For your convenience, we set all properties to default values. __Note that you have to rebuild
175+
the Docker image in order for the changes to take effect__.
176+
177+
### Clean-Up
178+
179+
You can clean up all created images, containers, and volumes via `docker system prune -a`. __DISCLAIMER: This will remove
180+
ALL docker objects, even the ones not related to ground truth extraction__. If you have other images, containers, or
181+
volumes that you do not want to lose, you can run the Docker commands that refer to the objects related to the ground
182+
truth extraction.
183+
68184
- Image: `docker rmi extraction`
69-
- Container:
70-
- `docker container rm extraction`
71-
- `docker container rm extraction`
185+
- Container:
186+
- `docker container rm extraction`
187+
- `docker container rm extraction`
72188
- Volume:
73-
- `docker volume rm extraction`
74-
- `docker volume rm extraction`
189+
- `docker volume rm extraction`
190+
- `docker volume rm extraction`
191+
192+
## Execution without Docker
75193

76-
## Custom System Setup
77194
If you want to run the ground truth extraction without Docker, you will have to first set up the environment in which
78195
the extraction is executed.
79196

80-
### Limitations
81-
There are some limitations to the ground truth extraction that should be mentioned.
197+
### Operating System
82198

83-
#### Operating System
84-
Due to the implementation of the Ground Truth Extraction and KernelHaven, it is only possible to run the ground truth
85-
extraction on Linux (and possibly Mac). However, you can use the provided Docker setup, or your own virtual machine or
86-
Windows Subsystem for Linux, in order to run the extraction on any OS.
199+
Due to the implementation of the Ground Truth Extraction, it is only possible to run the ground truth extraction on
200+
Linux (and possibly Mac). However, you can use the provided Docker setup, or your own virtual machine or Windows
201+
Subsystem for Linux, in order to run the extraction on any OS.
87202

88-
### Requirements
89-
TODO
203+
### Setup for Linux
204+
205+
No special setup is required.
90206

91207
### Setup Guide for Windows Subsystem for Linux (WSL) and Ubuntu
208+
92209
It is possible to use WSL to run the extraction on a Windows machine.
93210

94211
#### Installing WSL2 with Ubuntu 20 LTS
212+
95213
- Follow the guide at https://docs.microsoft.com/en-us/windows/wsl/install-win10
96-
- Using WSL2 is strongly recommended, because the extraction under WSL1 will take a lifetime. You can check which WSL you
97-
have installed by following the instructions here https://askubuntu.com/questions/1177729/wsl-am-i-running-version-1-or-version-2
214+
- Using WSL2 is strongly recommended, because the extraction under WSL1 will take a lifetime. You can check which WSL
215+
you
216+
have installed by following the instructions
217+
here https://askubuntu.com/questions/1177729/wsl-am-i-running-version-1-or-version-2
98218
- You can list the installed distributions with `wsl --list --verbose`
99219
- Install Ubuntu 20 LTS via the Microsoft store
100220

101221
#### Problems with WSL
222+
102223
If you encounter problems with the steps below, one of the following hints might help you:
224+
103225
- Disable compression to avoid problems with starting WSL and/or network connection
104-
-- Starting ubuntu -> solved by disabling compression of %USERPROFILE%\AppData\Local\Packages\CanonicalGroupLimited... directory
226+
-- Starting ubuntu -> solved by disabling compression of %USERPROFILE%\AppData\Local\Packages\CanonicalGroupLimited...
227+
directory
105228
--Network connection -> solved by disabling the compression of the AppData/Local/Temp
106-
- (otherwise some suggestions to solve network connectivity errors which seem to occur quite often: https://github.com/microsoft/WSL/issues/5336)
107-
- Colliding paths when cloning linux repo (possibly only a problem when cloning into subdirectories of mnt)
108-
-> enable case sensitivity (https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/) (https://devblogs.microsoft.com/commandline/per-directory-case-sensitivity-and-wsl/)
109-
110-
#### Install required packages
111-
Tip: You can enable copy-paste with Ctrl+Shift+C/V in WSL by
112-
`
113-
right-clicking on the top of the linux terminal window > properties > setting the corrensponding property under edit options (look at the center of the window)
114-
`
115-
116-
### Configuration
117-
TODO
229+
- (otherwise some suggestions to solve network connectivity errors which seem to occur quite
230+
often: https://github.com/microsoft/WSL/issues/5336)
231+
- Colliding paths when cloning linux repo (possibly only a problem when cloning into subdirectories of mnt)
232+
-> enable case
233+
sensitivity (https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/) (https://devblogs.microsoft.com/commandline/per-directory-case-sensitivity-and-wsl/)
234+
235+
### Build
236+
You can build a jar file for executing the extraction by calling Maven in the project's root directory:
237+
```shell
238+
mvn package
239+
```
240+
Maven will build the jar file and save it to the [target](target) directory. The jar which you should use for execution is `Extraction-jar-with-dependencies.jar`.
118241

119-
### Validation
120-
TODO
242+
### Execution
243+
The execution requires a configuration given by a properties file. An example of such a properties file can be found under [extraction.properties](src/main/resources/extraction.properties). You may customize these properties.
121244

122-
### Ground Truth Extraction
123-
TODO
245+
After building the jar file, you can execute it using the following syntax:
246+
```shell
247+
# Either choose a 'fast' or a 'full' extraction
248+
java -jar Extraction-jar-with-dependencies.jar PATH_TO_YOUR_PROPERTIES (fast|full)
249+
```
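
As a hedged illustration of the `Execution` syntax added in this README diff, the following shell sketch checks the mode argument before assembling the `java -jar` command. The function name `extraction_cmd` and the file name `my-extraction.properties` are hypothetical and not part of VEVOS. The jar path assumes Maven's `target` directory as described in the `Build` section, and the function only prints the command instead of running it.

```shell
# Hypothetical helper (not part of VEVOS): print the command line for an
# extraction run after checking that the mode is one of the two documented values.
extraction_cmd() {
  props="$1"
  mode="$2"
  case "$mode" in
    fast|full) ;;
    *)
      echo "mode must be 'fast' or 'full'" >&2
      return 1
      ;;
  esac
  # Assumption: the jar is used from Maven's target directory.
  echo "java -jar target/Extraction-jar-with-dependencies.jar $props $mode"
}

# Usage: print the command for a fast extraction with a custom properties file.
extraction_cmd my-extraction.properties fast
```

Printing the command rather than executing it keeps the sketch safe to try; a real run would require the jar built by `mvn package` and a valid properties file.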

docker-resources/custom.properties

Lines changed: 1 addition & 1 deletion
@@ -21,6 +21,6 @@ diff-detective.repo-storage-dir=ground-truth/REPOS
2121
# Path to which the ground truth is saved. Do NOT change this without knowing how this affects the Docker file system interface
2222
extraction.gt-save-dir=ground-truth
2323
# Number of threads to use
24-
diff-detective.num-threads=128
24+
diff-detective.num-threads=1
2525
# Number of commits to process in a single batch by one thread
2626
diff-detective.batch-size=8

docker-resources/without_linux.properties

Lines changed: 1 addition & 1 deletion
@@ -17,6 +17,6 @@ diff-detective.repo-storage-dir=ground-truth/REPOS
1717
# Path to which the ground truth is saved. Do NOT change this without knowing how this affects the Docker file system interface
1818
extraction.gt-save-dir=ground-truth
1919
# Number of threads to use
20-
diff-detective.num-threads=128
20+
diff-detective.num-threads=32
2121
# Number of commits to process in a single batch by one thread
2222
diff-detective.batch-size=8
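
Both properties diffs above lower `diff-detective.num-threads` from 128 (to 1 for `custom`, to 32 for `without_linux`). A minimal sketch of making such a change from the command line follows. The helper name `set_num_threads` is hypothetical and not part of VEVOS, and, as the README states, the Docker image has to be rebuilt afterwards for the change to take effect.

```shell
# Hypothetical helper (not part of VEVOS): rewrite the thread count in a
# properties file in place, keeping a backup copy with a .bak suffix.
set_num_threads() {
  file="$1"
  threads="$2"
  # Reject empty or non-numeric values before touching the file.
  case "$threads" in
    ''|*[!0-9]*)
      echo "thread count must be a whole number" >&2
      return 1
      ;;
  esac
  sed -i.bak "s/^diff-detective\.num-threads=.*/diff-detective.num-threads=${threads}/" "$file"
}
```

A possible invocation would be `set_num_threads docker-resources/custom.properties 4`, followed by `./build-docker-image.sh` on Linux. All other properties, such as `diff-detective.batch-size`, are left untouched.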
