Commit 6c93b92

Initial release of SkyBench software suite.
0 parents  commit 6c93b92

35 files changed: +715950 -0 lines changed

LICENSE.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
**The MIT License (MIT)**

Copyright (c) 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be included in all copies
or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
## SkyBench

Version 1.1

© 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh

-------------------------------------------
### Table of Contents

* [Introduction](#introduction)
* [Algorithms](#algorithms)
* [Datasets](#datasets)
* [Requirements](#requirements)
* [Usage](#usage)
* [License](#license)
* [Contact](#contact)
* [References](#references)


------------------------------------
### Introduction

The *SkyBench* software suite contains software for efficient main-memory
computation of skylines. The state-of-the-art sequential (i.e., single-threaded) and
multi-core (i.e., multi-threaded) algorithms are included.

[The skyline operator](https://en.wikipedia.org/wiki/Skyline_operator) [1] identifies
so-called Pareto-optimal points in a multi-dimensional dataset. In two dimensions, the
problem is often presented as
[finding the silhouette of Manhattan](http://stackoverflow.com/q/1066234/2769271):
if one knows the positions of the corner points of every building, what parts of
which buildings are visible from across the river?
The two-dimensional case is trivial to solve and not the focus of *SkyBench*.

In higher dimensions, the problem is formalised with the concept of _dominance_: a point
_p_ is _dominated by_ another point _q_ if _q_ has better or equal values for every
attribute and the points are distinct. All points that are not dominated are part of
the skyline. For example, if the points correspond to hotels, then any hotel that is
more expensive, farther from anything of interest, and lower-rated than another choice
would _not_ be in the skyline. In the table below, _Marge's Motel_ is dominated by
_Happy Hostel_, because it is more expensive, farther from Central Station, and
lower-rated, so it is not in the skyline. On the other hand, _The Grand_ has the best
rating and _Happy Hostel_ has the best price, so both are in the skyline. _Lovely Lodge_
does not have the best value for any one attribute, but neither _The Grand_ nor
_Happy Hostel_ outperforms it on every attribute, so it too is in the skyline and
represents a good _balance_ of the attributes.


|Name         |Price per Night|Rating|Distance to Central Station|In skyline?|
|:------------|--------------:|:----:|:-------------------------:|:---------:|
|The Grand    |           $325| ⋆⋆⋆⋆⋆|                      1.2km|    Yes    |
|Marge's Motel|            $55|   ⋆⋆ |                      3.6km|    No     |
|Happy Hostel |            $25|  ⋆⋆⋆ |                      0.4km|    Yes    |
|Lovely Lodge |           $100| ⋆⋆⋆⋆ |                      8.2km|    Yes    |
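
The dominance definition and the hotel table can be checked with a few lines of C++. The following is a minimal sketch and not code from *SkyBench*: the names `Point`, `dominates`, `naive_skyline`, and `hotel_example` are hypothetical, the skyline is computed with a naive quadratic loop, and the rating is negated so that a smaller value is better on every attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type for this sketch: every attribute is oriented so
// that a smaller value is better.
using Point = std::vector<double>;

// True if q dominates p: q is no worse on every attribute and strictly
// better on at least one (so the two points are distinct).
bool dominates(const Point& q, const Point& p) {
    assert(q.size() == p.size());
    bool strictly_better = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i] > p[i]) return false;  // q is worse on attribute i
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Naive quadratic skyline: indices of all points that no other point dominates.
std::vector<std::size_t> naive_skyline(const std::vector<Point>& pts) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < pts.size() && !dominated; ++j) {
            dominated = (j != i) && dominates(pts[j], pts[i]);
        }
        if (!dominated) result.push_back(i);
    }
    return result;
}

// The hotel table as {price, -rating, distance in km}.
std::vector<Point> hotel_example() {
    return {
        Point{325.0, -5.0, 1.2},  // 0: The Grand
        Point{55.0, -2.0, 3.6},   // 1: Marge's Motel
        Point{25.0, -3.0, 0.4},   // 2: Happy Hostel
        Point{100.0, -4.0, 8.2},  // 3: Lovely Lodge
    };
}
```

Calling `naive_skyline(hotel_example())` yields the indices of The Grand, Happy Hostel, and Lovely Lodge. The algorithms in this suite exist precisely because this quadratic approach does not scale to large, high-dimensional inputs.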


As the number of dimensions/attributes increases, so too do the size of the skyline and
the difficulty of producing it. Parallel algorithms, such as those implemented here,
quickly become necessary.

*SkyBench* is released in conjunction with our recent ICDE paper [2]. All of the
code and scripts necessary to repeat experiments from that paper are available in
this software suite. To the best of our knowledge, this is also the first publicly
released C++ skyline software, which we hope will be a useful resource for the
academic and industry research communities.


------------------------------------
### Algorithms

The following algorithms have been implemented in SkyBench:

* **Hybrid** [2]: Located in [src/hybrid](src/hybrid).
  It is the state-of-the-art multi-core algorithm, based on two-level
  quad-tree partitioning of the data and memoisation of point-to-point
  relationships.

* **Q-Flow** [2]: Located in [src/qflow](src/qflow).
  It is a simplification of Hybrid to demonstrate control flow.

* **PSkyline** [3]: Located in [src/pskyline](src/pskyline).
  It was the previous state-of-the-art multi-core algorithm, based
  on a divide-and-conquer paradigm.

* **BSkyTree** [4]: Located in [src/bskytree](src/bskytree).
  It is the state-of-the-art sequential algorithm, based on a
  quad-tree partitioning of the data and memoisation of point-to-point
  relationships.

All four algorithms are implementations of the common interface defined in
[common/skyline_i.h](common/skyline_i.h) and use common dominance tests from
[common/common.h](common/common.h) and [common/dt_avx.h](common/dt_avx.h)
(the latter when vectorisation is enabled).
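
To give a feel for how quad-tree-style partitioning can avoid dominance tests, here is a minimal sketch of the general idea. It is an illustration under stated assumptions, not the code in `src/bskytree` or `src/hybrid`: the names `partition_mask` and `may_dominate` are hypothetical, smaller values are assumed to be better, and pivot selection is left out entirely. Each point gets a bitmask relative to a pivot, and a point can only dominate points whose mask contains all of its own bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Point = std::vector<float>;  // hypothetical point type for this sketch

// Bit i is set when p is no better than the pivot on attribute i
// (smaller values are assumed to be better).
std::uint32_t partition_mask(const Point& p, const Point& pivot) {
    assert(p.size() == pivot.size() && p.size() <= 32);
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] >= pivot[i]) mask |= (std::uint32_t{1} << i);
    }
    return mask;
}

// If q dominates p, every bit set in q's mask is also set in p's mask:
// q[i] >= pivot[i] and q[i] <= p[i] together imply p[i] >= pivot[i].
// When this returns false, the full dominance test can be skipped.
bool may_dominate(std::uint32_t q_mask, std::uint32_t p_mask) {
    return (q_mask & p_mask) == q_mask;
}
```

For a pivot `{5, 5}`, the points `{6, 1}` and `{1, 6}` receive the masks `0b01` and `0b10`. Neither mask contains the other, so the two points are known to be incomparable without comparing their attributes.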

------------------------------------
### Datasets

For reproducibility of the experiments in [2], we include three datasets.
The [WEATHER](workloads/elv_weather-U-15-566268.csv) dataset was originally obtained from
[The University of East Anglia Climatic Research Unit](http://www.cru.uea.ac.uk/cru/data/hrg/tmc)
and preprocessed for skyline computation.
We also include two classic skyline datasets, exactly as used in [2]:
[NBA](workloads/nba-U-8-17264.csv) and
[HOUSE](workloads/house-U-6-127931.csv).

The synthetic workloads can be generated with the standard benchmark skyline
data generator [1] hosted on
[pgfoundry](http://pgfoundry.org/projects/randdataset).
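
For readers who want to load such a workload into their own code, the following is a minimal sketch. It assumes, without confirmation from the suite itself, that each line of a workload file holds one point as comma-separated numeric values; the function name `read_points` is hypothetical and this is not the parser used by *SkyBench*.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads one point per line, with attributes separated by commas (an assumed
// layout). Throws if a line does not contain exactly `dims` values.
std::vector<std::vector<float>> read_points(std::istream& in, std::size_t dims) {
    assert(dims > 0);
    std::vector<std::vector<float>> points;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::istringstream fields(line);
        std::string field;
        std::vector<float> point;
        while (std::getline(fields, field, ',')) {
            point.push_back(std::stof(field));
        }
        if (point.size() != dims) {
            throw std::runtime_error("unexpected number of attributes: " + line);
        }
        points.push_back(point);
    }
    return points;
}
```

Passing a `std::ifstream` opened on a workload file together with the dimensionality used at compile time would return the points as rows of `float` values, provided the file really follows the assumed layout.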


------------------------------------
### Requirements

*SkyBench* depends on the following:

* A C++ compiler that supports C++11 and OpenMP (e.g., the newest
  [GNU compiler](https://gcc.gnu.org/))

* The GNU `make` program

* A CPU with AVX or AVX2 support, if vectorised dominance tests are to be used
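
To illustrate what a vectorised dominance test looks like, here is a minimal sketch using AVX intrinsics. It is not the code in `dt_avx.h`: the names `kDims`, `dominates_scalar`, and `dominates_avx` are hypothetical, the dimensionality is fixed at 8 so that a point fills exactly one 256-bit register, and smaller values are assumed to be better. When the compiler is not targeting AVX (e.g., no `-mavx`), the sketch falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical fixed dimensionality: 8 floats fill one 256-bit AVX register.
constexpr std::size_t kDims = 8;

// Scalar reference: q dominates p if it is no worse everywhere and strictly
// better somewhere (smaller is better).
bool dominates_scalar(const float* q, const float* p) {
    assert(q != nullptr && p != nullptr);
    bool strictly_better = false;
    for (std::size_t i = 0; i < kDims; ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Same test with all 8 attributes compared at once when AVX is available.
bool dominates_avx(const float* q, const float* p) {
#if defined(__AVX__)
    const __m256 vq = _mm256_loadu_ps(q);
    const __m256 vp = _mm256_loadu_ps(p);
    // Any lane in which q is worse than p rules out dominance.
    if (_mm256_movemask_ps(_mm256_cmp_ps(vq, vp, _CMP_GT_OQ)) != 0) return false;
    // Otherwise q dominates p if it is strictly better in at least one lane.
    return _mm256_movemask_ps(_mm256_cmp_ps(vq, vp, _CMP_LT_OQ)) != 0;
#else
    return dominates_scalar(q, p);  // fallback when AVX is not enabled
#endif
}
```

The vector version replaces up to eight branchy scalar comparisons with two compare instructions and two mask extractions, which is why AVX support is worthwhile for an operation that is executed so many times.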


------------------------------------
### Usage

Before it can be run, the code must be compiled for the number of dimensions of the
input data.^ For example, to compute the skyline of the 8-dimensional NBA dataset located
in `workloads/nba-U-8-17264.csv`, do:

> make all DIMS=8
>
> ./bin/SkyBench -f workloads/nba-U-8-17264.csv

By default, it will compute the skyline with all algorithms. Running `./bin/SkyBench`
without parameters will provide more details about the supported options.

You can make use of the provided shell script (`/script/runExp.sh`) that does all of
the above automatically. For details, execute:
> ./script/runExp.sh

To reproduce the experiment with real datasets (Table II in [2]), do (assuming
a 16-core machine):
> ./scripts/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid"

^For performance reasons, skyline implementations that we obtained from other
authors compile their code for a specific number of dimensions. For a fair
comparison, we adopted the same approach.
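
The effect of fixing the dimensionality at compile time can be sketched as follows. The makefile passes `-DNUM_DIMS=$(DIMS)` to the compiler; everything else here (the fallback value, the `Tuple` alias, and the `dominates` function) is a hypothetical illustration and not the suite's actual code.

```cpp
#include <array>
#include <cstddef>

// The makefile defines NUM_DIMS via -DNUM_DIMS=$(DIMS); this fallback only
// exists so that the sketch compiles on its own.
#ifndef NUM_DIMS
#define NUM_DIMS 6
#endif

static_assert(NUM_DIMS > 0, "a point needs at least one attribute");

// Hypothetical point type whose size is known at compile time.
using Tuple = std::array<float, NUM_DIMS>;

// Because the trip count is a compile-time constant, the compiler is free to
// unroll (and possibly vectorise) this loop. Smaller values are assumed better.
inline bool dominates(const Tuple& q, const Tuple& p) {
    bool strictly_better = false;
    for (std::size_t i = 0; i < NUM_DIMS; ++i) {
        if (q[i] > p[i]) return false;
        strictly_better |= (q[i] < p[i]);
    }
    return strictly_better;
}
```

The trade-off is the one described in the footnote: the binary only works for inputs of the compiled dimensionality, so a different dataset width requires a rebuild with another `DIMS` value.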


------------------------------------
### License

This software is subject to the terms of
[The MIT License](http://opensource.org/licenses/MIT),
which [has been included in this repository](LICENSE.md).


------------------------------------
### Contact

This software suite will be expanded soon with new algorithms; so, you are
encouraged to ensure that this is still the latest version. Please do not
hesitate to contact the authors if you have comments, questions, or bugs to report.
>[SkyBench on GitHub](https://github.com/sean-chester/SkyBench)


------------------------------------
### References

1. S. Börzsönyi, D. Kossmann, and K. Stocker.
   (2001)
   "The Skyline Operator."
   In _Proceedings of the 17th International Conference on Data Engineering (ICDE 2001)_,
   421--432.
   http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf

2. S. Chester, D. Šidlauskas, I. Assent, and K. S. Bøgh.
   (2015)
   "Scalable parallelization of skyline computation for multi-core processors."
   In _Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)_,
   1083--1094.
   http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf

3. H. Im, J. Park, and S. Park.
   (2011)
   "Parallel skyline computation on multicore architectures."
   _Information Systems_ 36(4):
   808--823.
   http://dx.doi.org/10.1016/j.is.2010.10.005

4. J. Lee and S. Hwang.
   (2014)
   "Scalable skyline computation using a balanced pivot selection technique."
   _Information Systems_ 39:
   1--21.
   http://dx.doi.org/10.1016/j.is.2013.05.005

------------------------------------

makefile

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
############################################################
# Makefile for Benchmarking Skyline Algorithms #
# Darius Sidlauskas (darius.sidlauskas@epfl.ch) #
# Sean Chester (schester@cs.au.dk) #
# Copyright (c) 2014 Aarhus University #
############################################################

RM = rm -rf
MV = mv
CP = cp -rf
CC = g++

TARGET = $(OUT)/SkyBench

SRC = $(wildcard src/util/*.cpp) \
	$(wildcard src/common/*.cpp) \
	$(wildcard src/bskytree/*.cpp) \
	$(wildcard src/pskyline/*.cpp) \
	$(wildcard src/qflow/*.cpp) \
	$(wildcard src/hybrid/*.cpp) \
	$(wildcard src/*.cpp)

OBJ = $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.o)))

OUT = bin

LIB_DIR = # used as -L$(LIB_DIR)
INCLUDES = -I ./src/

LIB =

# Forces make to look in these directories
VPATH = src:src/util:src/bskytree:src/pskyline:src/qflow:src/hybrid:src/common

DIMS=6
V=VERBOSE
DT=0
PROFILER=0

# By default compiling for performance (optimal)
CXXFLAGS = -O3 -m64 -DNDEBUG\
	-DNUM_DIMS=$(DIMS) -D$(V) -DCOUNT_DT=$(DT) -DPROFILER=$(PROFILER)\
	-Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \
	-Wcast-qual -Wcast-align \
	-std=c++0x -fopenmp -mavx

LDFLAGS=-m64 -lrt -fopenmp

# Target-specific Variable values:
# Compile for debugging (works with valgrind)
dbg : CXXFLAGS = -O0 -g3 -m64\
	-DNUM_DIMS=$(DIMS) -DVERBOSE -DCOUNT_DT=0 -DPROFILER=1\
	-Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \
	-Wcast-qual -Wcast-align -std=c++0x
dbg : all

# All Target
all: $(TARGET)

# Tool invocations
$(TARGET): $(OBJ) $(LIB_DIR)$(LIB)
	@echo 'Building target: $@ (GCC C++ Linker)'
	$(CC) -o $(TARGET) $(OBJ) $(LDFLAGS)
	@echo 'Finished building target: $@'
	@echo ' '

$(OUT)/%.o: %.cpp
	@echo 'Building file: $< (GCC C++ Compiler)'
	$(CC) $(CXXFLAGS) $(INCLUDES) -c -o"$@" "$<"
	@echo 'Finished building: $<'
	@echo ' '

clean:
	-$(RM) $(OBJ) $(TARGET) $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.d)))
	-@echo ' '

deepclean:
	-$(RM) bin/*
	-@echo ' '


.PHONY: all clean deepclean dbg tests

scripts/README

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Note that these scripts are set up to be run from the parent directory and correspond to
the scripts used to generate the results in our ICDE 2015 paper.

scripts/cardTest.sh

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
#!/bin/bash

./runExp.sh -p -i ./workloads -t "1 2 4 8" -x "C E A" -d 12\
 -c "500000 1000000 2000000 4000000 8000000"\
 -s "bskytree hybrid"

scripts/dimTest.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
#!/bin/bash

./runExp.sh -p -i ./workloads -t "1 2 4 8" -x "C E A" -c 1000000\
 -d "2 4 6 8 10 12 14 16 18 20 22 24"\
 -s "bskytree hybrid"
