|
| 1 | +## SkyBench |
| 2 | + |
| 3 | +Version 1.1 |
| 4 | + |
| 5 | +© 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh |
| 6 | + |
| 7 | +------------------------------------------- |
| 8 | +### Table of Contents |
| 9 | + |
| 10 | + * [Introduction](#introduction) |
| 11 | + * [Algorithms](#algorithms) |
| 12 | + * [Datasets](#datasets) |
| 13 | + * [Requirements](#requirements) |
| 14 | + * [Usage](#usage) |
| 15 | + * [License](#license) |
| 16 | + * [Contact](#contact) |
| 17 | + * [References](#references) |
| 18 | + |
| 19 | + |
| 20 | +------------------------------------ |
| 21 | +### Introduction |
| 22 | + |
| 23 | +The *SkyBench* software suite provides efficient main-memory implementations |
| 24 | +of skyline computation. It includes the state-of-the-art sequential (i.e., single-threaded) and |
| 25 | +multi-core (i.e., multi-threaded) algorithms. |
| 26 | + |
| 27 | +[The skyline operator](https://en.wikipedia.org/wiki/Skyline_operator) [1] identifies |
| 28 | +so-called Pareto-optimal points in a multi-dimensional dataset. In two dimensions, the |
| 29 | +problem is often presented as |
| 30 | +[finding the silhouette of Manhattan](http://stackoverflow.com/q/1066234/2769271): |
| 31 | +if one knows the position of the corner points of every building, what parts of |
| 32 | +which buildings are visible from across the river? |
| 33 | +The two-dimensional case is trivial to solve and not the focus of *SkyBench*. |
| 34 | + |
| 35 | +In higher dimensions, the problem is formalised with the concept of _dominance_: a point |
| 36 | +_p_ is _dominated by_ another point _q_ if _q_ has better or equal values for every |
| 37 | +attribute and the points are distinct. All points that are not dominated are part of |
| 38 | +the skyline. For example, if the points correspond to hotels, then any hotel that is |
| 39 | +more expensive, farther from anything of interest, and lower-rated than another choice |
| 40 | +would _not_ be in the skyline. In the table below, _Marge's Motel_ is dominated by |
| 41 | +_Happy Hostel_, because it is more expensive, farther from Central Station, and lower |
| 42 | +rated, so it is not in the skyline. On the other hand, _The Grand_ has the best rating |
| 43 | +and _Happy Hostel_ has the best price. _Lovely Lodge_ does not have the best value for |
| 44 | +any one attribute, but neither _The Grand_ nor _Happy Hostel_ outperforms it on every |
| 45 | +attribute, so it too is in the skyline and represents a good _balance_ of the attributes. |
| 46 | + |
| 47 | + |
| 48 | +|Name |Price per Night|Rating|Distance to Central Station|In skyline?| |
| 49 | +|:------------|--------------:|:----:|:-------------------------:|:---------:| |
| 50 | +|The Grand | $325| ⋆⋆⋆⋆⋆| 1.2km| ✓| |
| 51 | +|Marge's Motel| $55| ⋆⋆| 3.6km| | |
| 52 | +|Happy Hostel | $25| ⋆⋆⋆| 0.4km| ✓| |
| 53 | +|Lovely Lodge | $100| ⋆⋆⋆⋆| 8.2km| ✓| |
| 54 | + |
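The dominance definition and the hotel table above can be checked with a small C++ sketch. It is only an illustration and not code from *SkyBench*: the `Hotel` type, the function names, and the convention that smaller values are better (so ratings are stored negated) are all assumptions made for this example. The quadratic loop is a simple reference approach, not one of the algorithms in this suite.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type for the hotel example. Smaller is assumed to be
// better for every attribute, so the star rating is stored negated.
constexpr std::size_t kDims = 3;

struct Hotel {
  std::string name;
  std::array<double, kDims> attrs;  // {price, -rating, distance in km}
};

// True if q dominates p: q is no worse on every attribute and strictly
// better on at least one (i.e., the points are distinct).
bool dominates(const std::array<double, kDims>& q,
               const std::array<double, kDims>& p) {
  bool strictly_better = false;
  for (std::size_t i = 0; i < kDims; ++i) {
    if (q[i] > p[i]) return false;
    if (q[i] < p[i]) strictly_better = true;
  }
  return strictly_better;
}

// Quadratic-time reference skyline: keep every point that no other point
// dominates. Names are returned in input order.
std::vector<std::string> naive_skyline(const std::vector<Hotel>& hotels) {
  std::vector<std::string> result;
  for (const Hotel& p : hotels) {
    bool dominated = false;
    for (const Hotel& q : hotels) {
      if (dominates(q.attrs, p.attrs)) {
        dominated = true;
        break;
      }
    }
    if (!dominated) result.push_back(p.name);
  }
  return result;
}

// The four hotels from the table above.
std::vector<Hotel> example_hotels() {
  return {
      {"The Grand", {325.0, -5.0, 1.2}},
      {"Marge's Motel", {55.0, -2.0, 3.6}},
      {"Happy Hostel", {25.0, -3.0, 0.4}},
      {"Lovely Lodge", {100.0, -4.0, 8.2}},
  };
}
```

Calling `naive_skyline(example_hotels())` keeps every hotel except _Marge's Motel_, matching the last column of the table.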
| 55 | + |
| 56 | +As the number of dimensions/attributes increases, so too do the size of the skyline and the |
| 57 | +difficulty of producing it. Parallel algorithms, such as those implemented here, quickly |
| 58 | +become necessary. |
| 59 | + |
| 60 | +*SkyBench* is released in conjunction with our recent ICDE paper [2]. All of the |
| 61 | +code and scripts necessary to repeat experiments from that paper are available in |
| 62 | +this software suite. To the best of our knowledge, this is also the first publicly |
| 63 | +released C++ skyline software, which will hopefully be a useful resource for the |
| 64 | +academic and industry research communities. |
| 65 | + |
| 66 | + |
| 67 | +------------------------------------ |
| 68 | +### Algorithms |
| 69 | + |
| 70 | +The following algorithms have been implemented in SkyBench: |
| 71 | + |
| 72 | + * **Hybrid** [2]: Located in [src/hybrid](src/hybrid). |
| 73 | + It is the state-of-the-art multi-core algorithm, based on two-level |
| 74 | + quad-tree partitioning of the data and memoisation of point-to-point |
| 75 | + relationships. |
| 76 | + |
| 77 | + * **Q-Flow** [2]: Located in [src/qflow](src/qflow). |
| 78 | + It is a simplification of Hybrid to demonstrate control flow. |
| 79 | + |
| 80 | + * **PSkyline** [3]: Located in [src/pskyline](src/pskyline). |
| 81 | + It was the previous state-of-the-art multi-core algorithm, based |
| 82 | + on a divide-and-conquer paradigm. |
| 83 | + |
| 84 | + * **BSkyTree** [4]: Located in [src/bskytree](src/bskytree). |
| 85 | + It is the state-of-the-art sequential algorithm, based on a |
| 86 | + quad-tree partitioning of the data and memoisation of point-to-point |
| 87 | + relationships. |
| 88 | + |
| 89 | +All four algorithms are implementations of the common interface defined in |
| 90 | +[common/skyline_i.h](common/skyline_i.h) and use common dominance tests from |
| 91 | +[common/common.h](common/common.h) and [common/dt_avx.h](common/dt_avx.h) |
| 92 | +(the latter when vectorisation is enabled). |
| 93 | + |
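As a rough illustration of why the quad-tree partitioning mentioned for Hybrid and BSkyTree helps, the sketch below assigns each point a bitmask relative to a pivot and uses the masks to rule out dominance without comparing attribute values. This is a simplified sketch of the general idea, not the code in `src/`; the names `partition_mask` and `may_dominate`, the fixed `kMaskDims`, and the smaller-is-better convention are assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaskDims = 4;  // illustrative dimensionality
using Point = std::array<float, kMaskDims>;

// Bit i is set when p is no better than the pivot on attribute i
// (smaller values are assumed to be better).
std::uint32_t partition_mask(const Point& p, const Point& pivot) {
  std::uint32_t mask = 0;
  for (std::size_t i = 0; i < kMaskDims; ++i) {
    if (p[i] >= pivot[i]) mask |= (std::uint32_t{1} << i);
  }
  return mask;
}

// If q dominates p, every bit set in q's mask must also be set in p's mask:
// wherever p beats the pivot, q must beat it too. When this check fails,
// q cannot dominate p and the full dominance test can be skipped.
bool may_dominate(std::uint32_t mask_q, std::uint32_t mask_p) {
  return (mask_q & ~mask_p) == 0;
}
```

A `true` result from `may_dominate` does not prove dominance; it only means that a full attribute-by-attribute test is still required.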
| 94 | +------------------------------------ |
| 95 | +### Datasets |
| 96 | + |
| 97 | +For reproducibility of the experiments in [2], we include three datasets. |
| 98 | +The [WEATHER](workloads/elv_weather-U-15-566268.csv) dataset was originally obtained from |
| 99 | +[The University of East Anglia Climatic Research Unit](http://www.cru.uea.ac.uk/cru/data/hrg/tmc) |
| 100 | +and preprocessed for skyline computation. |
| 101 | +We also include two classic skyline datasets, exactly as used in [2]: |
| 102 | +[NBA](workloads/nba-U-8-17264.csv) and |
| 103 | +[HOUSE](workloads/house-U-6-127931.csv). |
| 104 | + |
| 105 | +The synthetic workloads can be generated with the standard benchmark skyline |
| 106 | +data generator [1] hosted on |
| 107 | +[pgfoundry](http://pgfoundry.org/projects/randdataset). |
| 108 | + |
| 109 | + |
| 110 | +------------------------------------ |
| 111 | +### Requirements |
| 112 | + |
| 113 | +*SkyBench* requires the following: |
| 114 | + |
| 115 | + * A C++ compiler that supports C++11 and OpenMP (e.g., a recent version of the |
| 116 | + [GNU compiler](https://gcc.gnu.org/)) |
| 117 | + |
| 118 | + * The GNU `make` program |
| 119 | + |
| 120 | + * A CPU with AVX or AVX2 support, if vectorised dominance tests are to be used |
| 121 | + |
| 122 | + |
| 123 | +------------------------------------ |
| 124 | +### Usage |
| 125 | + |
| 126 | +Before running, the code must be compiled for the number of dimensions in the dataset.^ |
| 127 | +For example, to compute the skyline of the 8-dimensional NBA data set located |
| 128 | +in `workloads/nba-U-8-17264.csv`, do: |
| 129 | + |
| 130 | +> make all DIMS=8 |
| 131 | +> |
| 132 | +> ./bin/SkyBench -f workloads/nba-U-8-17264.csv |
| 133 | +
|
| 134 | +By default, it will compute the skyline with all algorithms. Running `./bin/SkyBench` |
| 135 | +without parameters will provide more details about the supported options. |
| 136 | + |
| 137 | +You can make use of the provided shell script (`./script/runExp.sh`) that does all of |
| 138 | +the above automatically. For details, execute: |
| 139 | +> ./script/runExp.sh |
| 140 | +
|
| 141 | +To reproduce the experiment with real datasets (Table II in [2]), do (assuming |
| 142 | +a 16-core machine): |
| 143 | +> ./scripts/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid" |
| 144 | +
|
| 145 | +^For performance reasons, skyline implementations that we obtained from other |
| 146 | +authors compile their code for a specific number of dimensions. For a fair |
| 147 | +comparison, we adopted the same approach. |
| 148 | + |
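To illustrate why fixing the number of dimensions at compile time can help performance, here is a minimal sketch of a dominance test whose loop bound is a compile-time constant. The macro name `NUM_DIMS` and the function `dominates_fixed` are hypothetical: the `Makefile` accepts `DIMS=<d>`, but how that value is passed to the compiler is not shown here. Smaller values are assumed to be better.

```cpp
#include <cstddef>

// Hypothetical macro name, chosen for this example only. It could be set on
// the compiler command line (e.g. with -DNUM_DIMS=8); 8 is just a default.
#ifndef NUM_DIMS
#define NUM_DIMS 8
#endif

// Because NUM_DIMS is known at compile time, the compiler is free to unroll
// this loop instead of testing a run-time bound on every iteration.
inline bool dominates_fixed(const float* q, const float* p) {
  bool strictly_better = false;
  for (std::size_t i = 0; i < NUM_DIMS; ++i) {
    if (q[i] > p[i]) return false;      // q is worse somewhere: no dominance
    strictly_better |= (q[i] < p[i]);   // remember any strict improvement
  }
  return strictly_better;
}
```

The trade-off is the one described in the footnote: the binary has to be rebuilt whenever the dimensionality of the input changes.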
| 149 | + |
| 150 | +------------------------------------ |
| 151 | +### License |
| 152 | + |
| 153 | +This software is subject to the terms of |
| 154 | +[The MIT License](http://opensource.org/licenses/MIT), |
| 155 | +which [has been included in this repository](LICENSE.md). |
| 156 | + |
| 157 | + |
| 158 | +------------------------------------ |
| 159 | +### Contact |
| 160 | + |
| 161 | +We plan to expand this software suite with new algorithms, so please check |
| 162 | +that you have the latest version. Please do not hesitate to contact the |
| 163 | +authors with comments, questions, or bug reports. |
| 164 | +>[SkyBench on GitHub](https://github.com/sean-chester/SkyBench) |
| 165 | +
|
| 166 | + |
| 167 | +------------------------------------ |
| 168 | +### References |
| 169 | + |
| 170 | + 1. |
| 171 | +S. Börzsönyi, D. Kossmann, and K. Stocker. |
| 172 | +(2001) |
| 173 | +"The Skyline Operator." |
| 174 | +In _Proceedings of the 17th International Conference on Data Engineering (ICDE 2001)_, |
| 175 | +421--432. |
| 176 | +http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf |
| 177 | + |
| 178 | + 2. |
| 179 | +S. Chester, D. Šidlauskas, I. Assent, and K. S. Bøgh. |
| 180 | +(2015) |
| 181 | +"Scalable parallelization of skyline computation for multi-core processors." |
| 182 | +In _Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)_, |
| 183 | +1083--1094. |
| 184 | +http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf |
| 185 | + |
| 186 | + 3. |
| 187 | +H. Im, J. Park, and S. Park. |
| 188 | +(2011) |
| 189 | +"Parallel skyline computation on multicore architectures." |
| 190 | +_Information Systems_ 36(4): |
| 191 | +808--823. |
| 192 | +http://dx.doi.org/10.1016/j.is.2010.10.005 |
| 193 | + |
| 194 | + 4. |
| 195 | +J. Lee and S. Hwang. |
| 196 | +(2014) |
| 197 | +"Scalable skyline computation using a balanced pivot selection technique." |
| 198 | +_Information Systems_ 39: |
| 199 | +1--21. |
| 200 | +http://dx.doi.org/10.1016/j.is.2013.05.005 |
| 201 | + |
| 202 | +------------------------------------ |