ClickHouse is an open-source, column-oriented database management system capable of generating analytical data reports in real time using SQL queries.
This supplemental guide explains how the data generated for TSBS is stored, additional flags available when using the data importer (tsbs_load_clickhouse),
and additional flags available for the query runner (tsbs_run_queries_clickhouse).
This should be read after the main README.
Data generated by tsbs_generate_data for ClickHouse is serialized in a "pseudo-CSV" format,
along with a custom header at the beginning. The header is several lines long:
- one line composed of a comma-separated list of tag labels, with the literal string "tags" as the first value in the list
- one or more lines composed of a comma-separated list of field labels, with the table name as the first value in the list
- a blank line
An example for the cpu-only use case:
tags,hostname,region,datacenter,rack,os,arch,team,service,service_version,service_environment
cpu,usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice
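The header layout described above can be read with a few lines of Python. This is only a minimal sketch for illustration, not the parser TSBS itself uses; the function name `parse_header` and the shortened sample header are assumptions made for the example.

```python
import io


def parse_header(stream):
    """Read header lines up to the first blank line.

    Returns (tag_labels, fields_by_table), where fields_by_table maps
    each table name to its list of field labels.
    """
    tag_labels = None
    fields_by_table = {}
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == "":
            break  # the blank line terminates the header
        name, *labels = line.split(",")
        if tag_labels is None:
            # The first header line must start with the literal "tags".
            if name != "tags":
                raise ValueError("header must start with a 'tags' line")
            tag_labels = labels
        else:
            # Every following line starts with a table name.
            fields_by_table[name] = labels
    if tag_labels is None:
        raise ValueError("empty header")
    return tag_labels, fields_by_table


# Shortened, hypothetical header in the same shape as the cpu-only example.
sample = "tags,hostname,region,datacenter\ncpu,usage_user,usage_system\n\n"
tags, fields = parse_header(io.StringIO(sample))
print(tags)
print(fields)
```

Stopping at the first blank line leaves the stream positioned at the first reading, so the same file object can be handed to whatever consumes the data rows.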
Following this, each reading is composed of two rows:
- a comma-separated list of tag values for the reading, with the literal string "tags" as the first value in the list
- a comma-separated list of field values for the reading, with the table the reading belongs to as the first value and the timestamp as the second value
An example for the cpu-only use case:
tags,host_0,eu-central-1,eu-central-1b,21,Ubuntu15.10,x86,SF,6,0,test
cpu,1451606400000000000,58.1317132304976170,2.6224297271376256,24.9969495069947882,61.5854484633778867,22.9481393231639395,63.6499207106198313,6.4098777048301052,44.8799140503027445,80.5028770761136201,38.2431182911542820
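To make the two-row layout concrete, the sketch below pairs one tags row with one fields row and labels the values using the header information. It is an illustration under stated assumptions: the function name `parse_reading` is hypothetical, the sample rows are truncated from the example above, and field values are assumed to be floating-point numbers.

```python
def parse_reading(tags_line, fields_line, tag_labels, fields_by_table):
    """Combine the two rows of one reading into a single dictionary."""
    marker, *tag_values = tags_line.rstrip("\r\n").split(",")
    if marker != "tags":
        raise ValueError("first row of a reading must start with 'tags'")

    # Second row: table name, timestamp, then one value per field label.
    table, timestamp, *values = fields_line.rstrip("\r\n").split(",")
    labels = fields_by_table[table]
    if len(tag_values) != len(tag_labels) or len(values) != len(labels):
        raise ValueError("row length does not match the header")

    return {
        "table": table,
        # Kept as an integer; the example value looks like nanoseconds
        # since the Unix epoch, but that unit is an assumption here.
        "timestamp": int(timestamp),
        "tags": dict(zip(tag_labels, tag_values)),
        "fields": {label: float(v) for label, v in zip(labels, values)},
    }


# Header information and rows truncated from the cpu-only example.
tag_labels = ["hostname", "region", "datacenter"]
fields_by_table = {"cpu": ["usage_user", "usage_system"]}
reading = parse_reading(
    "tags,host_0,eu-central-1,eu-central-1b",
    "cpu,1451606400000000000,58.1317132304976170,2.6224297271376256",
    tag_labels,
    fields_by_table,
)
print(reading["table"], reading["timestamp"], reading["tags"]["hostname"])
```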
Hostname of the ClickHouse server.
User to connect to the ClickHouse server as. Yes, the default user really is called "default".
Password to use to connect to the ClickHouse server. The default password is empty.
Whether to consistently hash data across the multiple insert workers by the value of the primary (first) tag. For datasets with larger numbers of devices, this option helps improve data locality on disk which can lead to better query performance. For datasets with smaller numbers of devices, it is typically not necessary.
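The idea behind hashing by the primary tag can be sketched as follows. This is only an illustration: it assumes a CRC32 hash taken modulo the number of workers, which is not necessarily the hash function the loader uses, and the name `worker_for` is hypothetical.

```python
import zlib


def worker_for(primary_tag_value, num_workers):
    """Map a primary tag value (e.g. a hostname) to a worker index.

    The same value always maps to the same worker, so all readings for
    one device are inserted by one worker.
    """
    # CRC32 is an assumption chosen for the sketch because it is stable
    # across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(primary_tag_value.encode("utf-8")) % num_workers


hosts = [f"host_{i}" for i in range(6)]
assignment = {h: worker_for(h, 4) for h in hosts}
for host, worker in assignment.items():
    print(host, "-> worker", worker)
```

Because every reading for a given device goes through the same worker, that device's rows tend to arrive in the same batches, which is the data locality effect described above.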
File to output periodic CPU and memory statistics. Useful for understanding system performance while writing data to the database.
Comma-separated list of hostnames for the ClickHouse servers. Workers are connected to the servers in a round-robin fashion.
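Round-robin assignment of workers to servers can be pictured with the short sketch below. The function name `assign_hosts` and the hostnames `ch1`, `ch2`, `ch3` are hypothetical and chosen only for illustration; this is not the query runner's actual code.

```python
def assign_hosts(hosts_value, num_workers):
    """Return the server each worker connects to, in worker order."""
    hosts = [h.strip() for h in hosts_value.split(",") if h.strip()]
    if not hosts:
        raise ValueError("at least one hostname is required")
    # Worker i takes host i modulo the number of hosts, wrapping around.
    return [hosts[i % len(hosts)] for i in range(num_workers)]


plan = assign_hosts("ch1,ch2,ch3", 5)
print(plan)  # ['ch1', 'ch2', 'ch3', 'ch1', 'ch2']
```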
User to connect to the ClickHouse server as. Yes, the default user really is called "default".
Password to use to connect to the ClickHouse server. The default password is empty.
Add ClickHouse repo:
sudo bash -c "echo 'deb http://repo.yandex.ru/clickhouse/deb/stable/ main/' > /etc/apt/sources.list.d/clickhouse.list"

Add key and update repolist:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 # optional
sudo apt-get update

Install binaries:
sudo apt-get install -y clickhouse-client clickhouse-server

More details on how to get started with ClickHouse are available in the ClickHouse documentation.

Ensure ClickHouse is running:
sudo service clickhouse-server restart

Install golang:
sudo apt install golang-1.9

Add go binaries to PATH for convenience and set up the GOPATH env:
echo 'export PATH="$HOME/gocode/bin:/usr/lib/go-1.9/bin:$PATH"' >> ~/.bashrc
echo 'export GOPATH="$HOME/gocode"' >> ~/.bashrc

Apply PATH and GOPATH:
source ~/.bashrc

Create initial Go folders:
mkdir -p $GOPATH/{bin,src}

Get and build TSBS:
go get github.com/timescale/tsbs
cd $GOPATH/src/github.com/timescale/tsbs/cmd
go get ./...
go install ./...

Run test:
cd $GOPATH/src/github.com/timescale/tsbs/scripts

Generate the test dataset. This may take some time:
FORMATS=clickhouse ./generate_data.sh

Generate the test query set. This should not take much time:
FORMATS=clickhouse ./generate_queries.sh

Load the data set:
./load_clickhouse.sh

Run the test query set. In this example, both the number of concurrent workers and the number of test queries to run are restricted. If you have powerful hardware, feel free to raise these limits:
NUM_WORKERS=1 MAX_QUERIES=10 ./run_queries_clickhouse.sh

The results are written to the /tmp/bulk_queries/result_queries_clickhouse* files.