This repository was archived by the owner on Aug 16, 2021. It is now read-only.

Commit 842f762

dmius authored and Dmitry committed
Merge branch 'dmius-ebs-vol' of https://github.com/startupturbo/nancy into dmius-ebs-vol
2 parents 5bb2216 + f8ea96e commit 842f762

20 files changed: +206 -328 lines changed

README.md

Lines changed: 79 additions & 9 deletions
@@ -7,30 +7,78 @@ Nancy helps to conduct automated database experiments.
 The Nancy Command Line Interface is a unified way to manage automated
 database experiments either in clouds or on-premise.
 
-Experiments are needed every time you:
+What is a Database Experiment?
+===
+Database experiment is a set of actions performed to test
+* (a) specified SQL queries ("workload")
+* (b) on specified machine / OS / Postgres version ("environment")
+* (c) against specified database ("object")
+* (d) with an optional change – some DDL or config change ("target" or "delta").
+
+Two main goals for any database experiment:
+* (1) validation – check that the specified workload is valid,
+* (2) benchmark – perform deep SQL query analysis.
+
+Database experiments are needed when you:
 - add or remove indexes;
-- want to verify query optimization ideas;
-- need to tune database parameters;
-- want to perform performance/stress test for your DB;
-- are preparing to upgrade your DBMS to the new major version;
+- for a new DB schema change, want to validate it and estimate migration time;
+- want to verify some query optimization ideas;
+- tune database configuration parameters;
+- do capacity planning and want to stress-test your DB in some environment;
+- plan to upgrade your DBMS to a new major version;
 - want to train ML model related to DB optimization.
 
-Currently Nancy works only with PostgreSQL versions 9.6 and 10.
+Currently Supported Features
+===
+* Experiments are conducted in a Docker container with extended Postgres setup
+* Supported Postgres versions: 9.6, 10
+* Supported locations for experimental runs:
+  * Any machine with Docker installed
+  * AWS EC2:
+    * Run on AWS EC2 Spot Instances (using Docker Machine)
+    * Allow to specify EC2 instance type
+    * Auto-detect and use current lowest EC2 Spot Instance prices
+* Support local or remote (S3) files – config, dump, etc
+* What to test (a.k.a. "target" or "delta"):
+  * Test Postgres parameters change
+  * Test DDL change (specified as "do" and "undo" SQL to return state)
+* Supported types of workload:
+  * Use custom SQL as workload
+  * Use "real workload" prepared using Postgres logs
+* For "real workload", allow replaying it with increased speed
+* Allow to keep container alive for specified time after all steps are done
+* Collected artifacts:
+  * Workload SQL logs
+  * Deep SQL query analysis report
 
 Requirements
 ===
-To use Nancy CLI you need Linux or MacOS with installed Docker. If you plan
-to run experiments in AWS EC2 instances, you also need Docker Machine
-(https://docs.docker.com/machine/).
+1) To use Nancy CLI you need Linux or MacOS with installed Docker.
+
+2) To run on AWS EC2 instances, you also need:
+* AWS CLI https://aws.amazon.com/en/cli/
+* Docker Machine https://docs.docker.com/machine/
+* jq https://stedolan.github.io/jq/
+
 
 Installation
 ===
+
+In the minimal configuration, only two steps are needed:
+
+1) Install Docker (for Ubuntu/Debian: `sudo apt-get install docker`)
+2) Clone this repo and adjust `$PATH`:
 ```bash
 git clone https://github.com/startupturbo/nancy
 echo "export PATH=\$PATH:"$(pwd)"/nancy" >> ~/.bashrc
 source ~/.bashrc
 ```
 
+Additionally, to allow use of AWS EC2 instances:
+3) Follow instructions https://docs.aws.amazon.com/cli/latest/userguide/installing.html
+4) Follow instructions https://docs.docker.com/machine/install-machine/
+5) install jq (for Ubuntu/Debian: `sudo apt-get install jq`)
+
 Getting started
 ===
 Start with these commands:
@@ -39,3 +87,25 @@ nancy help
 nancy run help
 ```
 
+"Hello World!"
+===
+```bash
+echo "create table hello_world as select i::int4 from generate_series(1, 1000000) _(i);" > ./sample.dump
+bzip2 ./sample.dump
+
+# "Clean run": w/o index
+# (seqscan is expected, total time ~150ms, depending on resources)
+nancy run \
+  --run-on localhost \
+  --workload-custom-sql "select count(1) from hello_world where i between 100000 and 100010;" \
+  --db-dump-path file://$(pwd)/sample.dump.bz2 --tmp-path /tmp
+
+# Now check how a regular btree index affects performance
+# (expected total time: ~0.05ms)
+nancy run \
+  --run-on localhost \
+  --workload-custom-sql "select count(1) from hello_world where i between 100000 and 100010;" \
+  --db-dump-path file://$(pwd)/sample.dump.bz2 --tmp-path /tmp \
+  --target-ddl-do "create index i_hello_world_i on hello_world(i);" \
+  --target-ddl-undo "drop index i_hello_world_i;"
+```

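The Requirements section of the README diff above lists Docker for local runs, plus the AWS CLI, Docker Machine, and jq for EC2 runs. A minimal sketch of a pre-flight check for those tools follows. The helper name `missing_tools` is hypothetical and not part of Nancy, and the binary names `aws` and `docker-machine` are assumed to be the usual command names for the AWS CLI and Docker Machine.

```bash
# Hypothetical helper (not part of Nancy): print the names of the given
# commands that cannot be found on PATH, separated by spaces.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s\n' "$missing"
}

# Local runs need only Docker; EC2 runs also need the other three tools.
result="$(missing_tools docker aws docker-machine jq)"
if [ -n "$result" ]; then
  echo "Missing tools: $result"
else
  echo "All required tools were found"
fi
```

Running the check before `nancy run` is only a convenience: it reports missing tools up front instead of failing partway through an experiment.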
docker/Dockerfile

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+FROM ubuntu:16.04
+
+ARG PG_SERVER_VERSION
+
+ENV PG_SERVER_VERSION=${PG_SERVER_VERSION:-10} \
+    DEBIAN_FRONTEND=noninteractive
+
+# add custom FTS dictionaries
+ADD ./tsearch_data /usr/share/postgresql/$PG_SERVER_VERSION/tsearch_data
+# logging ON; memory setting – for 2CPU/4096MB/SSD
+ADD ./postgresql_${PG_SERVER_VERSION}_tweak.conf /postgresql.tweak.conf
+
+# install Postgres and postgres-specific software:
+# - desired version of Postgres server,
+# - psql version 10
+# - postgres_dba and pspg
+# - pgbadger (modified, not lowercasing DB object names, auto_explain compatibility)
+RUN apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys B97B0AFCAA1A47F044F244A07FCC7D46ACCC4CF8 \
+    && echo "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main"> /etc/apt/sources.list.d/pgdg.list \
+    && apt-get update && apt-get install -y sudo postgresql-$PG_SERVER_VERSION postgresql-contrib-$PG_SERVER_VERSION postgresql-plpython-$PG_SERVER_VERSION \
+    && apt-get install -y postgresql-$PG_SERVER_VERSION-plsh postgresql-server-dev-$PG_SERVER_VERSION postgresql-$PG_SERVER_VERSION-rum \
+    && apt-get install -y git postgresql-client-10 pspg pgreplay jq etcd libjson-xs-perl \
+    && perl -MCPAN -e'install Text::CSV_XS' \
+    && git clone https://github.com/NikolayS/postgres_dba.git /root/postgres_dba \
+    && git clone https://github.com/NikolayS/pgbadger.git /root/pgbadger
+
+# additionally, install newer NodeJS, npm, Sqitch, and more
+RUN wget -q -S -O - https://deb.nodesource.com/setup_8.x | sudo bash \
+    && apt-get install -y s3cmd sudo bzip2 python-software-properties software-properties-common \
+    && apt-get install -y build-essential cpanminus libdbd-pg-perl nginx netcat npm \
+    && npm install -g newman ava \
+    && sudo cpanm --quiet --notest App::Sqitch
+
+# configure psql, configure postgres & check postgres start & stop & prepare start script
+RUN echo "\\set dba '\\\\\\\\i /root/postgres_dba/start.psql'" >> ~/.psqlrc \
+    && echo "\\setenv PAGER 'pspg -bX --no-mouse'" >> ~/.psqlrc \
+    && echo "local all all trust" > /etc/postgresql/$PG_SERVER_VERSION/main/pg_hba.conf \
+    && echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/$PG_SERVER_VERSION/main/pg_hba.conf \
+    && echo "listen_addresses='*'" >> /etc/postgresql/$PG_SERVER_VERSION/main/postgresql.conf \
+    && echo "log_filename='postgresql-$PG_SERVER_VERSION-main.log'" >> /etc/postgresql/$PG_SERVER_VERSION/main/postgresql.conf \
+    && /etc/init.d/postgresql start && psql -U postgres -c 'create database test;' && /etc/init.d/postgresql stop \
+    && cat /postgresql.tweak.conf >> /etc/postgresql/$PG_SERVER_VERSION/main/postgresql.conf \
+    && echo "#!/bin/bash" > /pg_start.sh && chmod a+x /pg_start.sh \
+    && printf "sudo -u postgres /usr/lib/postgresql/$PG_SERVER_VERSION/bin/postgres -D /var/lib/postgresql/$PG_SERVER_VERSION/main -c config_file=/etc/postgresql/$PG_SERVER_VERSION/main/postgresql.conf & \n" >> /pg_start.sh \
+    && echo "etcd" >> /pg_start.sh
+
+EXPOSE 5432
+
+#VOLUME ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql"]
+
+# etcd is not being actually used (it's for future needs), but it allows restart Postgres with container interruption
+CMD ["/pg_start.sh"]
+

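The last `RUN` step of the Dockerfile above assembles `/pg_start.sh` piece by piece with `echo` and `printf`, which makes the resulting script hard to picture. As a rough sketch, the function below prints approximately what that script should contain for a given Postgres version. The name `render_pg_start` is hypothetical and exists only for illustration; its default of `10` mirrors the `${PG_SERVER_VERSION:-10}` fallback in the `ENV` line.

```bash
# Hypothetical illustration: print (approximately) the start script that the
# Dockerfile generates for one Postgres major version (defaults to 10).
render_pg_start() {
  version="${1:-10}"
  cat <<EOF
#!/bin/bash
sudo -u postgres /usr/lib/postgresql/$version/bin/postgres -D /var/lib/postgresql/$version/main -c config_file=/etc/postgresql/$version/main/postgresql.conf &
etcd
EOF
}

# Show the script an image built with PG_SERVER_VERSION=9.6 would run.
render_pg_start 9.6
```

Postgres is started in the background and `etcd` then runs in the foreground, so the container keeps running through `etcd` even if Postgres is restarted, as the comment before `CMD` notes.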
docker/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+How to build/rebuild:
+
+```bash
+docker build --build-arg PG_SERVER_VERSION=9.6 -t postgresmen/postgres-with-stuff:pg9.6 .
+docker login # you must be registered, go to hub.docker.com
+docker push postgresmen/postgres-with-stuff:pg9.6
+```

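The build instructions above cover only the 9.6 image, while the Dockerfile accepts any `PG_SERVER_VERSION`. A minimal dry-run sketch for both supported versions follows. The helper `build_cmd` is hypothetical, and the `pg10` tag is an assumption extrapolated from the `pg9.6` tag; the source does not confirm it.

```bash
# Hypothetical helper: print the build command for one Postgres major version.
# The tag pattern pg<version> is assumed from the pg9.6 example.
build_cmd() {
  printf 'docker build --build-arg PG_SERVER_VERSION=%s -t postgresmen/postgres-with-stuff:pg%s .\n' "$1" "$1"
}

# Dry run: print the commands instead of executing them.
for version in 9.6 10; do
  build_cmd "$version"
done
```

Printing the commands first makes it easy to review the tags before anything is built or pushed.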
docker/postgresql_10_tweak.conf

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# Assume we have machine with 2CPU/4096MB/SSD (CircleCI default)
+# IMPORTANT: on faster systems, you need to use your own memory-related settings!
+work_mem = 32MB # warning: tune it if you expect *many* concurrent connections
+shared_buffers = 3GB
+effective_cache_size = 1GB
+maintenance_work_mem = 512MB
+checkpoint_completion_target = 0.7
+wal_buffers = 16MB
+random_page_cost = 1.1
+effective_io_concurrency = 200
+# do not use parallel execution to avoid issues with analysis
+max_worker_processes = 0
+max_parallel_workers_per_gather = 0
+max_parallel_workers = 0
+
+log_destination = 'stderr,csvlog'
+logging_collector = on
+log_directory = '/var/log/postgresql'
+# log_filename – to be set dynamically
+log_min_messages = notice
+log_min_error_statement = notice
+log_min_duration_statement = -1 # rely on "auto_explain.log_min_duration = 0", avoid duplicates
+log_checkpoints = on
+log_connections = on
+log_disconnections = on
+log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u (%a,%h) '
+log_lock_waits = on
+log_replication_commands = on
+log_temp_files = 0
+log_autovacuum_min_duration = 0
+
+shared_preload_libraries = 'pg_stat_statements,auto_explain'
+
+pg_stat_statements.max = 5000
+pg_stat_statements.track = all
+pg_stat_statements.track_utility = on
+pg_stat_statements.save = on
+
+auto_explain.log_min_duration = 0
+auto_explain.log_analyze = on
+auto_explain.log_verbose = on
+auto_explain.log_buffers = on
+auto_explain.log_format = 'json'
+auto_explain.log_timing = on
+auto_explain.log_triggers = on
+auto_explain.log_nested_statements = on

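The tweak file above disables parallel execution through three settings, and the header comment warns that the memory values need adjusting on other hardware. When editing such a file, it can help to read a single setting back. The sketch below is one simple way to do that; `conf_get` is a hypothetical helper, not part of Nancy or Postgres. It strips everything after the first `#`, so it would mishandle values that contain `#`, and it returns quoted values with their quotes.

```bash
# Hypothetical helper: print the last value assigned to a parameter in a
# postgresql.conf-style file. Usage: conf_get FILE PARAMETER
conf_get() {
  awk -v key="$2" '
    {
      sub(/#.*/, "")                        # drop trailing comments
      eq = index($0, "=")
      if (eq == 0) next                     # skip lines without an assignment
      name = substr($0, 1, eq - 1)
      value = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) found = value        # later assignments win
    }
    END { if (found != "") print found }
  ' "$1"
}

# Sample lines copied from the tweak file above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
work_mem = 32MB # warning: tune it if you expect *many* concurrent connections
max_worker_processes = 0
max_parallel_workers_per_gather = 0
max_parallel_workers = 0
EOF

for name in max_worker_processes max_parallel_workers_per_gather max_parallel_workers; do
  printf '%s = %s\n' "$name" "$(conf_get "$sample" "$name")"
done
rm -f "$sample"
```

Taking the last assignment matches how the Dockerfile uses the file: it is appended to `postgresql.conf`, where a later setting overrides an earlier one.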
docker_deprecated/postgresql.log.conf renamed to docker/postgresql_9.6_tweak.conf

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
+# Assume we have machine with 2CPU/4096MB/SSD (CircleCI default)
+# IMPORTANT: on faster systems, you need to use your own memory-related settings!
+work_mem = 32MB # warning: tune it if you expect *many* concurrent connections
+shared_buffers = 3GB
+effective_cache_size = 1GB
+maintenance_work_mem = 512MB
+checkpoint_completion_target = 0.7
+wal_buffers = 16MB
+random_page_cost = 1.1
+effective_io_concurrency = 200
+# do not use parallel execution to avoid issues with analysis
+max_worker_processes = 0
+max_parallel_workers_per_gather = 0
+
 log_destination = 'stderr,csvlog'
 logging_collector = on
 log_directory = '/var/log/postgresql'

docker_deprecated/Dockerfile

Lines changed: 0 additions & 45 deletions
This file was deleted.

docker_deprecated/README.md

Lines changed: 0 additions & 56 deletions
This file was deleted.

docker_deprecated/ec2_postgres_configs/10/i3.16xlarge

Lines changed: 0 additions & 28 deletions
This file was deleted.
