Commit ad076c1

V2: Move bootcamp pages to Docs
1 parent f26fd2b commit ad076c1

16 files changed, +3337 -4 lines changed

docs/benchmark/performance-benchmark-tpcds.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: "[103-2] TPC-DS: Decision Support Benchmark for Apache Cloudberry"
+title: "TPC-DS: Decision Support Benchmark for Apache Cloudberry"
 description: Run the TPC-DS benchmark automatically on an existing Apache Cloudberry cluster.
 ---

docs/benchmark/performance-benchmark-tpch.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: "[103-1] TPC-H: Decision Support Benchmark for Apache Cloudberry"
+title: "TPC-H: Decision Support Benchmark for Apache Cloudberry"
 description: Run the TPC-H benchmark automatically on an existing Apache Cloudberry cluster.
 ---

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
---
title: "TPC-DS: Decision Support Benchmark for Apache Cloudberry"
description: Run the TPC-DS benchmark automatically on an existing Apache Cloudberry cluster.
---

This tool is based on the benchmark tool [Pivotal TPC-DS](https://github.com/pivotal/TPC-DS). This repository automates running the TPC-DS benchmark on an existing Apache Cloudberry cluster.

:::note

TPC-DS is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. The benchmark provides a representative evaluation of performance as a general-purpose decision support system. A benchmark result measures query response time in single-user mode, query throughput in multi-user mode, and data maintenance performance for a given hardware, operating system, and data processing system configuration under a controlled, complex, multi-user decision support workload. The purpose of TPC benchmarks is to provide relevant, objective performance data to industry users. You can learn more about TPC-DS on the [TPC website](https://www.tpc.org/tpcds/default5.asp).

:::

## Context

### Supported TPC-DS Versions

TPC has published the following TPC-DS standards over time:

| TPC-DS Benchmark Version | Published Date | Standard Specification |
|-|-|-|
| 3.2.0 (latest) | 2021-06-15 | http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3.2.0.pdf |
| 2.1.0 | 2015-11-12 | http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf |
| 1.3.1 (earliest) | 2015-02-19 | http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v1.3.1.pdf |

As of version 1.2, this tool uses TPC-DS 3.2.0.

## Setup

### Prerequisites

This tutorial follows the previous bootcamp steps. Make sure that the Apache Cloudberry Sandbox environment is up and running.

All of the following examples use the standard Cloudberry hostname convention: `cdw` for the coordinator node and `sdw1` through `sdwN` for the segment nodes.

### TPC-DS Tools Dependencies

Install the dependencies on `cdw` for compiling `dsdgen` (data generation) and `dsqgen` (query generation). Installing packages with `yum` requires root privileges, so open a root shell in the coordinator container:

```bash
docker exec -it -u root cbdb-cdw /bin/bash
yum -y install gcc make byacc
```
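Before you compile `dsdgen` and `dsqgen`, you may want to confirm that the required tools are actually on the `PATH`. The following is a minimal bash sketch; `check_build_tools` is a hypothetical helper name and is not part of the TPC-DS-CBDB package.

```bash
# Hypothetical helper (not part of TPC-DS-CBDB): report build tools missing from PATH.
check_build_tools() {
  local tool
  local -a missing=()
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command, builtin, or function.
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if ((${#missing[@]} > 0)); then
    echo "missing build tools: ${missing[*]}" >&2
    return 1
  fi
  return 0
}

# Usage on cdw, after installing the packages above:
# check_build_tools gcc make byacc && echo "ready to build dsdgen/dsqgen"
```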

The TPC-DS tools source code is available from http://tpc.org/tpc_documents_current_versions/current_specifications5.asp.

### Packages

The TPC-H and TPC-DS packages are already in the `/tmp` directory on `cdw`:

```bash
[gpadmin@cdw tmp]$ ls -rlt
-rw-rw-r-- 1 root root 24520013 Jul 27 14:18 TPC-H-CBDB.tar.gz
-rw-rw-r-- 1 root root 7096941 Jul 27 14:18 TPC-DS-CBDB.tar.gz
```
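The execution steps assume that the archive unpacks into a single top-level `TPC-DS-CBDB` directory. One hedged way to check this before extracting is to list the archive with `tar tzf`; `package_top_dirs` is a hypothetical helper name.

```bash
# Hypothetical helper: print the unique top-level entries of a .tar.gz archive
# without extracting it (t = list, z = gzip, f = archive file).
package_top_dirs() {
  local archive="$1"
  [ -r "$archive" ] || { echo "cannot read $archive" >&2; return 1; }
  # Keep only the first path component of each entry, then de-duplicate.
  tar tzf "$archive" | cut -d/ -f1 | sort -u
}

# Usage on cdw (the path comes from the listing above):
# package_top_dirs /tmp/TPC-DS-CBDB.tar.gz
```

If the archive was created with `./`-prefixed paths, the helper prints `.` rather than the directory name.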

### Execution

To run the benchmark, log in as `gpadmin` on `cdw` in the Apache Cloudberry Sandbox, extract the package from `/tmp` into the home directory, and run the script:

```bash
su - gpadmin
tar xzf /tmp/TPC-DS-CBDB.tar.gz -C ~
cd ~/TPC-DS-CBDB
./run.sh
```

The TPC-DS benchmark takes a few minutes to produce the final report; the exact time depends on your hardware. You can check the TPC-DS execution log in the same directory. Its file name looks similar to the following:

```
tpcds_20231109_153553.log
```
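Each run appears to write a new timestamped log, so after several runs you may want the newest one. Assuming the `<prefix>_YYYYMMDD_HHMMSS.log` naming shown above (inferred from that single example, not documented), lexicographic order matches chronological order, which this bash sketch relies on. `latest_run_log` is a hypothetical helper.

```bash
# Hypothetical helper: print the newest <prefix>_YYYYMMDD_HHMMSS.log in a directory.
# Assumes fixed-width timestamps, so string comparison orders files by time.
latest_run_log() {
  local dir="$1" prefix="$2" f latest=""
  for f in "$dir"/"$prefix"_*.log; do
    [ -e "$f" ] || continue   # the glob stays unexpanded when nothing matches
    if [[ -z "$latest" || "$f" > "$latest" ]]; then
      latest="$f"
    fi
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Usage from ~/TPC-DS-CBDB: follow the most recent log while the benchmark runs.
# tail -f "$(latest_run_log . tpcds)"
```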
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
---
title: "TPC-H: Decision Support Benchmark for Apache Cloudberry"
description: Run the TPC-H benchmark automatically on an existing Apache Cloudberry cluster.
---

This tool is based on the [TPC-H](https://www.tpc.org/tpch/default5.asp) benchmark.
This repository guides you through running the TPC-H benchmark automatically on an existing Apache Cloudberry cluster in the Apache Cloudberry Sandbox.

:::note

TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. You can learn more on the [TPC-H official website](https://www.tpc.org/tpch/).

:::

## Context

### Supported TPC-H Versions

TPC has published the following TPC-H standards over time:

| TPC-H Benchmark Version | Published Date | Standard Specification |
|-|-|-|
| 3.0.1 | 2022-04-28 | https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf |
| 3.0.0 | 2021-02-18 | https://tpc.org/TPC_Documents_Current_Versions/pdf/tpc-h_v3.0.0.pdf |

## Setup

### Prerequisites

This tutorial follows the previous bootcamp steps. Make sure that the Apache Cloudberry Sandbox environment is up and running.

### TPC-H Tools Dependencies

Make sure that `gcc` and `make` are installed on `cdw` for compiling `dbgen` (data generation) and `qgen` (query generation).

You can install the dependencies on `cdw`. Installing packages with `yum` requires root privileges, so open a root shell in the coordinator container:

```bash
docker exec -it -u root cbdb-cdw /bin/bash
yum -y install gcc make
```

The TPC-H tools source code is available from http://tpc.org/tpc_documents_current_versions/current_specifications5.asp.

### Packages

The TPC-H and TPC-DS packages are already in the `/tmp` directory on `cdw`:

```bash
[gpadmin@cdw tmp]$ ls -rlt
-rw-rw-r-- 1 root root 24520013 Jul 27 14:18 TPC-H-CBDB.tar.gz
-rw-rw-r-- 1 root root 7096941 Jul 27 14:18 TPC-DS-CBDB.tar.gz
```

### Execution

To run the benchmark, log in as `gpadmin` on `cdw` in the Apache Cloudberry Sandbox, extract the package from `/tmp` into the home directory, and run the script:

```bash
su - gpadmin
tar xzf /tmp/TPC-H-CBDB.tar.gz -C ~
cd ~/TPC-H-CBDB
./run.sh
```

The TPC-H benchmark takes a few minutes to produce the final report; the exact time depends on your hardware. You can check the TPC-H execution log in the same directory. Its file name looks similar to the following:

```
tpch_20230727_145051.log
```
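The log file name appears to encode when the run started (for example, `tpch_20230727_145051.log`). Assuming that `YYYYMMDD_HHMMSS` layout, which is inferred from the example rather than documented, a small bash sketch can turn the name into a readable timestamp; `log_started_at` is a hypothetical helper.

```bash
# Hypothetical helper: turn <prefix>_YYYYMMDD_HHMMSS.log into "YYYY-MM-DD HH:MM:SS".
# The layout is inferred from example file names such as tpch_20230727_145051.log.
log_started_at() {
  local name re
  name=$(basename "$1")
  re='_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})\.log$'
  if [[ "$name" =~ $re ]]; then
    # BASH_REMATCH[1..6] hold year, month, day, hour, minute, second.
    printf '%s-%s-%s %s:%s:%s\n' "${BASH_REMATCH[@]:1:6}"
  else
    echo "unrecognized log name: $name" >&2
    return 1
  fi
}

# Usage from ~/TPC-H-CBDB:
# for log in tpch_*.log; do echo "$log started at $(log_started_at "$log")"; done
```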
Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
---
title: Get Started with Sandbox
description: Learn how to quickly set up and connect to Apache Cloudberry in a Docker environment.
---

This document guides you through quickly setting up and connecting to an Apache Cloudberry cluster in a Docker environment. You can try out Apache Cloudberry by performing some basic operations and running SQL commands.

:::caution
This guide is intended for testing or development. DO NOT use it for production.
:::

## Prerequisites

Make sure that your environment meets the following requirements:

- Platform requirement: Any platform with a Docker runtime. For details, refer to [Get Started with Docker](https://www.docker.com/get-started/).
- Other dependencies: Git, SSH, and an internet connection

## Build the Sandbox

When building and deploying Cloudberry in Docker, you can choose between two deployment options and two build options.

**Deployment Options**

1. **Single Container** (default): The coordinator and all Cloudberry segments run in a single container. This is the default behavior of the provided `run.sh` script.
2. **Multi-Container**: This option gives you a deployment that more closely resembles a production Cloudberry cluster. The coordinator, the standby coordinator, and two segment hosts each run in their own container. This highlights both the distributed nature of Apache Cloudberry and how high availability (HA) features work when a server (in this case, a container) fails. Enable this option by passing the `-m` flag to the `run.sh` script, as shown in the steps below.

![Apache Cloudberry Sandbox Deployments](/img/bootcamp/sandbox-deployment.jpg)

**Build Options**

1. Compile with the source code of the latest Apache Cloudberry release (published on the [Apache Cloudberry Release Page](https://github.com/apache/cloudberry/releases)). The base OS is the Rocky Linux 9 Docker image.
2. Compile with the latest Apache Cloudberry [main](https://github.com/apache/cloudberry/tree/main) branch. The base OS is the Rocky Linux 9 Docker image.

Build and deploy steps:

1. Start Docker Desktop and make sure it is running properly on your host platform.

2. Download the repository [apache/cloudberry-bootcamp](https://github.com/apache/cloudberry-bootcamp) to the target machine.

```shell
git clone https://github.com/apache/cloudberry-bootcamp.git
```

3. Enter the repository and run the `run.sh` script to start the Docker container. This starts the automatic installation process. Depending on your environment, you may need to run the script with `sudo`.

- For the latest Cloudberry release running in a single container:

```shell
cd cloudberry-bootcamp/000-cbdb-sandbox
./run.sh
```

- For the latest Cloudberry release running across multiple containers:

```shell
cd cloudberry-bootcamp/000-cbdb-sandbox
./run.sh -m
```

- For the latest main branch running in a single container:

```shell
cd cloudberry-bootcamp/000-cbdb-sandbox
./run.sh -c main
```

- For the latest main branch running across multiple containers:

```shell
cd cloudberry-bootcamp/000-cbdb-sandbox
./run.sh -c main -m
```

Once the script finishes without errors, the sandbox is built and running. The `docker run` and `docker compose` commands use the `--detach` option, so the containers keep running in the background and you can SSH into or otherwise access the running Cloudberry instance remotely.

Review the `run.sh` script for additional options (for example, setting the time zone in the running container or only building the container). You can also run `./run.sh -h` to see the usage.
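The four invocations above combine two independent choices: the build source (adding `-c main` selects the main branch, omitting it uses the latest release) and the layout (adding `-m` selects multi-container). As a sketch of that mapping only, the hypothetical `sandbox_run_cmd` helper prints the matching command. It covers just the flags shown in this guide, not the full `run.sh` option set (see `./run.sh -h`).

```shell
# Hypothetical helper: print the run.sh invocation for a build source and layout.
# Only the -c main and -m flags documented in this guide are covered.
sandbox_run_cmd() {
  local source="$1" layout="$2" cmd="./run.sh"
  case "$source" in
    release) ;;                       # default: latest Apache Cloudberry release
    main)    cmd+=" -c main" ;;       # build from the main branch
    *) echo "unknown source: $source (use release|main)" >&2; return 1 ;;
  esac
  case "$layout" in
    single) ;;                        # default: coordinator and segments in one container
    multi)  cmd+=" -m" ;;             # coordinator, standby, and segment hosts in separate containers
    *) echo "unknown layout: $layout (use single|multi)" >&2; return 1 ;;
  esac
  printf '%s\n' "$cmd"
}

# Usage from cloudberry-bootcamp/000-cbdb-sandbox:
# sandbox_run_cmd main multi   # prints ./run.sh -c main -m
```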

## Connect to the database

:::note
When you deploy the multi-container Cloudberry environment, the database may take extra time to initialize, so you might need to wait a few minutes before you can open the psql prompt successfully. Run `docker logs cbdb-cdw -f` to follow the database initialization process; it is finished when you see the "Deployment Successful" output.
:::
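Instead of watching `docker logs cbdb-cdw -f` by hand, you can poll the log for the "Deployment Successful" line. The following bash sketch is one way to do it; `wait_for_log_line` is a hypothetical helper, and the attempt count and interval in the usage line are arbitrary example values.

```shell
# Hypothetical helper: rerun a log command until its output contains a fixed string.
# Arguments: pattern, max attempts, seconds between attempts, then the command to run.
wait_for_log_line() {
  local pattern="$1" attempts="$2" interval="$3" i out
  shift 3
  for ((i = 1; i <= attempts; i++)); do
    # Capture stdout and stderr (docker logs writes to both), then search the text.
    out=$("$@" 2>&1) || true
    if grep -qF -- "$pattern" <<<"$out"; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  return 1
}

# Usage on the host: wait up to about 10 minutes, then open a shell on the coordinator.
# wait_for_log_line "Deployment Successful" 60 10 docker logs cbdb-cdw && docker exec -it cbdb-cdw /bin/bash
```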

You can now connect to the database and try some basic operations.

1. Connect to the Docker container from the host machine:

```shell
docker exec -it cbdb-cdw /bin/bash
```

If the connection succeeds, you will see the following prompt:

```shell
[gpadmin@cdw /]$
```

2. Log in to Apache Cloudberry in Docker. See the following commands and example outputs:

```shell
[gpadmin@cdw ~]$ psql  # Connects to the default database, "gpadmin".

# psql (14.4, server 14.4)
# Type "help" for help.
```

```sql
gpadmin=# SELECT VERSION();  -- Checks the database version.

PostgreSQL 14.4 (Apache Cloudberry 1.0.0 build dev) on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit compiled on Oct 24 2023 10:24:28
(1 row)
```

Now you have an Apache Cloudberry cluster and can continue with [Apache Cloudberry Tutorials Based on Docker Installation](../tutorial/)! Enjoy!

## Working with your Apache Cloudberry Docker environment

The following commands are useful when working with the Apache Cloudberry Docker environment.

### Stopping Your Single Container Deployment With Docker

To stop the **single container** deployment while _keeping the data and state_ within the container, run the command below. You can later start the container again, and any changes you made inside it persist between runs.

```shell
docker stop cbdb-cdw
```

To stop the **single container** deployment and also remove the container and its associated anonymous volumes, run the following command. Keep in mind that any changes you made inside the container and any database state will be wiped and unrecoverable.

```shell
docker rm -f -v cbdb-cdw
```

### Stopping Your Multi-Container Deployment With Docker

To stop the **multi-container** deployment while _keeping the data and state_ within the containers, run the command below from the directory that contains `docker-compose-rockylinux9.yml`. You can later start the containers again, and any changes you made inside them persist between runs.

```shell
docker compose -f docker-compose-rockylinux9.yml stop
```

To stop the **multi-container** deployment and also remove the containers, the network, and the volumes that belong to them, run the command below. Any changes you made inside the containers and any database state will be wiped and unrecoverable.

```shell
docker compose -f docker-compose-rockylinux9.yml down -v
```

### Starting A Stopped Single Container Cloudberry Docker Deployment

If you stopped the containers with one of the commands above that keep the Docker volumes, you can use the following commands to bring the same deployment back up with its previous state.

To start a **single container** deployment after it was shut down, run the following command:

```shell
docker start cbdb-cdw
```

### Starting A Stopped Multi-Container Cloudberry Docker Deployment

To start a **multi-container** deployment after it was shut down, run the following command:

```shell
docker compose -f docker-compose-rockylinux9.yml start
```

:::note
When you start a previously stopped Cloudberry Docker environment, you need to start the database manually. Once the container(s) are up and running, run the following commands. The `gpstart` command starts the database, and the `-a` flag runs it without prompting for confirmation (non-interactive).

```shell
docker exec -it cbdb-cdw /bin/bash

[gpadmin@cdw /]$ gpstart -a
```
:::
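The stop and start commands above differ by deployment layout. As a convenience, a hypothetical `sandbox_ctl` wrapper can pick the right command for you; setting `DRY_RUN=1` only prints the command. It deliberately leaves out the destructive removal commands, and you still need to run `gpstart -a` inside `cbdb-cdw` after starting.

```shell
# Hypothetical wrapper around the stop/start commands in this guide.
# For the multi layout, run it from the directory containing docker-compose-rockylinux9.yml.
sandbox_ctl() {
  local layout="$1" action="$2"
  local -a compose=(docker compose -f docker-compose-rockylinux9.yml)
  local -a cmd
  case "$layout:$action" in
    single:stop)  cmd=(docker stop cbdb-cdw) ;;
    single:start) cmd=(docker start cbdb-cdw) ;;
    multi:stop)   cmd=("${compose[@]}" stop) ;;
    multi:start)  cmd=("${compose[@]}" start) ;;
    *) echo "usage: sandbox_ctl single|multi stop|start" >&2; return 1 ;;
  esac
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # show the command without running it
  else
    "${cmd[@]}"
  fi
}

# Usage: preview, then run; afterwards start the database with gpstart -a inside cbdb-cdw.
# DRY_RUN=1 sandbox_ctl multi start
# sandbox_ctl multi start
```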
