Commit 775e408: Refresh cloud documentation (#81)
1 parent 627ee75

17 files changed: +596 -1008 lines changed

docs/cluster/automation.md

Lines changed: 192 additions & 0 deletions
(cluster-automation)=
# Automation

Automation in CrateDB Cloud allows users to streamline and manage routine
database operations efficiently. The two primary automation features are
the SQL Scheduler and Table Policies, both of which facilitate the
maintenance and optimization of database tasks.

:::{important}
- Automation is available for all newly deployed clusters.
- For existing clusters, the feature can be enabled on demand. (Contact
  [support](https://support.crate.io/) for activation.)

Automation utilizes a dedicated database user `gc_admin` with full cluster
privileges to execute scheduled tasks, and persists data in the `gc` schema.
:::

## SQL Scheduler

The SQL Scheduler automates routine database tasks by scheduling SQL queries
to run at specific times, in UTC. It supports creating job descriptions with
valid [cron patterns](https://www.ibm.com/docs/en/db2oc?topic=task-unix-cron-format)
and SQL statements, enabling a wide range of tasks. Users can manage these
jobs through the Cloud UI, adding, removing, editing, activating, and
deactivating them as needed.
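
As a quick reference for the linked cron format, a pattern consists of five
space-separated fields:

```text
* * * * *
| | | | |
| | | | +-- day of week (0-6, Sunday = 0)
| | | +---- month (1-12)
| | +------ day of month (1-31)
| +-------- hour (0-23)
+---------- minute (0-59)
```

For example, `0 * * * *` runs at the start of every hour, and `30 14 * * *`
runs every day at 14:30 UTC.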

### Use Cases

- Regularly updating or aggregating table data.
- Automating export and import of data.
- Deleting old or redundant data to maintain database efficiency.

### Accessing and Using the SQL Scheduler

The SQL Scheduler can be found in the "Automation" tab in the left-hand
navigation menu. There are two tabs relevant to the SQL Scheduler:

**SQL Scheduler** shows a list of your existing jobs. In the list, you can
activate/deactivate each job with a toggle in the "Active" column. You can
also edit and delete jobs with buttons on the right side of the list.

![SQL Scheduler overview](../_assets/img/cluster-sql-scheduler-overview.png)

**Logs** shows a list of *scheduled* job runs, whether they failed or
succeeded, execution time, run time, and the error in case they were
unsuccessful. In case of an error, more details can be viewed, showing the
executed query and a stack trace. You can filter the logs by status or by a
specific job.

![SQL Scheduler logs](../_assets/img/cluster-sql-scheduler-logs.png)

### Examples

#### Cleanup of Old Files

Cleanup tasks represent a common use case for these types of automated jobs.
This example deletes records older than 30 days from a specified table once a
day:

```sql
DELETE FROM "sample_data"
WHERE "timestamp_column" < NOW() - INTERVAL '30 days';
```

How often you run it depends on your use case, but once a day is common for
cleanup. This expression runs every day at 2:30 PM UTC:

Schedule: `30 14 * * *`

![SQL Scheduler cleanup example](../_assets/img/cluster-sql-scheduler-example-cleanup.png)

#### Copying Logs into a Persistent Table

Another useful example is copying data to another table for archival
purposes. This job copies rows from the system logs table into one of our
own tables:

```sql
CREATE TABLE IF NOT EXISTS "logs"."persistent_jobs_log" (
    "classification" OBJECT (DYNAMIC),
    "ended" TIMESTAMP WITH TIME ZONE,
    "error" TEXT,
    "id" TEXT,
    "node" OBJECT (DYNAMIC),
    "started" TIMESTAMP WITH TIME ZONE,
    "stmt" TEXT,
    "username" TEXT,
    PRIMARY KEY (id)
) CLUSTERED INTO 1 SHARDS;

INSERT INTO "logs"."persistent_jobs_log"
SELECT * FROM sys.jobs_log
ON CONFLICT ("id") DO NOTHING;
```

In this example, we schedule the job to run every hour:

Schedule: `0 * * * *`

![SQL Scheduler copying example](../_assets/img/cluster-sql-scheduler-example-copying.png)

:::{note}
Limitations and Known Issues:
* Only one job can run at a time; subsequent jobs are queued until the
  current one completes.
* Long-running jobs may block the execution of queued jobs, leading to
  potential delays.
:::

## Table Policies

Table policies allow automating maintenance operations for **partitioned
tables**. Automated actions can be set up to be executed daily based on a
pre-configured ruleset.

![Table policy list](../_assets/img/cluster-table-policy.png)

### Overview

The table policy overview can be found in the left-hand navigation menu under
"Automation". From the list of policies, you can create, delete, edit, or
(de)activate them. Logs of executed policies can be found in the "Logs" tab.

![Table policy logs](../_assets/img/cluster-table-policy-logs.png)

A new policy can be created with the "Add New Policy" button.

![Table policy creation](../_assets/img/cluster-table-policy-create.png)

After naming the policy and selecting the tables/schemas to be impacted, you
must specify the time column. This column, which should be a timestamp used
for partitioning, determines the data affected by the policy. It is important
that this time column is consistently present across all targeted
tables/schemas. While you can apply the policy to tables without the
specified time column, it will not be executed for those. If your tables have
different timestamp columns, consider setting up separate policies for each
to ensure accuracy.

:::{note}
The "Time Column" must be of type `TIMESTAMP`.
:::

Next, a condition is used to determine the affected partitions. The system is
time-based: a partition is eligible for an action if the value in its
partitioned column is smaller than (`<`), or smaller than or equal to (`<=`),
the current date minus `n` days, months, or years.
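
As an illustration, with a time column `ts_day` and the `<` operator, a
condition of "older than 30 days" selects the same partitions as this
predicate (a conceptual sketch only; the evaluation is performed by the
policy engine, and the names are placeholders):

```sql
-- Conceptual equivalent of an "older than 30 days" (<) condition.
SELECT DISTINCT "ts_day"
FROM "data_table"
WHERE "ts_day" < NOW() - INTERVAL '30 days';
```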

### Actions

The following actions are supported:

* **Delete:** Deletes eligible partitions along with their data.
* **Set replicas:** Changes the replication factor of eligible partitions.
* **Force merge:** Merges segments on eligible partitions down to a
  specified number.

After filling out the policy, you can see the affected schemas/tables and the
number of partitions that would be affected if the policy were executed at
this very moment.

### Examples

Consider a scenario where you have a table and want to optimize space on your
cluster. Older data (e.g., older than 30 days) may have already been
snapshotted and is accessed only infrequently, so it is not used for live
analytics; it might therefore be sufficient for it to exist just once in the
cluster, without replication. Additionally, you may not want to retain data
older than 60 days at all.

Assume the following table schema:

```sql
CREATE TABLE data_table (
    ts TIMESTAMP,
    ts_day GENERATED ALWAYS AS date_trunc('day', ts),
    val DOUBLE
) PARTITIONED BY (ts_day);
```

For the outlined scenario, the policies would be as follows:

**Policy 1 - Saving replica space:**
* **Time Column:** `ts_day`
* **Condition:** `older than 30 days`
* **Actions:** `Set replicas to 0.`

**Policy 2 - Data removal:**
* **Time Column:** `ts_day`
* **Condition:** `older than 60 days`
* **Actions:** `Delete eligible partition(s)`
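
Conceptually, the daily effect of these two policies corresponds to
statements like the following (a sketch only; CrateDB Cloud executes the
policies internally, and the partition value is a placeholder):

```sql
-- Policy 1: set replicas to 0 on each partition older than 30 days
ALTER TABLE data_table PARTITION (ts_day = 1704067200000)
SET (number_of_replicas = 0);

-- Policy 2: remove data older than 60 days
DELETE FROM data_table
WHERE ts_day < NOW() - INTERVAL '60 days';
```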

docs/cluster/backups.md

Lines changed: 83 additions & 0 deletions
(cluster-backups)=
# Backups

You can find the Backups page in the detailed view of your cluster; there you
can see and restore all existing backups.

By default, a backup is made every hour. Backups are kept for 14 days. We
also keep the last 14 backups indefinitely, no matter the state of your
cluster.

The Backups tab provides a list of all your backups.

![Cloud Console cluster backups page](../_assets/img/cluster-backups.png)

You can also control the schedule of your backups by clicking the *Edit
backup schedule* button.

![Cloud Console cluster backups edit page](../_assets/img/cluster-backups-edit.png)

Here you can create a custom schedule by selecting any number of hour slots.
Backups will be created at the selected times. At least one backup a day is
mandatory.

To restore a particular backup, click the *Restore* button. A popup window
with a SQL statement will appear. Enter this statement in your Admin UI
console, either by copy-pasting it or by clicking *Run query in Admin UI*.
The latter brings you directly to the Admin UI console with the statement
pre-filled.

![Cloud Console cluster backups restore page](../_assets/img/cluster-backups-restore.png)

You can choose between restoring the cluster fully or only restoring specific
tables.
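
The generated statement is based on CrateDB's `RESTORE SNAPSHOT` command and
resembles the following sketch (repository, snapshot, and table names here
are illustrative; always use the statement from the popup):

```sql
RESTORE SNAPSHOT "repository_name"."snapshot_name"
TABLE "doc"."sample_data"
WITH (wait_for_completion = true);
```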
35+
36+
(cluster-cloning)=
37+
## Cluster Cloning
38+
39+
Cluster cloning is a process of duplicating all the data from a specific
40+
snapshot into a different cluster. Creating the new cluster isn't part
41+
of the cloning process, you need to create the target cluster yourself.
42+
You can clone a cluster from the Backups page.
43+
44+
![Cloud Console cluster backup snapshots](../_assets/img/cluster-backups.png)
45+
46+
Choose a snapshot and click the *Clone* button. As with restoring a
47+
backup, you can choose between cloning the whole cluster, or only
48+
specific tables.
49+
50+
![Cloud Console cluster clone popup](../_assets/img/cluster-clone-popup.png)
51+
52+
:::{note}
53+
Keep in mind that the full cluster clone will include users, views,
54+
privileges and everything else. Cloning also doesn't distinguish
55+
between cluster plans, meaning you can clone from CR2 to CR1 or any
56+
other variation.
57+
:::
58+
59+
(cluster-cloning-fail)=
60+
## Failed cloning
61+
62+
There are circumstances under which cloning can fail or behave
63+
unexpectedly. These are:
64+
65+
- If you already have tables with the same names in the target cluster
66+
as in the source snapshot, the entire clone operation will fail.
67+
- There isn't enough storage left on the target cluster to
68+
accommodate the tables you're trying to clone. In this case, you
69+
might get an incomplete cloning as the cluster will run out of
70+
storage.
71+
- You're trying to clone an invalid or no longer existing snapshot.
72+
This can happen if you're cloning through
73+
[Croud](https://cratedb.com/docs/cloud/cli/en/latest/). In this case,
74+
the cloning will fail.
75+
- You're trying to restore a table that is not included in the
76+
snapshot. This can happen if you're restoring snapshots through
77+
[Croud](https://cratedb.com/docs/cloud/cli/en/latest/). In this case,
78+
the cloning will fail.
79+
80+
When cloning fails, it is indicated by a banner in the cluster overview
81+
screen.
82+
83+
![Cloud Console cluster failed cloning](../_assets/img/cluster-clone-failed.png)

docs/cluster/console.md

Lines changed: 30 additions & 0 deletions
(cluster-console)=
# Console

The Console in CrateDB Cloud allows users to execute SQL queries seamlessly
against their CrateDB cluster. Users with the "Organization Admin" role can
access the Console from the left-hand navigation menu within a cluster.

- **Table and Schema Tree View:** Easily navigate through your database
  structure.
- **Client-Side Query Validation:** Ensure your SQL queries are correct
  before execution.
- **Multiple Query Execution:** Run several queries in sequence.
- **Query History:** Access and manage your past queries.

:::{important}
- The Console is available for all newly deployed clusters.
- For older clusters, this feature can be enabled on demand. Contact
  [support](https://support.crate.io/) for activation.

The Console currently utilizes a dedicated database user `gc_admin` with
full cluster privileges.
:::

:::{note}
**Multi-Query Execution:**
When running multiple queries at once, the Console executes them
sequentially, not within a single session or transaction. If one query
fails, the subsequent queries will not be executed. Currently, session
settings are not persisted between queries.
:::
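
For example, pasting the following into the Console runs the statements one
after another; if the `CREATE TABLE` fails, the later statements never run
(the table name is illustrative):

```sql
CREATE TABLE IF NOT EXISTS doc.metrics (ts TIMESTAMP, val DOUBLE);
INSERT INTO doc.metrics (ts, val) VALUES (NOW(), 42.0);
REFRESH TABLE doc.metrics;
SELECT COUNT(*) FROM doc.metrics;
```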

docs/cluster/export.md

Lines changed: 27 additions & 0 deletions
(cluster-export)=
# Export

The "Export" section allows users to download specific tables/views. When
you first visit the Export tab, you can specify the name of a table/view,
the format (CSV, JSON, or Parquet), and whether you'd like your data to be
gzip-compressed (recommended for CSV and JSON files).

:::{important}
- The size limit for exports is 1 GiB.
- Exports are kept for 3 days, then automatically deleted.
:::

:::{note}
**Limitations with Parquet**:
Parquet is a highly compressed data format for very efficient storage of
tabular data. Please note that for OBJECT and ARRAY columns in CrateDB, the
exported data will be JSON-encoded when saving to Parquet (effectively
saving them as strings). This is due to the complexity of encoding structs
and lists in the Parquet format, where determining the exact schema might
not be possible. When re-importing such a Parquet file, make sure you
pre-create the table with the correct schema.
:::
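
For example, if a table with OBJECT or ARRAY columns was exported to Parquet,
a faithful re-import target should be pre-created with the original schema
rather than inferred from the JSON-encoded strings in the file (table and
column names below are illustrative):

```sql
-- Pre-create the re-import target with the original column types, so the
-- OBJECT and ARRAY data can be restored as structured values rather than
-- as the JSON strings stored in the Parquet file.
CREATE TABLE IF NOT EXISTS "doc"."sensor_data_reimport" (
    "sensor_id" TEXT,
    "payload" OBJECT (DYNAMIC),
    "readings" ARRAY(DOUBLE)
);
```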
