Commit bea9a85

docs: New concurrency pages and env vars (#9323)
1 parent 0ad6252 commit bea9a85

35 files changed

+250
-72
lines changed

DEPRECATION.md

Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -64,6 +64,7 @@ features:
6464
| Removed | [`initApp` hook](#initapp-hook) | v0.35.0 | v0.35.0 |
6565
| Removed | [`/v1/run-scheduled-refresh` REST API endpoint](#v1run-scheduled-refresh-rest-api-endpoint) | v0.35.0 | v0.36.0 |
6666
| Deprecated | [Node.js 18](#nodejs-18) | v0.36.0 | |
67+
| Deprecated | [`CUBEJS_SCHEDULED_REFRESH_CONCURRENCY`](#cubejs_scheduled_refresh_concurrency) | v1.2.7 | |
6768

6869
### Node.js 8
6970

@@ -391,3 +392,9 @@ API](https://cube.dev/docs/product/apis-integrations/orchestration-api) and
391392
392393
Node.js 18 reaches [End of Life on April 30, 2025][link-nodejs-eol]. This means
393394
no more updates. Please upgrade to Node.js 20 or higher.
395+
396+
### `CUBEJS_SCHEDULED_REFRESH_CONCURRENCY`
397+
398+
**Deprecated in Release: v1.2.7**
399+
400+
This environment variable was renamed to [`CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID`](https://cube.dev/docs/reference/configuration/environment-variables#cubejs_scheduled_refresh_queries_per_app_id). Please use the new name.
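
During migration, a deployment script might want to detect which of the two names is still in use. The following is a minimal sketch, assuming a hypothetical helper `resolveQueriesPerAppId` that is not part of Cube; it prefers the new name and flags the deprecated one. How Cube itself resolves the two variables when both are set is not covered here.

```javascript
// Hypothetical migration helper for deployment scripts; not part of Cube.
const NEW_NAME = "CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID";
const OLD_NAME = "CUBEJS_SCHEDULED_REFRESH_CONCURRENCY";

// Returns which variable supplies the value, preferring the new name.
// Returns null when neither variable is set.
function resolveQueriesPerAppId(env = process.env) {
  if (env[NEW_NAME] !== undefined) {
    return { name: NEW_NAME, value: env[NEW_NAME], deprecated: false };
  }
  if (env[OLD_NAME] !== undefined) {
    return { name: OLD_NAME, value: env[OLD_NAME], deprecated: true };
  }
  return null;
}
```

A script could call `resolveQueriesPerAppId()` at startup and print a warning whenever the result has `deprecated: true`.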

docs/pages/product/caching/_meta.js

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@ module.exports = {
22
"getting-started-pre-aggregations": "Getting started with pre-aggregations",
33
"using-pre-aggregations": "Using pre-aggregations",
44
"matching-pre-aggregations": "Matching pre-aggregations",
5+
"refreshing-pre-aggregations": "Refreshing pre-aggregations",
56
"lambda-pre-aggregations": "Lambda pre-aggregations",
67
"running-in-production": "Running in production"
78
}
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
# Refreshing pre-aggregations
2+
3+
_Pre-aggregation refresh_ is the process of building pre-aggregations and updating
4+
them with new data. Pre-aggregation refresh is the responsibility of the _refresh
5+
worker_.
6+
7+
## Configuration
8+
9+
You can use the following environment variables to configure the refresh worker
10+
behavior:
11+
12+
- `CUBEJS_REFRESH_WORKER` (see also `CUBEJS_PRE_AGGREGATIONS_BUILDER`)
13+
- `CUBEJS_PRE_AGGREGATIONS_SCHEMA`
14+
- `CUBEJS_SCHEDULED_REFRESH_TIMEZONES`
15+
- `CUBEJS_DB_QUERY_TIMEOUT`
16+
- `CUBEJS_REFRESH_WORKER_CONCURRENCY` (see also `CUBEJS_CONCURRENCY`)
17+
- `CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID`
18+
- `CUBEJS_DROP_PRE_AGG_WITHOUT_TOUCH`
19+
20+
21+
[ref-multitenancy]: /product/configuration/advanced/multitenancy
Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
module.exports = {
22
"data-sources": "Data sources",
33
"visualization-tools": "Visualization tools",
4-
"advanced": "Advanced"
4+
"multiple-data-sources": "Multiple data sources",
5+
"concurrency": "Concurrency",
6+
"multitenancy": "Multitenancy"
57
}
Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
# Querying concurrency
2+
3+
All queries to [data APIs][ref-data-apis] are processed asynchronously via a _query
4+
queue_. The queue helps optimize the load and improve query performance.
5+
6+
## Query queue
7+
8+
The query queue deduplicates queries to API instances and insulates upstream
9+
data sources from query spikes. It also lets Cube execute queries to data sources
10+
concurrently for increased performance.
11+
12+
By default, Cube uses a _single_ query queue for queries from all API instances and
13+
the refresh worker to all configured data sources.
14+
15+
<ReferenceBox>
16+
17+
You can read more about the query queue in [this blog post](https://cube.dev/blog/how-you-win-by-using-cube-store-part-1#query-queue-in-cube).
18+
19+
</ReferenceBox>
20+
21+
### Multiple query queues
22+
23+
You can use the [`context_to_orchestrator_id`][ref-context-to-orchestrator-id]
24+
configuration option to route queries to multiple queues based on the security
25+
context.
26+
27+
<WarningBox>
28+
29+
If you're configuring multiple connections to data sources via the [`driver_factory`
30+
configuration option][ref-driver-factory], you __must__ also configure
31+
`context_to_orchestrator_id` to ensure that queries are routed to the correct queues.
32+
33+
</WarningBox>
34+
35+
## Data sources
36+
37+
Cube supports various kinds of [data sources][ref-data-sources], ranging from cloud
38+
data warehouses to embedded databases. Each data source scales differently,
39+
so Cube provides sound defaults for each kind of data source out of the box.
40+
41+
### Data source concurrency
42+
43+
By default, Cube uses the following concurrency settings for data sources:
44+
45+
| Data source | Default concurrency |
46+
| --- | --- |
47+
| [Amazon Athena][ref-athena] | 10 |
48+
| [Amazon Redshift][ref-redshift] | 5 |
49+
| [Apache Pinot][ref-pinot] | 10 |
50+
| [ClickHouse][ref-clickhouse] | 10 |
51+
| [Databricks][ref-databricks] | 10 |
52+
| [Firebolt][ref-firebolt] | 10 |
53+
| [Google BigQuery][ref-bigquery] | 10 |
54+
| [Snowflake][ref-snowflake] | 8 |
55+
| All other data sources | 5 or [less, if specified in the driver][link-github-data-source-concurrency] |
56+
57+
You can use the `CUBEJS_CONCURRENCY` environment variable to adjust the maximum
58+
number of concurrent queries to a data source. It's recommended to use the default
59+
configuration unless you're sure that your data source can handle more concurrent
60+
queries.
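
The relationship between the defaults and the override can be sketched as follows. This is an illustration only, not Cube's implementation: `effectiveConcurrency` and the short data source keys are hypothetical, and the numbers are copied from the table above.

```javascript
// Defaults copied from the table above; the keys are hypothetical short names.
const DEFAULT_CONCURRENCY = new Map([
  ["athena", 10],
  ["redshift", 5],
  ["pinot", 10],
  ["clickhouse", 10],
  ["databricks", 10],
  ["firebolt", 10],
  ["bigquery", 10],
  ["snowflake", 8],
]);

// "All other data sources": 5, or less if the driver specifies it.
const FALLBACK_CONCURRENCY = 5;

// Returns the CUBEJS_CONCURRENCY override if set, else the table default.
function effectiveConcurrency(dataSource, env = process.env) {
  const raw = env.CUBEJS_CONCURRENCY;
  if (raw !== undefined && raw !== "") {
    const parsed = Number.parseInt(raw, 10);
    if (!Number.isInteger(parsed) || parsed < 1) {
      throw new Error(`CUBEJS_CONCURRENCY must be a positive integer, got "${raw}"`);
    }
    return parsed;
  }
  return DEFAULT_CONCURRENCY.get(dataSource) ?? FALLBACK_CONCURRENCY;
}
```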
61+
62+
### Connection pooling
63+
64+
For data sources that support connection pooling, the maximum number of concurrent
65+
connections to the database can also be set by using the `CUBEJS_DB_MAX_POOL`
66+
environment variable. If changing this from the default, you must ensure that the
67+
new value is greater than the number of concurrent connections used by Cube's query
68+
queues and the refresh worker.
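
The constraint above can be checked before deploying. The sketch below assumes that the connections in use are simply the sum of the query queue concurrency and the refresh worker concurrency; that is a simplifying assumption, and `isPoolLargeEnough` is a hypothetical helper rather than a Cube API.

```javascript
// Hypothetical pre-deployment check; not part of Cube.
// Assumption: required connections = queue concurrency + refresh worker concurrency.
function isPoolLargeEnough({ maxPool, queueConcurrency, refreshWorkerConcurrency = 0 }) {
  const required = queueConcurrency + refreshWorkerConcurrency;
  // The pool must be strictly greater than the connections in use.
  return maxPool > required;
}
```

For example, `isPoolLargeEnough({ maxPool: 16, queueConcurrency: 5, refreshWorkerConcurrency: 5 })` passes, while a `maxPool` of `10` with the same concurrency values does not.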
69+
70+
## Refresh worker
71+
72+
By default, the refresh worker uses the same concurrency settings as API instances.
73+
However, you can override this behavior in the refresh worker
74+
[configuration][ref-preagg-refresh].
75+
76+
77+
[ref-data-apis]: /product/apis-integrations
78+
[ref-data-sources]: /product/configuration/data-sources
79+
[ref-context-to-orchestrator-id]: /reference/configuration/config#context_to_orchestrator_id
80+
[ref-driver-factory]: /reference/configuration/config#driver_factory
81+
[ref-preagg-refresh]: /product/caching/refreshing-pre-aggregations#configuration
82+
[ref-athena]: /product/configuration/data-sources/aws-athena
83+
[ref-clickhouse]: /product/configuration/data-sources/clickhouse
84+
[ref-databricks]: /product/configuration/data-sources/databricks-jdbc
85+
[ref-firebolt]: /product/configuration/data-sources/firebolt
86+
[ref-pinot]: /product/configuration/data-sources/pinot
87+
[ref-redshift]: /product/configuration/data-sources/aws-redshift
88+
[ref-snowflake]: /product/configuration/data-sources/snowflake
89+
[ref-bigquery]: /product/configuration/data-sources/google-bigquery
90+
[link-github-data-source-concurrency]: https://github.com/search?q=repo%3Acube-js%2Fcube+getDefaultConcurrency+path%3Apackages%2Fcubejs-&type=code

docs/pages/product/configuration/data-sources.mdx

Lines changed: 5 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -8,8 +8,9 @@ redirect_from:
88

99
Choose a data source to get started with below.
1010

11-
Note that Cube also supports connecting to [multiple data
12-
sources][ref-config-multi-data-src] out of the box.
11+
You can also connect [multiple data sources][ref-config-multi-data-src] at the same
12+
time and adjust the [concurrency settings][ref-data-source-concurrency] for data
13+
sources.
1314

1415
## Data warehouses
1516

@@ -251,28 +252,9 @@ users on the [Enterprise Premier](https://cube.dev/pricing) product tier.
251252

252253
</InfoBox>
253254

254-
## Concurrency and pooling
255-
256-
<InfoBox>
257-
258-
All Cube database drivers come with presets for concurrency and pooling that
259-
work out-of-the-box. The following information is included as a reference.
260-
261-
</InfoBox>
262-
263-
For increased performance, Cube uses multiple concurrent connections to
264-
configured data sources. The `CUBEJS_CONCURRENCY` environment variable controls
265-
concurrency settings for query queues and the refresh scheduler as well as the
266-
maximum concurrent connections.
267-
268-
For databases that support connection pooling,
269-
the maximum number of concurrent connections to the database can also be set by
270-
using the `CUBEJS_DB_MAX_POOL` environment variable; if changing this from the
271-
default, you must ensure that the new value is greater than the number of
272-
concurrent connections used by Cube's query queues and refresh scheduler.
273-
274255

275256
[ref-config-multi-data-src]: /product/configuration/advanced/multiple-data-sources
276257
[ref-driver-factory]: /reference/configuration/config#driver_factory
277258
[ref-duckdb]: /product/configuration/data-sources/duckdb
278-
[link-github-packages]: https://github.com/cube-js/cube/tree/master/packages
259+
[link-github-packages]: https://github.com/cube-js/cube/tree/master/packages
260+
[ref-data-source-concurrency]: /product/configuration/concurrency#data-sources

docs/pages/product/configuration/data-sources/aws-athena.mdx

Lines changed: 6 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,4 @@
1-
---
2-
redirect_from:
3-
- /config/databases/aws-athena
4-
---
5-
6-
# AWS Athena
1+
# Amazon Athena
72

83
## Prerequisites
94

@@ -39,7 +34,7 @@ Configuration</Btn> in your deployment.
3934

4035
</InfoBox>
4136

42-
In Cube Cloud, select **AWS Athena** when creating a new deployment and fill in
37+
In Cube Cloud, select **AWS Athena** when creating a new deployment and fill in
4338
the required fields:
4439

4540
<Screenshot
@@ -65,7 +60,9 @@ if [dedicated infrastructure][ref-dedicated-infra] is used. Check out the
6560
| `CUBEJS_AWS_ATHENA_WORKGROUP` | The name of the workgroup in which the query is being started | [A valid Athena Workgroup][aws-athena-workgroup] ||
6661
| `CUBEJS_AWS_ATHENA_CATALOG` | The name of the catalog to use by default | [A valid Athena Catalog name][awsdatacatalog] ||
6762
| `CUBEJS_DB_SCHEMA` | The name of the schema to use as `information_schema` filter. Reduces count of tables loaded during schema generation. | A valid schema name ||
68-
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `5` | A valid number ||
63+
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number ||
64+
65+
[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency
6966

7067
## Pre-Aggregation Feature Support
7168

@@ -151,4 +148,4 @@ connections are made over HTTPS.
151148
[ref-caching-using-preaggs-build-strats]:
152149
/product/caching/using-pre-aggregations#pre-aggregation-build-strategies
153150
[ref-schema-ref-types-formats-countdistinctapprox]: /reference/data-model/types-and-formats#count_distinct_approx
154-
[self-preaggs-batching]: #batching
151+
[self-preaggs-batching]: #batching

docs/pages/product/configuration/data-sources/aws-redshift.mdx

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,9 +70,11 @@ if [dedicated infrastructure][ref-dedicated-infra] is used. Check out the
7070
| `CUBEJS_DB_USER` | The username used to connect to the database | A valid database username ||
7171
| `CUBEJS_DB_PASS` | The password used to connect to the database | A valid database password ||
7272
| `CUBEJS_DB_SSL` | If `true`, enables SSL encryption for database connections from Cube | `true`, `false` ||
73-
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `4` | A valid number ||
7473
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `16` | A valid number ||
7574
| `CUBEJS_DB_EXPORT_BUCKET_REDSHIFT_ARN` | | ||
75+
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number ||
76+
77+
[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency
7678

7779
## Pre-Aggregation Feature Support
7880

docs/pages/product/configuration/data-sources/clickhouse.mdx

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,10 @@ CUBEJS_DB_PASS=**********
3939
| `CUBEJS_DB_USER` | The username used to connect to the database | A valid database username ||
4040
| `CUBEJS_DB_PASS` | The password used to connect to the database | A valid database password ||
4141
| `CUBEJS_DB_CLICKHOUSE_READONLY` | Whether the ClickHouse user has read-only access or not | `true`, `false` ||
42-
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `5` | A valid number ||
4342
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `20` | A valid number ||
43+
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number ||
44+
45+
[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency
4446

4547
## Pre-Aggregation Feature Support
4648

docs/pages/product/configuration/data-sources/databricks-jdbc.mdx

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,8 +60,10 @@ docker run -it -p 4000:4000 --env-file=.env cube-jdk
6060
| `CUBEJS_DB_DATABRICKS_TOKEN` | The [personal access token][databricks-docs-pat] used to authenticate the Databricks connection | A valid token ||
6161
| `CUBEJS_DB_DATABRICKS_CATALOG` | The name of the [Databricks catalog][databricks-catalog] to connect to | A valid catalog name ||
6262
| `CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR` | The path for the [Databricks DBFS mount][databricks-docs-dbfs] (Not needed if using Unity Catalog connection) | A valid mount path ||
63-
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number ||
6463
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `8` | A valid number ||
64+
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number ||
65+
66+
[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency
6567

6668
## Pre-Aggregation Feature Support
6769
