7 changes: 7 additions & 0 deletions DEPRECATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ features:
| Removed | [`initApp` hook](#initapp-hook) | v0.35.0 | v0.35.0 |
| Removed | [`/v1/run-scheduled-refresh` REST API endpoint](#v1run-scheduled-refresh-rest-api-endpoint) | v0.35.0 | v0.36.0 |
| Deprecated | [Node.js 18](#nodejs-18) | v0.36.0 | |
| Deprecated | [`CUBEJS_SCHEDULED_REFRESH_CONCURRENCY`](#cubejs_scheduled_refresh_concurrency) | v1.2.7 | |

### Node.js 8

Expand Down Expand Up @@ -391,3 +392,9 @@ API](https://cube.dev/docs/product/apis-integrations/orchestration-api) and

Node.js 18 reaches [End of Life on April 30, 2025][link-nodejs-eol]. This means
no more updates. Please upgrade to Node.js 20 or higher.

### `CUBEJS_SCHEDULED_REFRESH_CONCURRENCY`

**Deprecated in Release: v1.2.7**

This environment variable was renamed to [`CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID`](https://cube.dev/docs/reference/configuration/environment-variables#cubejs_scheduled_refresh_queries_per_app_id). Please use the new name.
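
For tooling of your own that reads this setting, a fallback like the one below may ease the migration. It is a minimal sketch, not Cube's implementation: `resolveQueriesPerAppId` is a hypothetical helper name, and only the two environment variable names come from this document.

```javascript
// Hypothetical helper (not part of Cube's API): prefer the new variable name
// and fall back to the deprecated one with a warning.
function resolveQueriesPerAppId(env = process.env) {
  const current = env.CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID;
  if (current !== undefined) {
    return current;
  }

  const deprecated = env.CUBEJS_SCHEDULED_REFRESH_CONCURRENCY;
  if (deprecated !== undefined) {
    console.warn(
      'CUBEJS_SCHEDULED_REFRESH_CONCURRENCY is deprecated; ' +
        'use CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID instead.'
    );
  }
  // Returns undefined when neither variable is set.
  return deprecated;
}
```

The helper returns the raw string value and leaves parsing to the caller.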
1 change: 1 addition & 0 deletions docs/pages/product/caching/_meta.js
Expand Up @@ -2,6 +2,7 @@ module.exports = {
"getting-started-pre-aggregations": "Getting started with pre-aggregations",
"using-pre-aggregations": "Using pre-aggregations",
"matching-pre-aggregations": "Matching pre-aggregations",
"refreshing-pre-aggregations": "Refreshing pre-aggregations",
"lambda-pre-aggregations": "Lambda pre-aggregations",
"running-in-production": "Running in production"
}
21 changes: 21 additions & 0 deletions docs/pages/product/caching/refreshing-pre-aggregations.mdx
@@ -0,0 +1,21 @@
# Refreshing pre-aggregations

_Pre-aggregation refresh_ is the process of building pre-aggregations and updating
them with new data. Pre-aggregation refresh is the responsibility of the _refresh
worker_.

## Configuration

You can use the following environment variables to configure the refresh worker
behavior:

- `CUBEJS_REFRESH_WORKER` (see also `CUBEJS_PRE_AGGREGATIONS_BUILDER`)
- `CUBEJS_PRE_AGGREGATIONS_SCHEMA`
- `CUBEJS_SCHEDULED_REFRESH_TIMEZONES`
- `CUBEJS_DB_QUERY_TIMEOUT`
- `CUBEJS_REFRESH_WORKER_CONCURRENCY` (see also `CUBEJS_CONCURRENCY`)
- `CUBEJS_SCHEDULED_REFRESH_QUERIES_PER_APP_ID`
- `CUBEJS_DROP_PRE_AGG_WITHOUT_TOUCH`


[ref-multitenancy]: /product/configuration/advanced/multitenancy
4 changes: 3 additions & 1 deletion docs/pages/product/configuration/_meta.js
@@ -1,5 +1,7 @@
module.exports = {
"data-sources": "Data sources",
"visualization-tools": "Visualization tools",
"advanced": "Advanced"
"multiple-data-sources": "Multiple data sources",
"concurrency": "Concurrency",
"multitenancy": "Multitenancy"
}
90 changes: 90 additions & 0 deletions docs/pages/product/configuration/concurrency.mdx
@@ -0,0 +1,90 @@
# Querying concurrency

All queries to [data APIs][ref-data-apis] are processed asynchronously via a _query
queue_. The queue helps balance the load and improve query performance.

## Query queue

The query queue deduplicates queries to API instances and insulates upstream
data sources from query spikes. It also executes queries to data sources
concurrently for increased performance.

By default, Cube uses a _single_ query queue for queries from all API instances and
the refresh worker to all configured data sources.

<ReferenceBox>

You can read more about the query queue in [this blog post](https://cube.dev/blog/how-you-win-by-using-cube-store-part-1#query-queue-in-cube).

</ReferenceBox>

### Multiple query queues

You can use the [`context_to_orchestrator_id`][ref-context-to-orchestrator-id]
configuration option to route queries to multiple queues based on the security
context.

<WarningBox>

If you're configuring multiple connections to data sources via the [`driver_factory`
configuration option][ref-driver-factory], you __must__ also configure
`context_to_orchestrator_id` to ensure that queries are routed to correct queues.

</WarningBox>
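
As an illustration, a `cube.js` configuration file that pairs the two options might look like the sketch below. It uses the JavaScript spellings of the options (`contextToOrchestratorId` and `driverFactory`). The `tenantId` field of the security context, the `CUBE_APP_` prefix, and the `tenant_` database naming are assumptions made for this example.

```javascript
// cube.js: a minimal sketch, assuming one database per tenant.
// `tenantId` is a hypothetical field of the security context.
const config = {
  // Route each tenant's queries to its own query queue.
  contextToOrchestratorId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.tenantId}`,

  // Connect each tenant to its own database (illustrative naming scheme).
  driverFactory: ({ securityContext }) => ({
    type: 'postgres',
    database: `tenant_${securityContext.tenantId}`,
  }),
};

module.exports = config;
```

Because both functions derive their result from the same security context field, queries for a given tenant always land in the queue that is backed by that tenant's connection.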

## Data sources

Cube supports various kinds of [data sources][ref-data-sources], ranging from cloud
data warehouses to embedded databases. Each data source scales differently,
so Cube provides sound defaults for each kind of data source out of the box.

### Data source concurrency

By default, Cube uses the following concurrency settings for data sources:

| Data source | Default concurrency |
| --- | --- |
| [Amazon Athena][ref-athena] | 10 |
| [Amazon Redshift][ref-redshift] | 5 |
| [Apache Pinot][ref-pinot] | 10 |
| [ClickHouse][ref-clickhouse] | 10 |
| [Databricks][ref-databricks] | 10 |
| [Firebolt][ref-firebolt] | 10 |
| [Google BigQuery][ref-bigquery] | 10 |
| [Snowflake][ref-snowflake] | 8 |
| All other data sources | 5 or [less, if specified in the driver][link-github-data-source-concurrency] |

You can use the `CUBEJS_CONCURRENCY` environment variable to adjust the maximum
number of concurrent queries to a data source. It's recommended to use the default
configuration unless you're sure that your data source can handle more concurrent
queries.
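
The way the override interacts with the defaults can be pictured with the sketch below. It is not Cube's code: `effectiveConcurrency` is a hypothetical function, the keys of the lookup table are illustrative labels rather than the identifiers Cube uses for its drivers, and the fallback of 5 ignores drivers that specify a lower value.

```javascript
// Defaults copied from the table above. The keys are illustrative labels.
const DEFAULT_CONCURRENCY = new Map([
  ['athena', 10],
  ['redshift', 5],
  ['pinot', 10],
  ['clickhouse', 10],
  ['databricks', 10],
  ['firebolt', 10],
  ['bigquery', 10],
  ['snowflake', 8],
]);

// "All other data sources": 5, unless the driver specifies less.
const FALLBACK_CONCURRENCY = 5;

// Hypothetical helper: a valid CUBEJS_CONCURRENCY value wins over the default.
function effectiveConcurrency(dataSource, env = process.env) {
  const override = Number.parseInt(env.CUBEJS_CONCURRENCY ?? '', 10);
  if (Number.isInteger(override) && override > 0) {
    return override;
  }
  return DEFAULT_CONCURRENCY.get(dataSource) ?? FALLBACK_CONCURRENCY;
}
```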

### Connection pooling

For data sources that support connection pooling, the maximum number of concurrent
connections to the database can also be set by using the `CUBEJS_DB_MAX_POOL`
environment variable. If changing this from the default, you must ensure that the
new value is greater than the number of concurrent connections used by Cube's query
queues and the refresh worker.
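
A startup check along these lines can catch a pool that is too small. This is a sketch under a simplifying assumption: it treats `CUBEJS_CONCURRENCY` as the number of concurrent connections Cube needs, which may undercount in deployments with several queues. `poolSizeIsSufficient` is a hypothetical helper, not part of Cube.

```javascript
// Hypothetical startup check (not part of Cube's API).
// Assumption: CUBEJS_CONCURRENCY approximates the number of concurrent
// connections used by the query queues and the refresh worker.
function poolSizeIsSufficient(env = process.env) {
  const concurrency = Number.parseInt(env.CUBEJS_CONCURRENCY ?? '', 10);
  const maxPool = Number.parseInt(env.CUBEJS_DB_MAX_POOL ?? '', 10);

  // If either variable is unset or invalid, the driver defaults apply,
  // so there is nothing to compare.
  if (Number.isNaN(concurrency) || Number.isNaN(maxPool)) {
    return true;
  }
  // The pool must be strictly larger than the concurrency.
  return maxPool > concurrency;
}
```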

## Refresh worker

By default, the refresh worker uses the same concurrency settings as API instances.
However, you can override this behavior in the refresh worker
[configuration][ref-preagg-refresh].


[ref-data-apis]: /product/apis-integrations
[ref-data-sources]: /product/configuration/data-sources
[ref-context-to-orchestrator-id]: /reference/configuration/config#context_to_orchestrator_id
[ref-driver-factory]: /reference/configuration/config#driver_factory
[ref-preagg-refresh]: /product/caching/refreshing-pre-aggregations#configuration
[ref-athena]: /product/configuration/data-sources/aws-athena
[ref-clickhouse]: /product/configuration/data-sources/clickhouse
[ref-databricks]: /product/configuration/data-sources/databricks-jdbc
[ref-firebolt]: /product/configuration/data-sources/firebolt
[ref-pinot]: /product/configuration/data-sources/pinot
[ref-redshift]: /product/configuration/data-sources/aws-redshift
[ref-snowflake]: /product/configuration/data-sources/snowflake
[ref-bigquery]: /product/configuration/data-sources/google-bigquery
[link-github-data-source-concurrency]: https://github.com/search?q=repo%3Acube-js%2Fcube+getDefaultConcurrency+path%3Apackages%2Fcubejs-&type=code
28 changes: 5 additions & 23 deletions docs/pages/product/configuration/data-sources.mdx
Expand Up @@ -8,8 +8,9 @@ redirect_from:

Choose a data source to get started with below.

Note that Cube also supports connecting to [multiple data
sources][ref-config-multi-data-src] out of the box.
You can also connect [multiple data sources][ref-config-multi-data-src] at the same
time and adjust the [concurrency settings][ref-data-source-concurrency] for data
sources.

## Data warehouses

Expand Down Expand Up @@ -251,28 +252,9 @@ users on the [Enterprise Premier](https://cube.dev/pricing) product tier.

</InfoBox>

## Concurrency and pooling

<InfoBox>

All Cube database drivers come with presets for concurrency and pooling that
work out-of-the-box. The following information is included as a reference.

</InfoBox>

For increased performance, Cube uses multiple concurrent connections to
configured data sources. The `CUBEJS_CONCURRENCY` environment variable controls
concurrency settings for query queues and the refresh scheduler as well as the
maximum concurrent connections.

For databases that support connection pooling,
the maximum number of concurrent connections to the database can also be set by
using the `CUBEJS_DB_MAX_POOL` environment variable; if changing this from the
default, you must ensure that the new value is greater than the number of
concurrent connections used by Cube's query queues and refresh scheduler.


[ref-config-multi-data-src]: /product/configuration/advanced/multiple-data-sources
[ref-driver-factory]: /reference/configuration/config#driver_factory
[ref-duckdb]: /product/configuration/data-sources/duckdb
[link-github-packages]: https://github.com/cube-js/cube/tree/master/packages
[link-github-packages]: https://github.com/cube-js/cube/tree/master/packages
[ref-data-source-concurrency]: /product/configuration/concurrency#data-sources
15 changes: 6 additions & 9 deletions docs/pages/product/configuration/data-sources/aws-athena.mdx
@@ -1,9 +1,4 @@
---
redirect_from:
- /config/databases/aws-athena
---

# AWS Athena
# Amazon Athena

## Prerequisites

Expand Down Expand Up @@ -39,7 +34,7 @@ Configuration</Btn> in your deployment.

</InfoBox>

In Cube Cloud, select **AWS Athena** when creating a new deployment and fill in
In Cube Cloud, select **AWS Athena** when creating a new deployment and fill in
the required fields:

<Screenshot
Expand All @@ -65,7 +60,9 @@ if [dedicated infrastructure][ref-dedicated-infra] is used. Check out the
| `CUBEJS_AWS_ATHENA_WORKGROUP` | The name of the workgroup in which the query is being started | [A valid Athena Workgroup][aws-athena-workgroup] | ❌ |
| `CUBEJS_AWS_ATHENA_CATALOG` | The name of the catalog to use by default | [A valid Athena Catalog name][awsdatacatalog] | ❌ |
| `CUBEJS_DB_SCHEMA` | The name of the schema to use as `information_schema` filter. Reduces count of tables loaded during schema generation. | A valid schema name | ❌ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `5` | A valid number | ❌ |
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number | ❌ |

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## Pre-Aggregation Feature Support

Expand Down Expand Up @@ -151,4 +148,4 @@ connections are made over HTTPS.
[ref-caching-using-preaggs-build-strats]:
/product/caching/using-pre-aggregations#pre-aggregation-build-strategies
[ref-schema-ref-types-formats-countdistinctapprox]: /reference/data-model/types-and-formats#count_distinct_approx
[self-preaggs-batching]: #batching
[self-preaggs-batching]: #batching
Original file line number Diff line number Diff line change
Expand Up @@ -70,9 +70,11 @@ if [dedicated infrastructure][ref-dedicated-infra] is used. Check out the
| `CUBEJS_DB_USER` | The username used to connect to the database | A valid database username ||
| `CUBEJS_DB_PASS` | The password used to connect to the database | A valid database password ||
| `CUBEJS_DB_SSL` | If `true`, enables SSL encryption for database connections from Cube | `true`, `false` ||
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `4` | A valid number ||
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `16` | A valid number ||
| `CUBEJS_DB_EXPORT_BUCKET_REDSHIFT_ARN` | | ||
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number ||

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## Pre-Aggregation Feature Support

4 changes: 3 additions & 1 deletion docs/pages/product/configuration/data-sources/clickhouse.mdx
Expand Up @@ -39,8 +39,10 @@ CUBEJS_DB_PASS=**********
| `CUBEJS_DB_USER` | The username used to connect to the database | A valid database username | ✅ |
| `CUBEJS_DB_PASS` | The password used to connect to the database | A valid database password | ✅ |
| `CUBEJS_DB_CLICKHOUSE_READONLY` | Whether the ClickHouse user has read-only access or not | `true`, `false` | ❌ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `5` | A valid number | ❌ |
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `20` | A valid number | ❌ |
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number | ❌ |

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## Pre-Aggregation Feature Support

Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,10 @@ docker run -it -p 4000:4000 --env-file=.env cube-jdk
| `CUBEJS_DB_DATABRICKS_TOKEN` | The [personal access token][databricks-docs-pat] used to authenticate the Databricks connection | A valid token | ✅ |
| `CUBEJS_DB_DATABRICKS_CATALOG` | The name of the [Databricks catalog][databricks-catalog] to connect to | A valid catalog name | ❌ |
| `CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR` | The path for the [Databricks DBFS mount][databricks-docs-dbfs] (Not needed if using Unity Catalog connection) | A valid mount path | ❌ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number | ❌ |
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `8` | A valid number | ❌ |
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number | ❌ |

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## Pre-Aggregation Feature Support

4 changes: 3 additions & 1 deletion docs/pages/product/configuration/data-sources/druid.mdx
Expand Up @@ -32,8 +32,10 @@ CUBEJS_DB_PASS=**********
| `CUBEJS_DB_URL` | The URL for a database | A valid database URL for Druid | ✅ |
| `CUBEJS_DB_USER` | The username used to connect to the database | A valid database username | ✅ |
| `CUBEJS_DB_PASS` | The password used to connect to the database | A valid database password | ✅ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number | ❌ |
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `8` | A valid number | ❌ |
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number | ❌ |

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## SSL

4 changes: 3 additions & 1 deletion docs/pages/product/configuration/data-sources/duckdb.mdx
Expand Up @@ -69,11 +69,13 @@ deployment][ref-demo-deployment] in Cube Cloud.
| `CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY` | The Secret Access Key to use for database connections | A valid Secret Access Key | ❌ | ✅ |
| `CUBEJS_DB_DUCKDB_S3_ENDPOINT` | The S3 endpoint | A valid [S3 endpoint][duckdb-docs-s3-import] | ❌ | ✅ |
| `CUBEJS_DB_DUCKDB_S3_REGION` | The [region of the bucket][duckdb-docs-s3-import] | A valid AWS region | ❌ | ✅ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number | ❌ | ✅ |
| `CUBEJS_DB_DUCKDB_S3_USE_SSL` | Use SSL for connection | A boolean | ❌ | ❌ |
| `CUBEJS_DB_DUCKDB_S3_URL_STYLE` | To choose the S3 URL style(vhost or path) | 'vhost' or 'path' | ❌ | ❌ |
| `CUBEJS_DB_DUCKDB_S3_SESSION_TOKEN` | The token for the S3 session | A valid Session Token | ❌ | ✅ |
| `CUBEJS_DB_DUCKDB_EXTENSIONS` | A comma-separated list of DuckDB extensions to install and load | A comma-separated list of DuckDB extensions | ❌ | ✅ |
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number | ❌ | ✅ |

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## Pre-Aggregation Feature Support

Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,10 @@ CUBEJS_DB_ELASTIC_APIKEY_KEY=ui2lp2axTNmsyakw9tvNnw
| `CUBEJS_DB_ELASTIC_OPENDISTRO` | If `true`, then use the Open Distro for Elasticsearch | `true`, `false` | ❌ |
| `CUBEJS_DB_ELASTIC_APIKEY_ID` | [ID of the API key from elastic.co][elastic-docs-api-keys] | A valid Elastic.co API key ID | ❌ |
| `CUBEJS_DB_ELASTIC_APIKEY_KEY` | [Value of the API key from elastic.co][elastic-docs-api-keys] | A valid Elastic.co API key value | ❌ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number | ❌ |
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `8` | A valid number | ❌ |
| `CUBEJS_CONCURRENCY` | The number of [concurrent queries][ref-data-source-concurrency] to the data source | A valid number | ❌ |

[ref-data-source-concurrency]: /product/configuration/concurrency#data-source-concurrency

## SSL
