Commit 280254b

docs: use "Data Model" instead of "Data Schema" (#6472)
Co-Authored-By: Igor Lukanin <[email protected]>
1 parent 5a79461 commit 280254b


73 files changed: +537 -496 lines changed


docs/.prettierrc

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
   "tabWidth": 2,
   "useTabs": false,
   "semi": true,
-  "singleQuote": true,
+  "singleQuote": false,
   "arrowParens": "always",
   "trailingComma": "es5",
   "bracketSpacing": true,

docs/content/Auth/Security-Context.mdx

Lines changed: 5 additions & 5 deletions

@@ -11,7 +11,7 @@ context claims to evaluate access control rules. Inbound JWTs are decoded and
 verified using industry-standard [JSON Web Key Sets (JWKS)][link-auth0-jwks].
 
 For access control or authorization, Cube allows you to define granular access
-control rules for every cube in your data schema. Cube uses both the request and
+control rules for every cube in your data model. Cube uses both the request and
 security context claims in the JWT token to generate a SQL query, which includes
 row-level constraints from the access control rules.
 
@@ -132,11 +132,11 @@ LIMIT 10000
 In the example below `user_id`, `company_id`, `sub` and `iat` will be injected
 into the security context and will be accessible in both the [Security
 Context][ref-schema-sec-ctx] and [`COMPILE_CONTEXT`][ref-cubes-compile-ctx]
-global variable in the Cube Data Schema.
+global variable in the Cube data model.
 
 <InfoBox>
 
-`COMPILE_CONTEXT` is used by Cube at schema compilation time, which allows
+`COMPILE_CONTEXT` is used by Cube at data model compilation time, which allows
 changing the underlying dataset completely; the Security Context is only used at
 query execution time, which simply filters the dataset with a `WHERE` clause.
 
@@ -151,8 +151,8 @@ query execution time, which simply filters the dataset with a `WHERE` clause.
 }
 ```
 
-With the same JWT payload as before, we can modify schemas before they are
-compiled. The following schema will ensure users only see results for their
+With the same JWT payload as before, we can modify models before they are
+compiled. The following cube will ensure users only see results for their
 `company_id` in a multi-tenant deployment:
 
 ```javascript
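
The JavaScript snippet that opens the last hunk is truncated in this diff view. For context, a minimal sketch of the kind of cube the surrounding prose describes, assuming an `Orders` cube and a `company_id` claim (neither name is taken from the commit):

```javascript
// Sketch only: COMPILE_CONTEXT exposes security context claims at
// data model compilation time.
const {
  securityContext: { company_id },
} = COMPILE_CONTEXT;

cube(`Orders`, {
  // Each tenant compiles against its own database schema, so users
  // only ever see rows for their `company_id`.
  sql: `SELECT * FROM ${company_id}.orders`,

  measures: {
    count: {
      type: `count`,
    },
  },
});
```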

docs/content/Caching/Getting-Started-Pre-Aggregations.mdx

Lines changed: 14 additions & 16 deletions

@@ -40,7 +40,7 @@ layer][ref-caching-preaggs-cubestore].
 ## Pre-Aggregations without Time Dimension
 
 To illustrate pre-aggregations with an example, let's use a sample e-commerce
-database. We have a schema representing all our `Orders`:
+database. We have a data model representing all our `Orders`:
 
 ```javascript
 cube(`Orders`, {
@@ -106,9 +106,9 @@ cube(`Orders`, {
 
 ## Pre-Aggregations with Time Dimension
 
-Using the same schema as before, we are now finding that users frequently query
-for the number of orders completed per day, and that this query is performing
-poorly. This query might look something like:
+Using the same data model as before, we are now finding that users frequently
+query for the number of orders completed per day, and that this query is
+performing poorly. This query might look something like:
 
 ```json
 {
@@ -118,7 +118,7 @@ poorly. This query might look something like:
 ```
 
 In order to improve the performance of this query, we can add another
-pre-aggregation definition to the `Orders` schema:
+pre-aggregation definition to the `Orders` cube:
 
 ```javascript
 cube(`Orders`, {
@@ -245,7 +245,7 @@ fields and still get a correct result:
 | 2021-01-22 00:00:00.000000 | 13 | 150 |
 
 This means that `quantity` and `price` are both **additive measures**, and we
-can represent them in the `LineItems` schema as follows:
+can represent them in the `LineItems` cube as follows:
 
 ```javascript
 cube(`LineItems`, {
@@ -340,7 +340,7 @@ $$
 We can clearly see that `523` **does not** equal `762.204545454545455`, and we
 cannot treat the `profit_margin` column the same as we would any other additive
 measure. Armed with the above knowledge, we can add the `profit_margin` field to
-our schema **as a [dimension][ref-schema-dims]**:
+our cube **as a [dimension][ref-schema-dims]**:
 
 ```javascript
 cube(`LineItems`, {
@@ -437,17 +437,15 @@ To recap what we've learnt so far:
   `count`, `sum`, `min`, `max` or `countDistinctApprox`
 
 Cube looks for matching pre-aggregations in the order they are defined in a
-cube's schema file. Each defined pre-aggregation is then tested for a match
+cube's data model file. Each defined pre-aggregation is then tested for a match
 based on the criteria in the flowchart below:
 
-<div
-  style="text-align: center"
->
+<div style="text-align: center">
   <img
-    alt="Pre-Aggregation Selection Flowchart"
-    src="https://ucarecdn.com/f986b0cb-a9ea-47b7-a743-ca9a4644c246/"
-    style="border: none"
-    width="100%"
+    alt="Pre-Aggregation Selection Flowchart"
+    src="https://ucarecdn.com/f986b0cb-a9ea-47b7-a743-ca9a4644c246/"
+    style="border: none"
+    width="100%"
   />
 </div>
 
@@ -470,7 +468,7 @@ Some extra considerations for pre-aggregation selection:
   `['2020-01-01T00:00:00.000', '2020-01-01T23:59:59.999']`. Date ranges are
   inclusive, and the minimum granularity is `second`.
 
-- The order in which pre-aggregations are defined in schemas matter; the first
+- The order in which pre-aggregations are defined in models matter; the first
   matching pre-aggregation for a query is the one that is used. Both the
   measures and dimensions of any cubes specified in the query are checked to
   find a matching `rollup`.
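
The `Orders` cube bodies in the hunks above are truncated in this view. As a reference point for the prose about time dimensions, a minimal sketch of the kind of rollup it describes (names such as `ordersByCompletedAt` and `completed_at` are illustrative assumptions, not taken from the commit):

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  preAggregations: {
    // Pre-aggregates the order count by completion day so the
    // "orders completed per day" query can be served from the rollup.
    ordersByCompletedAt: {
      measures: [CUBE.count],
      timeDimension: CUBE.completedAt,
      granularity: `day`,
    },
  },

  measures: {
    count: {
      type: `count`,
    },
  },

  dimensions: {
    completedAt: {
      sql: `completed_at`,
      type: `time`,
    },
  },
});
```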

docs/content/Caching/Overview.mdx

Lines changed: 5 additions & 6 deletions

@@ -49,8 +49,8 @@ more about read-only support and pre-aggregation build strategies.
 
 </InfoBox>
 
-Pre-aggregations are defined in the data schema. You can learn more about
-defining pre-aggregations in [schema reference][ref-schema-ref-preaggs].
+Pre-aggregations are defined in the data model. You can learn more about
+defining pre-aggregations in [data modeling reference][ref-schema-ref-preaggs].
 
 ```javascript
 cube(`Orders`, {
@@ -142,10 +142,9 @@ The default values for `refreshKey` are
 - `every: '10 second'` for all other databases.
 
 You can use a custom SQL query to check if a refresh is required by changing
-the [`refreshKey`][ref-schema-ref-cube-refresh-key] property in a cube's Data
-Schema. Often, a `MAX(updated_at_timestamp)` for OLTP data is a viable option,
-or examining a metadata table for whatever system is managing the data to see
-when it last ran.
+the [`refreshKey`][ref-schema-ref-cube-refresh-key] property in a cube. Often, a
+`MAX(updated_at_timestamp)` for OLTP data is a viable option, or examining a
+metadata table for whatever system is managing the data to see when it last ran.
 
 ### <--{"id" : "In-memory Cache"}--> Disabling the cache
 
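
For reference, a minimal sketch of the custom `refreshKey` described above, assuming an `orders` table with an `updated_at_timestamp` column (the table and column names are assumptions, not taken from the commit):

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  // Cube re-runs this query on its refresh schedule; the cached
  // results are invalidated whenever the returned value changes.
  refreshKey: {
    sql: `SELECT MAX(updated_at_timestamp) FROM orders`,
  },
});
```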

docs/content/Caching/Using-Pre-Aggregations.mdx

Lines changed: 2 additions & 1 deletion

@@ -7,7 +7,8 @@ menuOrder: 3
 
 Pre-aggregations is a powerful way to speed up your Cube queries. There are many
 configuration options to consider. Please make sure to also check [the
-Pre-Aggregations reference in the data schema section][ref-schema-ref-preaggs].
+Pre-Aggregations reference in the data modeling
+section][ref-schema-ref-preaggs].
 
 ## Refresh Strategy
 
docs/content/Configuration/Advanced/Multitenancy.mdx

Lines changed: 10 additions & 10 deletions

@@ -6,7 +6,7 @@ subCategory: Advanced
 menuOrder: 3
 ---
 
-Cube supports multitenancy out of the box, both on database and data schema
+Cube supports multitenancy out of the box, both on database and data model
 levels. Multiple drivers are also supported, meaning that you can have one
 customer’s data in MongoDB and others in Postgres with one Cube instance.
 
@@ -34,7 +34,7 @@ combinations of these configuration options.
 
 ### <--{"id" : "Multitenancy"}--> Multitenancy vs Multiple Data Sources
 
-In cases where your Cube schema is spread across multiple different data
+In cases where your Cube data model is spread across multiple different data
 sources, consider using the [`dataSource` cube property][ref-cube-datasource]
 instead of multitenancy. Multitenancy is designed for cases where you need to
 serve different datasets for multiple users, or tenants which aren't related to
@@ -169,7 +169,7 @@ cube(`Products`, {
 ### <--{"id" : "Multitenancy"}--> Running in Production
 
 Each unique id generated by `contextToAppId` or `contextToOrchestratorId` will
-generate a dedicated set of resources, including schema compile cache, SQL
+generate a dedicated set of resources, including data model compile cache, SQL
 compile cache, query queues, in-memory result caching, etc. Depending on your
 data model complexity and usage patterns, those resources can have a pretty
 sizable memory footprint ranging from single-digit MBs on the lower end and
@@ -219,7 +219,7 @@ module.exports = {
 };
 ```
 
-## Multiple DB Instances with Same Schema
+## Multiple DB Instances with Same Data Model
 
 Let's consider an example where we store data for different users in different
 databases, but on the same Postgres host. The database name format is
@@ -249,12 +249,12 @@ select the database, based on the `appId` and `userId`:
 <WarningBox>
 
 The App ID (the result of [`contextToAppId`][ref-config-ctx-to-appid]) is used
-as a caching key for various in-memory structures like schema compilation
+as a caching key for various in-memory structures like data model compilation
 results, connection pool. The Orchestrator ID (the result of
 [`contextToOrchestratorId`][ref-config-ctx-to-orch-id]) is used as a caching key
 for database connections, execution queues and pre-aggregation table caches. Not
-declaring these properties will result in unexpected caching issues such as
-schema or data of one tenant being used for another.
+declaring these properties will result in unexpected caching issues such as the
+data model or data of one tenant being used for another.
 
 </WarningBox>
 
@@ -292,7 +292,7 @@ module.exports = {
 };
 ```
 
-## Multiple Schema and Drivers
+## Multiple Data Models and Drivers
 
 What if for application with ID 3, the data is stored not in Postgres, but in
 MongoDB?
@@ -301,9 +301,9 @@ We can instruct Cube to connect to MongoDB in that case, instead of Postgres. To
 do this, we'll use the [`driverFactory`][ref-config-driverfactory] option to
 dynamically set database type. We will also need to modify our
 [`securityContext`][ref-config-security-ctx] to determine which tenant is
-requesting data. Finally, we want to have separate data schemas for every
+requesting data. Finally, we want to have separate data models for every
 application. We can use the [`repositoryFactory`][ref-config-repofactory] option
-to dynamically set a repository with schema files depending on the `appId`:
+to dynamically set a repository with data model files depending on the `appId`:
 
 **cube.js:**
 
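
The cube.js snippet that follows in the source file is truncated in this view. A minimal sketch of a per-tenant repository under these options, assuming a `schema/<appId>` directory layout that is not taken from the commit:

```javascript
const { FileRepository } = require("@cubejs-backend/server-core");

module.exports = {
  // One isolated compile cache and model set per application.
  contextToAppId: ({ securityContext }) =>
    `CUBEJS_APP_${securityContext.appId}`,

  // Serve a separate directory of data model files per tenant,
  // e.g. schema/1, schema/2, schema/3.
  repositoryFactory: ({ securityContext }) =>
    new FileRepository(`schema/${securityContext.appId}`),
};
```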

docs/content/Configuration/Downstream/Superset.mdx

Lines changed: 2 additions & 2 deletions

@@ -69,7 +69,7 @@ a new database:
 Your cubes will be exposed as tables, where both your measures and dimensions
 are columns.
 
-Let's use the following Cube data schema:
+Let's use the following Cube data model:
 
 ```javascript
 cube(`Orders`, {
@@ -124,7 +124,7 @@ a time grain of `month`.
 
 The `COUNT(*)` aggregate function is being mapped to a measure of type
 [count](/schema/reference/types-and-formats#measures-types-count) in Cube's
-**Orders** schema file.
+**Orders** data model file.
 
 ## Additional Configuration
 
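
The `Orders` model itself is truncated in this view. An illustrative sketch (assumed, not from the commit) of the kind of declaration that Superset's `COUNT(*)` and monthly time grain would map onto:

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  measures: {
    // Superset's COUNT(*) aggregate maps onto this `count` measure.
    count: {
      type: `count`,
    },
  },

  dimensions: {
    // Queried from Superset with a time grain of `month`.
    createdAt: {
      sql: `created_at`,
      type: `time`,
    },
  },
});
```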

docs/content/Deployment/Cloud/Continuous-Deployment.mdx

Lines changed: 6 additions & 3 deletions

@@ -56,16 +56,19 @@ Cube Cloud will automatically deploy from the specified production branch
 
 <WarningBox>
 
-Enabling this option will cause the Schema page to display the last known state of a Git-based codebase (if available), instead of reflecting the latest modifications made.
-It is important to note that the logic will still be updated in both the API and the Playground.
+Enabling this option will cause the <Btn>Data Model</Btn> page to display the
+last known state of a Git-based codebase (if available), instead of reflecting
+the latest modifications made. It is important to note that the logic will still
+be updated in both the API and the Playground.
+
 </WarningBox>
 
 You can use the CLI to set up continuous deployment for a Git repository. You
 can also use the CLI to manually deploy changes without continuous deployment.
 
 ### <--{"id" : "Deploy with CLI"}--> Manual Deploys
 
-You can deploy your Cube project manually. This method uploads data schema and
+You can deploy your Cube project manually. This method uploads data models and
 configuration files directly from your local project directory.
 
 You can obtain Cube Cloud deploy token from your deployment **Settings** page.

docs/content/Deployment/Overview.mdx

Lines changed: 13 additions & 11 deletions

@@ -42,7 +42,7 @@ API instances.
 
 API instances and Refresh Workers can be configured via [environment
 variables][ref-config-env] or the [`cube.js` configuration file][ref-config-js].
-They also need access to the data schema files. Cube Store clusters can be
+They also need access to the data model files. Cube Store clusters can be
 configured via environment variables.
 
 You can find an example Docker Compose configuration for a Cube deployment in
@@ -57,21 +57,22 @@ requests between multiple API instances.
 
 The [Cube Docker image][dh-cubejs] is used for API Instance.
 
-API instance needs to be configured via environment variables, cube.js file and
-has access to the data schema files.
+API instances can be configured via environment variables or the `cube.js`
+configuration file, and **must** have access to the data model files (as
+specified by [`schemaPath`][ref-conf-ref-schemapath]).
 
 ## Refresh Worker
 
 A Refresh Worker updates pre-aggregations and invalidates the in-memory cache in
-the background. They also keep the refresh keys up-to-date for all defined
-schemas and pre-aggregations. Please note that the in-memory cache is just
-invalidated but not populated by Refresh Worker. In-memory cache is populated
-lazily during querying. On the other hand, pre-aggregations are eagerly
-populated and kept up-to-date by Refresh Worker.
+the background. They also keep the refresh keys up-to-date for all data models
+and pre-aggregations. Please note that the in-memory cache is just invalidated
+but not populated by Refresh Worker. In-memory cache is populated lazily during
+querying. On the other hand, pre-aggregations are eagerly populated and kept
+up-to-date by Refresh Worker.
 
-[Cube Docker image][dh-cubejs] can be used for creating Refresh Workers; to make
-the service act as a Refresh Worker, `CUBEJS_REFRESH_WORKER=true` should be set
-in the environment variables.
+The [Cube Docker image][dh-cubejs] can be used for creating Refresh Workers; to
+make the service act as a Refresh Worker, `CUBEJS_REFRESH_WORKER=true` should be
+set in the environment variables.
 
 ## Cube Store
 
@@ -275,6 +276,7 @@ guide][blog-migration-guide].
 [ref-deploy-docker]: /deployment/platforms/docker
 [ref-config-env]: /reference/environment-variables
 [ref-config-js]: /config
+[ref-conf-ref-schemapath]: /config#options-reference-schema-path
 [redis]: https://redis.io
 [ref-config-redis]: /reference/environment-variables#cubejs-redis-password
 [blog-details]: https://cube.dev/blog/how-you-win-by-using-cube-store-part-1

docs/content/Deployment/Production-Checklist.mdx

Lines changed: 26 additions & 18 deletions

@@ -97,37 +97,45 @@ deployment's health and be alerted to any issues.
 
 ## Appropriate cluster sizing
 
-There's no one-size-fits-all when it comes to sizing Cube cluster, and its resources.
-Resources required by Cube depend a lot on the amount of traffic Cube needs to serve and the amount of data it needs to process.
-The following sizing estimates are based on default settings and are very generic, which may not fit your Cube use case, so you should always tweak resources based on consumption patterns you see.
+There's no one-size-fits-all when it comes to sizing a Cube cluster and its
+resources. Resources required by Cube significantly depend on the amount of
+traffic Cube needs to serve and the amount of data it needs to process. The
+following sizing estimates are based on default settings and are very generic,
+which may not fit your Cube use case, so you should always tweak resources based
+on consumption patterns you see.
 
 ### <--{"id" : "Appropriate cluster sizing"}--> Memory and CPU
 
-Each Cube cluster should contain at least 2 Cube API instances.
-Every Cube API instance should have at least 3GB of RAM and 2 CPU cores allocated for it.
+Each Cube cluster should contain at least 2 Cube API instances. Every Cube API
+instance should have at least 3GB of RAM and 2 CPU cores allocated for it.
 
-Refresh workers tend to be much more CPU and memory intensive, so at least 6GB of RAM is recommended.
-Please note that to take advantage of all available RAM, the Node.js heap size should be adjusted accordingly
-by using the [`--max-old-space-size` option][node-heap-size]:
+Refresh workers tend to be much more CPU and memory intensive, so at least 6GB
+of RAM is recommended. Please note that to take advantage of all available RAM,
+the Node.js heap size should be adjusted accordingly by using the
+[`--max-old-space-size` option][node-heap-size]:
 
 ```sh
 NODE_OPTIONS="--max-old-space-size=6144"
 ```
 
-[node-heap-size]: https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes
+[node-heap-size]:
+  https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes
 
-The Cube Store router node should have at least 6GB of RAM and 4 CPU cores allocated for it.
-Every Cube Store worker node should have at least 8GB of RAM and 4 CPU cores allocated for it.
-The Cube Store cluster should have at least two worker nodes.
+The Cube Store router node should have at least 6GB of RAM and 4 CPU cores
+allocated for it. Every Cube Store worker node should have at least 8GB of RAM
+and 4 CPU cores allocated for it. The Cube Store cluster should have at least
+two worker nodes.
 
 ### <--{"id" : "Appropriate cluster sizing"}--> RPS and data volume
 
-Depending on schema size, every Core Cube API instance can serve 1 to 10 requests per second.
-Every Core Cube Store router node can serve 50-100 queries per second.
-As a rule of thumb, you should provision 1 Cube Store worker node per one Cube Store partition or 1M of rows scanned in a query.
-For example if your queries scan 16M of rows per query, you should have at least 16 Cube Store worker nodes provisioned.
-`EXPLAIN ANALYZE` can be used to see partitions involved in a Cube Store query.
-Cube Cloud ballpark performance numbers can differ as it has different Cube runtime.
+Depending on data model size, every Core Cube API instance can serve 1 to 10
+requests per second. Every Core Cube Store router node can serve 50-100 queries
+per second. As a rule of thumb, you should provision 1 Cube Store worker node
+per one Cube Store partition or 1M of rows scanned in a query. For example if
+your queries scan 16M of rows per query, you should have at least 16 Cube Store
+worker nodes provisioned. `EXPLAIN ANALYZE` can be used to see partitions
+involved in a Cube Store query. Cube Cloud ballpark performance numbers can
+differ as it has different Cube runtime.
 
 [blog-migrate-to-cube-cloud]:
   https://cube.dev/blog/migrating-from-self-hosted-to-cube-cloud/
