docs/scalardb-analytics/run-analytical-queries.mdx

This section describes the prerequisites, setting up ScalarDB Analytics in the Spark configuration, and how to run analytical queries.

### Prerequisites

- **ScalarDB Analytics server:** A running instance that manages catalog information and connects to your data sources. The server must be set up with at least one data source registered. For registering data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).
- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).

:::note

Apache Spark is built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. Make sure you know which version you are using so that you can select the correct version of ScalarDB Analytics later. For more details, see [Version compatibility](#version-compatibility).

:::

### Set up ScalarDB Analytics in the Spark configuration

ScalarDB Analytics requires specific Spark configurations to integrate with the ScalarDB Analytics server.

#### Required Spark configurations

When configuring Spark, you must specify a catalog name that matches a catalog created on the ScalarDB Analytics server.

#### Example configuration

The following is a complete example configuration:
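
The configuration block itself did not survive in this capture. The following is an illustrative sketch of a `spark-defaults.conf`, based on the property names and values that appear elsewhere in this document (`spark.sql.catalog.<catalog_name>` and `spark.sql.catalog.<catalog_name>.server.host`); the package coordinates, the catalog implementation class, and the port property are assumptions for illustration only:

```conf
# Add the ScalarDB Analytics package for your Spark/Scala version.
# NOTE: the artifact coordinates below are an assumption for illustration.
spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>

# Register the catalog. The catalog name ("myanalytics") must match a catalog
# created on the ScalarDB Analytics server. The implementation class name here
# is an assumption for illustration.
spark.sql.catalog.myanalytics com.scalar.db.analytics.spark.ScalarDBAnalyticsCatalog

# Connection settings for the ScalarDB Analytics server. The host property name
# appears in this document; the port property name and value are assumptions.
spark.sql.catalog.myanalytics.server.host analytics-server.example.com
spark.sql.catalog.myanalytics.server.catalog.port 11051
```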

Replace the content in the angle brackets as follows:

- `<SPARK_VERSION>`: Your Spark version (for example, `3.5` or `3.4`)
- `<SCALA_VERSION>`: Your Scala version (for example, `2.13` or `2.12`)
- `<SCALARDB_ANALYTICS_VERSION>`: The ScalarDB Analytics version (for example, `3.16.0`)

In this example:

- The catalog name `myanalytics` must match a catalog that exists on your ScalarDB Analytics server.
- The ScalarDB Analytics server is running at `analytics-server.example.com`.
- Tables will be accessed by using the format `myanalytics.<data_source>.<namespace>.<table>`.

:::important

The catalog name in your Spark configuration must match the name of a catalog created on the ScalarDB Analytics server by using the CLI. For example, if you created a catalog named `production` on the server, you must use `production` as the catalog name in your Spark configuration properties (for example, `spark.sql.catalog.production` and `spark.sql.catalog.production.server.host`).

:::

:::note

Data source configurations are managed by the ScalarDB Analytics server. For information on configuring data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).

:::

### Build configuration for Spark applications

When developing Spark applications that use ScalarDB Analytics, you can add the dependency to your build configuration. For example, with Gradle:
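
The Gradle snippet itself was elided in this capture. The following is a sketch of a `build.gradle.kts` dependency declaration, assuming the same placeholder-based artifact coordinates described earlier in this document (the exact group and artifact IDs are assumptions):

```kotlin
// build.gradle.kts (sketch; replace the angle-bracket placeholders with
// your Spark, Scala, and ScalarDB Analytics versions)
dependencies {
    // Use compileOnly so the dependency is not bundled into a fat JAR,
    // as the note below explains.
    compileOnly("com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
}
```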

:::note

If you bundle your application in a fat JAR by using plugins like Gradle Shadow or Maven Shade, exclude ScalarDB Analytics from the fat JAR by using configurations such as `provided` or `shadow`.

:::

:::note

Depending on your environment, you may not be able to use all the methods mentioned above.

:::

With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog information reference](./reference-data-source.mdx#catalog-information-reference).

You can use the commonly used `SparkSession` class for ScalarDB Analytics.

To read data from tables in ScalarDB Analytics, you can use the `spark.sql` or `spark.read.table` function in the same way as when reading a normal Spark table.
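
As a sketch, reading the same table through both APIs might look like the following in a Java application (the catalog, data source, namespace, and table names are hypothetical, and a `SparkSession` configured as described above is assumed):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadExample {
  public static void main(String[] args) {
    // Assumes Spark is configured with the ScalarDB Analytics catalog settings
    // described earlier in this document.
    SparkSession spark = SparkSession.builder().getOrCreate();

    // Equivalent reads via SQL and via the table API. The identifier
    // "myanalytics.postgresql.public.orders" is hypothetical.
    Dataset<Row> viaSql = spark.sql("SELECT * FROM myanalytics.postgresql.public.orders");
    Dataset<Row> viaTable = spark.read().table("myanalytics.postgresql.public.orders");

    viaSql.show();
    viaTable.show();
  }
}
```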

First, you need to set up your Java project. For example, if you are using Gradle, you can add the following to your `build.gradle.kts` file:

:::note

The versions of the packages must match the versions of Spark and ScalarDB Analytics.

:::

You also need to include the Spark Connect client package in your application. For example, if you are using Gradle, you can add the following to your `build.gradle.kts` file:
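
The snippet was elided here as well. The following sketch uses the standard Apache Spark coordinates for the Spark Connect client; the versions shown are examples, so match them to your environment:

```kotlin
// build.gradle.kts (sketch; example versions only)
dependencies {
    // Scala suffix (2.13) and Spark version (3.5.1) must match your environment.
    implementation("org.apache.spark:spark-connect-client-jvm_2.13:3.5.1")
}
```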

Unfortunately, the Spark Thrift JDBC server does not support the Spark features that ScalarDB Analytics requires.

ScalarDB Analytics manages its own catalog, containing data sources, namespaces, tables, and columns. That information is automatically mapped to the Spark catalog. In this section, you will learn how ScalarDB Analytics maps its catalog information to the Spark catalog.

For details about how information in the raw data sources is mapped to the ScalarDB Analytics catalog, see [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).

### Catalog structure mapping

ScalarDB Analytics maps catalog structure from data sources to Spark catalogs. Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables by using the following format:
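
The format block itself was elided from this capture, but it can be reconstructed from the example identifier shown earlier in this document (`myanalytics.<data_source>.<namespace>.<table>`). A hypothetical query using that format:

```sql
-- Table identifier format (from the example earlier in this document):
--   <catalog_name>.<data_source>.<namespace>.<table>
-- A hypothetical query, assuming a data source named "postgresql" with a
-- namespace "public" and a table "orders" in the "myanalytics" catalog:
SELECT * FROM myanalytics.postgresql.public.orders;
```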