
Commit a5d0bb9

AUTO: Sync ScalarDB docs in English to docs site repo
1 parent f276192 commit a5d0bb9

4 files changed: +123 -601 lines changed


docs/scalardb-analytics/README.mdx

Lines changed: 0 additions & 20 deletions
This file was deleted.

docs/scalardb-analytics/deployment.mdx

Lines changed: 53 additions & 58 deletions
@@ -4,25 +4,42 @@ tags:
displayed_sidebar: docsEnglish
---

-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";

# Deploy ScalarDB Analytics in Public Cloud Environments

-This guide explains how to deploy ScalarDB Analytics in a public cloud environment. ScalarDB Analytics currently uses Apache Spark as an execution engine and supports managed Spark services provided by public cloud providers, such as Amazon EMR and Databricks.
+This guide explains how to deploy ScalarDB Analytics in a public cloud environment. ScalarDB Analytics consists of two main components: a ScalarDB Analytics server and Apache Spark. In this guide, you can choose either Amazon EMR or Databricks for the Spark environment.
+For details about ScalarDB Analytics, refer to [ScalarDB Analytics Design](./design.mdx).

-## Supported managed Spark services and their application types
+## Deploy ScalarDB Analytics catalog server
+
+ScalarDB Analytics requires a catalog server to manage metadata and data source connections. The catalog server should be deployed by using Helm charts on a Kubernetes cluster.
+
+For detailed deployment instructions, see [TBD - Helm chart deployment guide].
+
+After deploying the catalog server, note the following information for Spark configuration:
+
+- Catalog server host address
+- Catalog port (default: 11051)
+- Metering port (default: 11052)
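
For illustration, before wiring these values into the Spark configuration, you can verify that the catalog server is reachable from the machine that will run Spark. The following Python sketch only checks TCP connectivity to the two default ports; the host name is a placeholder.

```python
# Illustrative connectivity check; replace the host with your catalog server address.
import socket

CATALOG_SERVER_HOST = "scalardb-analytics.example.internal"  # placeholder

for port in (11051, 11052):  # default catalog and metering ports
    try:
        with socket.create_connection((CATALOG_SERVER_HOST, port), timeout=5):
            print(f"port {port}: reachable")
    except OSError as e:
        print(f"port {port}: not reachable ({e})")
```
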
+
+## Deploy Spark with ScalarDB Analytics
+
+After deploying the catalog server, you can configure and deploy Spark with ScalarDB Analytics by using managed Spark services.
+
+### Supported managed Spark services and their application types

ScalarDB Analytics supports the following managed Spark services and application types.

-| Public Cloud Service | Spark Driver | Spark Connect | JDBC |
-| -------------------------- | ------------ | ------------- | ---- |
-| Amazon EMR (EMR on EC2) | ✅ | ✅ | ❌ |
-| Databricks | ✅ | ❌ | ✅ |
+| Public Cloud Service | Spark Driver | Spark Connect | JDBC |
+| ----------------------- | ------------ | ------------- | ---- |
+| Amazon EMR (EMR on EC2) | ✅ | ✅ | ❌ |
+| Databricks | ✅ | ❌ | ✅ |

-## Configure and deploy
+### Configure and deploy

-Select your public cloud environment, and follow the instructions to set up and deploy ScalarDB Analytics.
+Select your public cloud environment, and follow the instructions to set up and deploy Spark with ScalarDB Analytics.

<Tabs groupId="cloud-service" queryString>
<TabItem value="emr" label="Amazon EMR">
@@ -41,37 +58,35 @@ To enable ScalarDB Analytics, you need to add the following configuration to the
"Classification": "spark-defaults",
"Properties": {
"spark.jars.packages": "com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>",
-"spark.sql.catalog.<CATALOG_NAME>": "com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog",
-"spark.sql.extensions": "com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions",
-"spark.sql.catalog.<CATALOG_NAME>.license.cert_pem": "<YOUR_LICENSE_CERT_PEM>",
-"spark.sql.catalog.<CATALOG_NAME>.license.key": "<YOUR_LICENSE_KEY>",
-
-// Add your data source configuration below
+"spark.extraListeners": "com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener",
+"spark.sql.catalog.<CATALOG_NAME>": "com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog",
+"spark.sql.catalog.<CATALOG_NAME>.server.host": "<CATALOG_SERVER_HOST>",
+"spark.sql.catalog.<CATALOG_NAME>.server.catalog.port": "11051",
+"spark.sql.catalog.<CATALOG_NAME>.server.metering.port": "11052"
}
}
]
```

The following describes what you should change the content in the angle brackets to:

-- `<SPARK_VERSION>`: The version of Spark.
-- `<SCALA_VERSION>`: The version of Scala used to build Spark.
-- `<SCALARDB_ANALYTICS_VERSION>`: The version of ScalarDB Analytics.
-- `<CATALOG_NAME>`: The name of the catalog.
-- `<YOUR_LICENSE_CERT_PEM>`: The PEM encoded license certificate.
-- `<YOUR_LICENSE_KEY>`: The license key.
+- `<SPARK_VERSION>`: The version of Spark (for example, `3.5` or `3.4`).
+- `<SCALA_VERSION>`: The version of Scala used to build Spark (for example, `2.13` or `2.12`).
+- `<SCALARDB_ANALYTICS_VERSION>`: The version of ScalarDB Analytics (for example, `3.16.0`).
+- `<CATALOG_NAME>`: The name of the catalog. This must match a catalog created on the ScalarDB Analytics server.
+- `<CATALOG_SERVER_HOST>`: The host address of your ScalarDB Analytics server.

For more details, refer to [Set up ScalarDB Analytics in the Spark configuration](./run-analytical-queries.mdx#set-up-scalardb-analytics-in-the-spark-configuration).

<h4>Run analytical queries via the Spark driver</h4>

-After the EMR Spark cluster has launched, you can use ssh to connect to the primary node of the EMR cluster and run your Spark application. For details on how to create a Spark Driver application, refer to [Spark Driver application](./run-analytical-queries.mdx?spark-application-type=spark-driver-application#develop-a-spark-application).
+After the EMR Spark cluster has launched, you can use ssh to connect to the primary node of the EMR cluster and run your Spark application. For details on how to create a Spark driver application, refer to [Spark driver application](./run-analytical-queries.mdx?spark-application-type=spark-driver#develop-a-spark-application).
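
For illustration, a minimal PySpark driver application that queries a table through the catalog configured above might look like the sketch below. The catalog, namespace, and table names are placeholders, and the exact identifier layout depends on the data sources registered in your ScalarDB Analytics catalog.

```python
# minimal_driver_app.py -- illustrative sketch; all identifiers are placeholders.
from pyspark.sql import SparkSession

# The session picks up the spark-defaults configured above, including the
# ScalarDB Analytics catalog registered under <CATALOG_NAME>.
spark = SparkSession.builder.appName("scalardb-analytics-example").getOrCreate()

# Query a table exposed through the ScalarDB Analytics catalog.
df = spark.sql("SELECT * FROM my_catalog.my_data_source.my_namespace.my_table LIMIT 10")
df.show()

spark.stop()
```

You would typically copy this file to the primary node over SSH and run it with `spark-submit`.
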

<h4>Run analytical queries via Spark Connect</h4>

You can use Spark Connect to run your Spark application remotely by using the EMR cluster that you launched.

-You first need to configure the Software setting in the same way as the [Spark Driver application](./run-analytical-queries.mdx?spark-application-type=spark-driver-application#develop-a-spark-application). You also need to set the following configuration to enable Spark Connect.
+You first need to configure the Software setting in the same way as the [Spark driver application](./run-analytical-queries.mdx?spark-application-type=spark-driver#develop-a-spark-application). You also need to set the following configuration to enable Spark Connect.

<h5>Allow inbound traffic for a Spark Connect server</h5>
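
The remaining server-side steps (allowing inbound traffic and enabling Spark Connect) continue beyond this hunk. Once the Spark Connect endpoint is reachable, a remote client can connect with a few lines of PySpark. This is a sketch only: it assumes Spark's default Spark Connect port 15002, a placeholder host and table identifier, and a client machine with `pyspark` and its Spark Connect extras installed.

```python
# Illustrative Spark Connect client; host and table identifiers are placeholders.
from pyspark.sql import SparkSession

# Connect to the Spark Connect server running on the EMR primary node.
spark = SparkSession.builder.remote("sc://<EMR_PRIMARY_NODE_HOST>:15002").getOrCreate()

df = spark.sql("SELECT COUNT(*) AS row_count FROM my_catalog.my_data_source.my_namespace.my_table")
df.show()
```
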

@@ -126,46 +141,26 @@ Note that Databricks provides a modified version of Apache Spark, which works di

ScalarDB Analytics works with all-purpose and jobs-compute clusters on Databricks. When you launch the cluster, you need to configure the cluster to enable ScalarDB Analytics as follows:

-1. Store the license certificate and license key in the cluster by using the Databricks CLI.
+1. Select "No isolation shared" for the cluster mode. (This is required. ScalarDB Analytics works only with this cluster mode.)
+2. Select an appropriate Databricks runtime version that supports Spark 3.4 or later.
+3. Configure "Advanced Options" > "Spark config" as follows:

-```console
-databricks secrets create-scope scalardb-analytics-secret # you can use any secret scope name
-cat license_key.json | databricks secrets put-secret scalardb-analytics-secret license-key
-cat license_cert.pem | databricks secrets put-secret scalardb-analytics-secret license-cert
```
-
-:::note
-
-For details on how to install and use the Databricks CLI, refer to the [Databricks CLI documentation](https://docs.databricks.com/en/dev-tools/cli/index.html).
-
-:::
-
-2. Select "No isolation shared" for the cluster mode. (This is required. ScalarDB Analytics works only with this cluster mode.)
-3. Select an appropriate Databricks runtime version that supports Spark 3.4 or later.
-4. Configure "Advanced Options" > "Spark config" as follows, replacing `<CATALOG_NAME>` with the name of the catalog that you want to use:
-
-```
-spark.sql.catalog.<CATALOG_NAME> com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog
-spark.sql.extensions com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions
-spark.sql.catalog.<CATALOG_NAME>.license.key {{secrets/scalardb-analytics-secret/license-key}}
-spark.sql.catalog.<CATALOG_NAME>.license.cert_pem {{secrets/scalardb-analytics-secret/license-pem}}
+spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener
+spark.sql.catalog.<CATALOG_NAME> com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog
+spark.sql.catalog.<CATALOG_NAME>.server.host <CATALOG_SERVER_HOST>
+spark.sql.catalog.<CATALOG_NAME>.server.catalog.port 11051
+spark.sql.catalog.<CATALOG_NAME>.server.metering.port 11052
```

-:::note
-
-You also need to configure the data source. For details, refer to [Set up ScalarDB Analytics in the Spark configuration](./run-analytical-queries.mdx#set-up-scalardb-analytics-in-the-spark-configuration).
-
-:::
-
-:::note
+Replace the placeholders:

-If you specified different secret names in the previous step, be sure to replace the secret names in the configuration above.
+- `<CATALOG_NAME>`: The name of the catalog. This must match a catalog created on the ScalarDB Analytics server.
+- `<CATALOG_SERVER_HOST>`: The host address of your ScalarDB Analytics catalog server.

-:::
-
-5. Add the library of ScalarDB Analytics to the launched cluster as a Maven dependency. For details on how to add the library, refer to the [Databricks cluster libraries documentation](https://docs.databricks.com/en/libraries/cluster-libraries.html).
+4. Add the library of ScalarDB Analytics to the launched cluster as a Maven dependency. For details on how to add the library, refer to the [Databricks cluster libraries documentation](https://docs.databricks.com/en/libraries/cluster-libraries.html).

-<h4>Run analytical queries via the Spark Driver</h4>
+<h4>Run analytical queries via the Spark driver</h4>

You can run your Spark application on the properly configured Databricks cluster with Databricks Notebook or Databricks Jobs to access the tables in ScalarDB Analytics. To run the Spark application, you can migrate your Pyspark, Scala, or Spark SQL application to Databricks Notebook, or use Databricks Jobs to run your Spark application. ScalarDB Analytics works with task types for Notebook, Python, JAR, and SQL.
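
As a sketch only, a notebook cell on a cluster configured as described above could read a table through the ScalarDB Analytics catalog as follows; the table identifier is a placeholder whose exact layout depends on your registered data sources.

```python
# Illustrative Databricks notebook cell; `spark` is provided by the cluster runtime.
df = spark.sql("SELECT * FROM my_catalog.my_data_source.my_namespace.my_table LIMIT 10")
display(df)  # Databricks notebook helper for rendering DataFrames
```
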

@@ -185,7 +180,7 @@ Databricks supports JDBC to run SQL jobs on the cluster. You can use this featur

# Target directories
TARGET_DIRECTORIES=("/databricks/jars" "/databricks/hive_metastore_jars")
-JAR_PATH="<PATH_TO_YOUR_JAR_FILE_IN_WORKSPACE>
+JAR_PATH="<PATH_TO_YOUR_JAR_FILE_IN_WORKSPACE>"

# Copy the JAR file to the target directories
for TARGET_DIR in "${TARGET_DIRECTORIES[@]}"; do
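
The init script continues beyond this hunk. Once the JAR is in place and the cluster has restarted, SQL clients can query ScalarDB Analytics tables through the cluster's SQL endpoint. As a hedged example that uses the Databricks SQL Connector for Python instead of a raw JDBC driver, with placeholder connection values and table identifiers:

```python
# Illustrative sketch using the databricks-sql-connector package
# (pip install databricks-sql-connector). All values below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<WORKSPACE_HOSTNAME>",
    http_path="<CLUSTER_HTTP_PATH>",
    access_token="<PERSONAL_ACCESS_TOKEN>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_catalog.my_data_source.my_namespace.my_table LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```
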

0 commit comments