
Commit 5298d1d

Merge branch 'main' into feature/add-support-dropdown
2 parents e391b91 + 0fd2035 commit 5298d1d

50 files changed: +2535 -330 lines


docs/api-guide.mdx

Lines changed: 78 additions & 0 deletions
@@ -1150,6 +1150,84 @@ Scan scanUsingSpecifiedNamespace =

##### Operation attributes

The operation attribute is a key-value pair that can be used to store additional information about an operation. You can set operation attributes by using the `attribute()` or `attributes()` method in the operation builder, as shown below:

```java
// Set operation attributes in the `Get` operation.
Get get = Get.newBuilder()
    .namespace("ns")
    .table("tbl")
    .partitionKey(partitionKey)
    .clusteringKey(clusteringKey)
    .attribute("attribute1", "value1")
    .attributes(ImmutableMap.of("attribute2", "value2", "attribute3", "value3"))
    .build();

// Set operation attributes in the `Scan` operation.
Scan scan = Scan.newBuilder()
    .namespace("ns")
    .table("tbl")
    .partitionKey(partitionKey)
    .projections("c1", "c2", "c3", "c4")
    .attribute("attribute1", "value1")
    .attributes(ImmutableMap.of("attribute2", "value2", "attribute3", "value3"))
    .build();

// Set operation attributes in the `Insert` operation.
Insert insert = Insert.newBuilder()
    .namespace("ns")
    .table("tbl")
    .partitionKey(partitionKey)
    .clusteringKey(clusteringKey)
    .floatValue("c4", 1.23F)
    .doubleValue("c5", 4.56)
    .attribute("attribute1", "value1")
    .attributes(ImmutableMap.of("attribute2", "value2", "attribute3", "value3"))
    .build();

// Set operation attributes in the `Upsert` operation.
Upsert upsert = Upsert.newBuilder()
    .namespace("ns")
    .table("tbl")
    .partitionKey(partitionKey)
    .clusteringKey(clusteringKey)
    .floatValue("c4", 1.23F)
    .doubleValue("c5", 4.56)
    .attribute("attribute1", "value1")
    .attributes(ImmutableMap.of("attribute2", "value2", "attribute3", "value3"))
    .build();

// Set operation attributes in the `Update` operation.
Update update = Update.newBuilder()
    .namespace("ns")
    .table("tbl")
    .partitionKey(partitionKey)
    .clusteringKey(clusteringKey)
    .floatValue("c4", 1.23F)
    .doubleValue("c5", 4.56)
    .attribute("attribute1", "value1")
    .attributes(ImmutableMap.of("attribute2", "value2", "attribute3", "value3"))
    .build();

// Set operation attributes in the `Delete` operation.
Delete delete = Delete.newBuilder()
    .namespace("ns")
    .table("tbl")
    .partitionKey(partitionKey)
    .clusteringKey(clusteringKey)
    .attribute("attribute1", "value1")
    .attributes(ImmutableMap.of("attribute2", "value2", "attribute3", "value3"))
    .build();
```

:::note

ScalarDB currently has no available operation attributes.

:::

#### Commit a transaction

After executing CRUD operations, you need to commit a transaction to finish it.
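A minimal sketch of that step, assuming `transaction` is the `DistributedTransaction` instance used for the CRUD operations above:

```java
// Commit the transaction to make the executed CRUD operations take effect.
transaction.commit();
```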

docs/scalardb-analytics-spark/version-compatibility.mdx

Lines changed: 0 additions & 18 deletions
This file was deleted.

docs/scalardb-analytics-spark/README.mdx renamed to docs/scalardb-analytics/README.mdx

Lines changed: 1 addition & 1 deletion
@@ -17,4 +17,4 @@ The current version of ScalarDB Analytics leverages **Apache Spark** as its exec
 ## Further reading

 * For tutorials on how to use ScalarDB Analytics by using a sample dataset and application, see [Getting Started with ScalarDB Analytics](../scalardb-samples/scalardb-analytics-spark-sample/README.mdx).
-* For supported Spark and Scala versions, see [Version Compatibility of ScalarDB Analytics with Spark](version-compatibility.mdx)
+* For supported Spark and Scala versions, see [Version Compatibility of ScalarDB Analytics with Spark](./run-analytical-queries.mdx#version-compatibility)
Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
---
tags:
  - Enterprise Option
displayed_sidebar: docsEnglish
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Deploy ScalarDB Analytics in Public Cloud Environments

This guide explains how to deploy ScalarDB Analytics in a public cloud environment. ScalarDB Analytics currently uses Apache Spark as an execution engine and supports managed Spark services provided by public cloud providers, such as Amazon EMR and Databricks.

## Supported managed Spark services and their application types

ScalarDB Analytics supports the following managed Spark services and application types.

| Public Cloud Service    | Spark Driver | Spark Connect | JDBC |
| ----------------------- | ------------ | ------------- | ---- |
| Amazon EMR (EMR on EC2) | ✅           | ✅            | ❌   |
| Databricks              | ✅           | ❌            | ✅   |

## Configure and deploy

Select your public cloud environment, and follow the instructions to set up and deploy ScalarDB Analytics.

<Tabs groupId="cloud-service" queryString>
  <TabItem value="emr" label="Amazon EMR">

<h3>Use Amazon EMR</h3>

You can use Amazon EMR (EMR on EC2) to run analytical queries through ScalarDB Analytics. For the basics of launching an EMR cluster, please refer to the [AWS EMR on EC2 documentation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan.html).

<h4>ScalarDB Analytics configuration</h4>

To enable ScalarDB Analytics, you need to add the following configuration to the Software setting when you launch an EMR cluster. Be sure to replace the content in the angle brackets:

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.jars.packages": "com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>",
      "spark.sql.catalog.<CATALOG_NAME>": "com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog",
      "spark.sql.extensions": "com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions",
      "spark.sql.catalog.<CATALOG_NAME>.license.cert_pem": "<YOUR_LICENSE_CERT_PEM>",
      "spark.sql.catalog.<CATALOG_NAME>.license.key": "<YOUR_LICENSE_KEY>",

      // Add your data source configuration below
    }
  }
]
```

Replace the content in the angle brackets as follows:

- `<SPARK_VERSION>`: The version of Spark.
- `<SCALA_VERSION>`: The version of Scala used to build Spark.
- `<SCALARDB_ANALYTICS_VERSION>`: The version of ScalarDB Analytics.
- `<CATALOG_NAME>`: The name of the catalog.
- `<YOUR_LICENSE_CERT_PEM>`: The PEM-encoded license certificate.
- `<YOUR_LICENSE_KEY>`: The license key.

For more details, refer to [Set up ScalarDB Analytics in the Spark configuration](development.mdx#set-up-scalardb-analytics-in-the-spark-configuration).

<h4>Run analytical queries via the Spark driver</h4>

After the EMR Spark cluster has launched, you can use SSH to connect to the primary node of the EMR cluster and run your Spark application. For details on how to create a Spark Driver application, refer to [Spark Driver application](development.mdx?spark-application-type=spark-driver#spark-driver-application).

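As a rough illustration, a minimal Spark Driver application in Java might look like the following sketch. It assumes the cluster was launched with the ScalarDB Analytics configuration above; the class name and `<FULLY_QUALIFIED_TABLE_NAME>` are placeholders:

```java
import org.apache.spark.sql.SparkSession;

public class ScalarDbAnalyticsSampleJob {
  public static void main(String[] args) {
    // The ScalarDB Analytics catalog configured in spark-defaults is available to this session.
    SparkSession spark = SparkSession.builder()
        .appName("ScalarDB Analytics sample")
        .getOrCreate();

    // Replace the identifier below with a fully qualified table name in the configured catalog.
    spark.sql("SELECT * FROM <FULLY_QUALIFIED_TABLE_NAME> LIMIT 10").show();

    spark.stop();
  }
}
```

You would typically package a class like this into a JAR and run it with `spark-submit` on the primary node.
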
<h4>Run analytical queries via Spark Connect</h4>

You can use Spark Connect to run your Spark application remotely by using the EMR cluster that you launched.

You first need to configure the Software setting in the same way as for the [Spark Driver application](development.mdx?spark-application-type=spark-driver#spark-driver-application). You also need to set the following configuration to enable Spark Connect.

<h5>Allow inbound traffic for a Spark Connect server</h5>

1. Create a security group to allow inbound traffic for a Spark Connect server. (Port 15001 is the default.)
2. Allow the "Amazon EMR service role" to attach the security group to the primary node of the EMR cluster.
3. Add the security group to the primary node of the EMR cluster as "Additional security groups" when you launch the EMR cluster.

<h5>Launch the Spark Connect server via a bootstrap action</h5>

1. Create a script file to launch the Spark Connect server as follows:

   ```bash
   #!/usr/bin/env bash

   set -eu -o pipefail

   cd /var/lib/spark

   sudo -u spark /usr/lib/spark/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_<SCALA_VERSION>:<SPARK_FULL_VERSION>,com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>
   ```

   Replace the content in the angle brackets as follows:

   - `<SCALA_VERSION>`: The major and minor version of Scala that matches your Spark installation (such as 2.12 or 2.13).
   - `<SPARK_FULL_VERSION>`: The full version of Spark you are using (such as 3.5.3).
   - `<SPARK_VERSION>`: The major and minor version of Spark you are using (such as 3.5).
   - `<SCALARDB_ANALYTICS_VERSION>`: The version of ScalarDB Analytics.

2. Upload the script file to S3.
3. Allow the "EC2 instance profile for Amazon EMR" role to access the uploaded script file in S3.
4. Add the uploaded script file to "Bootstrap actions" when you launch the EMR cluster.

<h5>Run analytical queries</h5>

You can run your Spark application via Spark Connect from anywhere by using the remote URL of the Spark Connect server, which is `sc://<PRIMARY_NODE_PUBLIC_HOSTNAME>:15001`.

For details on how to create a Spark application by using Spark Connect, refer to [Spark Connect application](development.mdx?spark-application-type=spark-connect#spark-connect-application).

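As a rough sketch, assuming the Spark Connect JVM client (`org.apache.spark:spark-connect-client-jvm`) is on your application's classpath, connecting to that URL and running a query might look like the following; the class name, hostname, and `<FULLY_QUALIFIED_TABLE_NAME>` are placeholders:

```java
import org.apache.spark.sql.SparkSession;

public class SparkConnectClientExample {
  public static void main(String[] args) {
    // Connect to the Spark Connect server running on the EMR primary node (port 15001 by default).
    SparkSession spark = SparkSession.builder()
        .remote("sc://<PRIMARY_NODE_PUBLIC_HOSTNAME>:15001")
        .getOrCreate();

    // Replace the identifier below with a fully qualified table name in the configured catalog.
    spark.sql("SELECT * FROM <FULLY_QUALIFIED_TABLE_NAME> LIMIT 10").show();

    spark.close();
  }
}
```
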
  </TabItem>
  <TabItem value="databricks" label="Databricks">

<h3>Use Databricks</h3>

You can use Databricks to run analytical queries through ScalarDB Analytics.

:::note

Databricks provides a modified version of Apache Spark, which works differently from the original Apache Spark.

:::

<h4>Launch Databricks cluster</h4>

ScalarDB Analytics works with all-purpose and jobs-compute clusters on Databricks. When you launch the cluster, you need to configure the cluster to enable ScalarDB Analytics as follows:

1. Store the license certificate and license key in the cluster by using the Databricks CLI.

   ```console
   databricks secrets create-scope scalardb-analytics-secret # you can use any secret scope name
   cat license_key.json | databricks secrets put-secret scalardb-analytics-secret license-key
   cat license_cert.pem | databricks secrets put-secret scalardb-analytics-secret license-cert
   ```

   :::note

   For details on how to install and use the Databricks CLI, refer to the [Databricks CLI documentation](https://docs.databricks.com/en/dev-tools/cli/index.html).

   :::

2. Select "No isolation shared" for the cluster mode. (This is required. ScalarDB Analytics works only with this cluster mode.)
3. Select an appropriate Databricks runtime version that supports Spark 3.4 or later.
4. Configure "Advanced Options" > "Spark config" as follows, replacing `<CATALOG_NAME>` with the name of the catalog that you want to use:

   ```
   spark.sql.catalog.<CATALOG_NAME> com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog
   spark.sql.extensions com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions
   spark.sql.catalog.<CATALOG_NAME>.license.key {{secrets/scalardb-analytics-secret/license-key}}
   spark.sql.catalog.<CATALOG_NAME>.license.cert_pem {{secrets/scalardb-analytics-secret/license-cert}}
   ```

   :::note

   You also need to configure the data source. For details, refer to [Set up ScalarDB Analytics in the Spark configuration](development.mdx#set-up-scalardb-analytics-in-the-spark-configuration).

   :::

   :::note

   If you specified different secret names in the previous step, be sure to replace the secret names in the configuration above.

   :::

5. Add the ScalarDB Analytics library to the launched cluster as a Maven dependency. For details on how to add the library, refer to the [Databricks cluster libraries documentation](https://docs.databricks.com/en/libraries/cluster-libraries.html).

<h4>Run analytical queries via the Spark Driver</h4>

You can run your Spark application on the properly configured Databricks cluster with Databricks Notebook or Databricks Jobs to access the tables in ScalarDB Analytics. To run the Spark application, you can migrate your PySpark, Scala, or Spark SQL application to Databricks Notebook, or use Databricks Jobs to run your Spark application. ScalarDB Analytics works with task types for Notebook, Python, JAR, and SQL.

For more details on how to use Databricks Jobs, refer to the [Databricks Jobs documentation](https://docs.databricks.com/en/jobs/index.html).

<h4>Run analytical queries via the JDBC driver</h4>

Databricks supports JDBC to run SQL jobs on the cluster. You can use this feature to run your Spark application in SQL with ScalarDB Analytics by configuring extra settings as follows:

1. Download the ScalarDB Analytics library JAR file from the Maven repository.
2. Upload the JAR file to the Databricks workspace.
3. Add the JAR file to the cluster as a library, instead of as a Maven dependency.
4. Create an init script as follows, replacing `<PATH_TO_YOUR_JAR_FILE_IN_WORKSPACE>` with the path to your JAR file in the Databricks workspace:

   ```bash
   #!/bin/bash

   # Target directories
   TARGET_DIRECTORIES=("/databricks/jars" "/databricks/hive_metastore_jars")
   JAR_PATH="<PATH_TO_YOUR_JAR_FILE_IN_WORKSPACE>"

   # Copy the JAR file to the target directories
   for TARGET_DIR in "${TARGET_DIRECTORIES[@]}"; do
     mkdir -p "$TARGET_DIR"
     cp "$JAR_PATH" "$TARGET_DIR/"
   done
   ```

5. Upload the init script to the Databricks workspace.
6. Add the init script in "Advanced Options" > "Init scripts" when you launch the cluster.

After the cluster is launched, you can get the JDBC URL of the cluster in the "Advanced Options" > "JDBC/ODBC" tab on the cluster details page.

To connect to the Databricks cluster by using JDBC, you need to add the Databricks JDBC driver to your application dependencies. For example, if you are using Gradle, you can add the following dependency to your `build.gradle` file:

```groovy
implementation("com.databricks:databricks-jdbc:0.9.6-oss")
```

Then, you can connect to the Databricks cluster by using JDBC with the JDBC URL (`<YOUR_CLUSTERS_JDBC_URL>`), as is common with JDBC applications:

```java
import java.sql.Connection;
import java.sql.DriverManager;

Class.forName("com.databricks.client.jdbc.Driver");
String url = "<YOUR_CLUSTERS_JDBC_URL>";
Connection conn = DriverManager.getConnection(url);
```
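As a rough sketch, running a query over that connection with plain JDBC might look like the following; the class name and `<FULLY_QUALIFIED_TABLE_NAME>` are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DatabricksJdbcQueryExample {
  public static void main(String[] args) throws SQLException {
    // Open a connection to the Databricks cluster and run a query through ScalarDB Analytics.
    try (Connection conn = DriverManager.getConnection("<YOUR_CLUSTERS_JDBC_URL>");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM <FULLY_QUALIFIED_TABLE_NAME> LIMIT 10")) {
      while (rs.next()) {
        // Print the first column of each returned row.
        System.out.println(rs.getString(1));
      }
    }
  }
}
```
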
For more details on how to use JDBC with Databricks, refer to the [Databricks JDBC Driver documentation](https://docs.databricks.com/en/integrations/jdbc/index.html).

  </TabItem>
</Tabs>
