Commit 21fbf60

AUTO: Sync ScalarDB docs in English to docs site repo
1 parent b881a14 commit 21fbf60

5 files changed: +695 −0 lines changed
Lines changed: 126 additions & 0 deletions

---
tags:
- Enterprise Option
- Public Preview
---

# Configuration of ScalarDB Analytics with Spark

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

There are two ways to configure ScalarDB Analytics with Spark:

- By configuring the properties in `spark.conf`
- By using the helper method that ScalarDB Analytics with Spark provides

Both ways are conceptually equivalent, so you can choose either one based on your preference.

## Configure ScalarDB Analytics with Spark by using `spark.conf`

Since ScalarDB Analytics with Spark is provided as a Spark custom catalog plugin, you can enable it via `spark.conf`.

```properties
spark.sql.catalog.scalardb_catalog = com.scalar.db.analytics.spark.datasource.ScalarDbCatalog
spark.sql.catalog.scalardb_catalog.config = /<PATH_TO_YOUR_SCALARDB_PROPERTIES>/config.properties
spark.sql.catalog.scalardb_catalog.namespaces = <YOUR_NAMESPACE_NAME_1>,<YOUR_NAMESPACE_NAME_2>
spark.sql.catalog.scalardb_catalog.license.key = {"your":"license", "key":"in", "json":"format"}
spark.sql.catalog.scalardb_catalog.license.cert_path = /<PATH_TO_YOUR_LICENSE>/cert.pem
```

:::note

The `scalardb_catalog` part is a configurable catalog name. You may choose any name you prefer.

:::
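
With the configuration above, the imported tables become addressable under the catalog name you chose, following Spark's `catalog.namespace.table` naming. The following is a minimal sketch of what a catalog-qualified query might look like; the namespace and table names are placeholders:

```scala
// "scalardb_catalog" is whatever catalog name you configured in spark.conf
spark.sql("SELECT * FROM scalardb_catalog.<YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
```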

### Available properties

The following is a list of available properties for ScalarDB Analytics with Spark:

| Property name | Required | Description |
|------------------------------------------------------|------------------------------------------------|-------------------------------------------------------------------------|
| `spark.sql.catalog.{catalog_name}` | Yes | Must be `com.scalar.db.analytics.spark.datasource.ScalarDbCatalog` |
| `spark.sql.catalog.{catalog_name}.config` | Yes | Path to the ScalarDB configuration file |
| `spark.sql.catalog.{catalog_name}.namespaces` | Yes | Comma-separated list of ScalarDB namespaces to import to the Spark side |
| `spark.sql.catalog.{catalog_name}.license.key` | Yes | Your license key in JSON format |
| `spark.sql.catalog.{catalog_name}.license.cert_path` | Either this or `license.cert_pem` is required | Path to your license certificate file |
| `spark.sql.catalog.{catalog_name}.license.cert_pem` | Either this or `license.cert_path` is required | Your license certificate in PEM format |
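
If you prefer to embed the certificate content directly rather than referencing a file, you can set `license.cert_pem` in place of `license.cert_path`. The following is a minimal sketch, assuming the PEM content is inlined as a single escaped property value; the certificate body below is a placeholder:

```properties
spark.sql.catalog.scalardb_catalog.license.cert_pem = -----BEGIN CERTIFICATE-----\n<YOUR_CERTIFICATE_BODY>\n-----END CERTIFICATE-----
```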

### Importing schemas

After properly setting `spark.conf`, you should have a catalog in your Spark environment that contains tables connected to the underlying databases of ScalarDB. However, that catalog exposes the raw tables, which include the transaction metadata managed by ScalarDB. In most cases, you are only interested in the application-managed data without the transaction metadata.

For this purpose, ScalarDB Analytics with Spark provides the `SchemaImporter` class, which creates views that interpret the transaction metadata and show only application-managed data. Those views have a schema equivalent to that of the ScalarDB tables, so you can use the views as if they were ScalarDB tables. The following is an example of how to run `SchemaImporter` against a properly configured catalog.
55+
56+
```java
57+
import com.scalar.db.analytics.spark.view.SchemaImporter
58+
59+
class YourApp {
60+
public static void main(String[] args) {
61+
SparkSession spark = SparkSession.builder().appName("<YOUR_APPLICATION_NAME>").getOrCreate()
62+
new SchemaImporter(spark, "scalardb_catalog").run() // Import ScalarDB table schemas from the catalog named "scalardb_catalog"
63+
spark.sql("select * from <YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
64+
spark.stop()
65+
}
66+
}
67+
```

## Configure ScalarDB Analytics with Spark by using the helper method

You can use the helper method that ScalarDB Analytics with Spark provides to get everything set up for running analytical queries, including configuring the catalog and importing the schemas. Because the helper method can be called from application code, it is also useful for running a quick test without prior configuration.

The helper method is available in Java and Scala. In Java, you can use `ScalarDbAnalyticsInitializer` to specify the options, which are equivalent to the properties in `spark.conf`, as follows:
74+
75+
```java
76+
import com.scalar.db.analytics.spark.ScalarDbAnalyticsInitializer
77+
78+
class YourApp {
79+
public static void main(String[] args) {
80+
// Initialize SparkSession as usual
81+
SparkSession spark = SparkSession.builder().appName("<YOUR_APPLICATION_NAME>").getOrCreate()
82+
// Setup ScalarDB Analytics with Spark via helper class
83+
ScalarDbAnalyticsInitializer
84+
.builder()
85+
.spark(spark)
86+
.configPath("/<PATH_TO_YOUR_SCALARDB_PROPERTIES>/config.properties")
87+
.namespace("<YOUR_NAMESPACE_NAME_1>")
88+
.namespace("<YOUR_NAMESPACE_NAME_2>")
89+
.licenseKey("{\"your\":\"license\", \"key\":\"in\", \"json\":\"format\"}")
90+
.licenseCertPath("/<PATH_TO_YOUR_LICENSE>/cert.pem")
91+
.build()
92+
.run()
93+
// Run arbitrary queries
94+
spark.sql("select * from <YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
95+
// Stop SparkSession
96+
spark.stop()
97+
}
98+
}
99+
```

In Scala, the `setupScalarDbAnalytics` method is available as an extension of `SparkSession`:

```scala
import com.scalar.db.analytics.spark.implicits._
import org.apache.spark.sql.SparkSession

object YourApp {
  def main(args: Array[String]): Unit = {
    // Initialize SparkSession as usual
    val spark = SparkSession.builder.appName("<YOUR_APPLICATION_NAME>").getOrCreate()
    // Set up ScalarDB Analytics with Spark via the helper method
    spark.setupScalarDbAnalytics(
      // ScalarDB config file
      configPath = "/<PATH_TO_YOUR_SCALARDB_PROPERTIES>/config.properties",
      // Namespaces in ScalarDB to import
      namespaces = Set("<YOUR_NAMESPACE_NAME_1>", "<YOUR_NAMESPACE_NAME_2>"),
      // License information
      license = License.certPath("""{"your":"license", "key":"in", "json":"format"}""", "/<PATH_TO_YOUR_LICENSE>/cert.pem")
    )
    // Run arbitrary queries
    spark.sql("select * from <YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
    // Stop SparkSession
    spark.stop()
  }
}
```

Lines changed: 180 additions & 0 deletions

---
tags:
- Enterprise Option
- Public Preview
---

# Getting Started with ScalarDB Analytics with Spark

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

This guide explains how to get started with ScalarDB Analytics with Spark.

## Prerequisites

Before you can run queries with ScalarDB Analytics with Spark, you'll need to set up ScalarDB tables and install Apache Spark.

### Set up ScalarDB tables

To use ScalarDB Analytics with Spark, you need at least one underlying database in ScalarDB to run analytical queries on. If you have your own underlying database set up in ScalarDB, you can skip this section and use your database instead.

If you don't have your own database set up yet, you can set up ScalarDB with a sample underlying database by following the instructions in [Run Analytical Queries on Sample Data by Using ScalarDB Analytics with Spark](../scalardb-samples/scalardb-analytics-spark-sample/README.mdx).

### Install Apache Spark

You also need a packaged release of Apache Spark. If you already have Spark installed, you can skip this section.

If you need Spark, you can download it from the [Spark website](https://spark.apache.org/downloads.html). After downloading the compressed Spark file, you'll need to uncompress the file by running the following command, replacing `X.X.X` with the version of Spark that you downloaded:

```console
tar xf spark-X.X.X-bin-hadoop3.tgz
```

Then, enter the directory by running the following command, again replacing `X.X.X` with the version of Spark that you downloaded:

```console
cd spark-X.X.X-bin-hadoop3
```
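
To check that Spark is ready to use, you can print its version; this is an optional sanity check, not a step that ScalarDB Analytics with Spark requires:

```console
./bin/spark-submit --version
```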

## Configure the Spark shell

The following explains how to perform interactive analysis by using the Spark shell.

Since ScalarDB Analytics with Spark is available on the Maven Central Repository, you can enable it in the Spark shell by using the `--packages` option, replacing `<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>` with the versions that you're using:

```console
./bin/spark-shell --packages com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>
```

:::warning

ScalarDB Analytics with Spark offers different artifacts for various Spark and Scala versions, provided in the format `scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>`. Make sure that you select the artifact matching the Spark and Scala versions you're using.
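
For example, if you were running Spark 3.5 with Scala 2.13, the artifact name would look like `scalardb-analytics-spark-3.5_2.13` (these version numbers are illustrative only).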
53+
54+
For reference, see [Version Compatibility of ScalarDB Analytics with Spark](version-compatibility.mdx).
55+
56+
:::
57+
58+
Next, you'll need to configure the ScalarDB Analytics with Spark environment in the shell. ScalarDB Analytics with Spark provides a helper method for this purpose, which get everything set up to run analytical queries for you.
59+
60+
```scala
61+
spark-shell> import com.scalar.db.analytics.spark.implicits._
62+
spark-shell> spark.setupScalarDbAnalytics(
63+
| // ScalarDB config file
64+
| configPath = "/<PATH_TO_YOUR_SCALARDB_PROPERTIES>/config.properties",
65+
| // Namespaces in ScalarDB to import
66+
| namespaces = Set("<YOUR_NAMESPACE_NAME_1>", "<YOUR_NAMESPACE_NAME_2>"),
67+
| // License information
68+
| license = License.certPath("""{"your":"license", "key":"in", "json":"format"}""", "/<PATH_TO_YOUR_LICENSE>/cert.pem")
69+
| )
70+
```
71+
72+
Now, you can read data from the tables in the underlying databases of ScalarDB and run any arbitrary analytical queries through the Spark Dataset API. For example:
73+
74+
```console
75+
spark-shell> spark.sql("select * from <YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
76+
````
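
Because the imported views behave like ordinary Spark tables, you can also query them through the typed Dataset/DataFrame API instead of SQL. The following is a minimal sketch; the table and column names are placeholders:

```scala
spark-shell> val df = spark.table("<YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>")
spark-shell> df.filter($"<YOUR_COLUMN_NAME>" > 100).groupBy($"<ANOTHER_COLUMN_NAME>").count().show()
```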
77+
78+
## Implement and submit a Spark application
79+
80+
This section explains how to implement a Spark application with ScalarDB Analytics with Spark and submit it to the Spark cluster.
81+
82+
You can integrate ScalarDB Analytics with Spark into your application by using build tools like SBT, Gradle, or Maven.
83+
84+
<Tabs groupId="implementation" queryString>
85+
<TabItem value="gradle" label="Gradle" default>
86+
For Gradle projects, add the following to your `build.gradle.kts` file, replacing `<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>` with the versions that you're using:
87+
88+
```kotlin
89+
implementation("com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>")
90+
```
91+
</TabItem>
92+
<TabItem value="maven" label="Maven" default>
93+
To configure Gradle by using Groovy, add the following to your `build.gradle` file, replacing `<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>` with the versions that you're using:
94+
95+
```groovy
96+
implementation 'com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>'
97+
```
98+
</TabItem>
99+
<TabItem value="sbt" label="SBT">
100+
To add your application to an SBT project, insert the following into your `build.sbt` file, replacing `<SPARK_VERSION>` and `<SCALA_VERSION>` with the versions that you're using:
101+
102+
```scala
103+
libraryDependencies += "com.scalar-labs" %% "scalardb-analytics-spark-<SPARK_VERSION>" % "<SCALA_VERSION>"
104+
```
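
Note that the `%%` operator automatically appends your project's Scala binary version to the artifact name, which is why `<SCALA_VERSION>` doesn't appear explicitly in the SBT coordinates.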
</TabItem>
</Tabs>
107+
108+
After integrating ScalarDB Analytics with Spark into your application, you can use the same helper method explained above to configure ScalarDB Analytics with Spark in your Spark application.
109+
110+
<Tabs groupId="helper_method" queryString>
111+
<TabItem value="Scala" label="Scala" default>
112+
The following is a sample application that uses Scala:
113+
114+
```scala
115+
import com.scalar.db.analytics.spark.implicits._
116+
117+
object YourApp {
118+
def main(args: Array[String]): Unit = {
119+
// Initialize SparkSession as usual
120+
val spark = SparkSession.builder.appName("<YOUR_APPLICATION_NAME>").getOrCreate()
121+
// Setup ScalarDB Analytics with Spark via helper method
122+
spark.setupScalarDbAnalytics(
123+
// ScalarDB config file
124+
configPath = "/<PATH_TO_YOUR_SCALARDB_PROPERTIES>/config.properties",
125+
// Namespaces in ScalarDB to import
126+
namespaces = Set("<YOUR_NAMESPACE_NAME_1>", "<YOUR_NAMESPACE_NAME_2>"),
127+
// License information
128+
license = License.certPath("""{"your":"license", "key":"in", "json":"format"}""", "/<PATH_TO_YOUR_LICENSE>/cert.pem")
129+
)
130+
// Run arbitrary queries
131+
spark.sql("select * from <YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
132+
// Stop SparkSession
133+
spark.stop()
134+
}
135+
}
136+
```
137+
</TabItem>
138+
<TabItem value="Java" label="Java">
139+
You can write a Spark application with ScalarDB Analytics with Spark in Java:
140+
141+
```java
142+
import com.scalar.db.analytics.spark.ScalarDbAnalyticsInitializer
143+
144+
class YourApp {
145+
public static void main(String[] args) {
146+
// Initialize SparkSession as usual
147+
SparkSession spark = SparkSession.builder().appName("<YOUR_APPLICATION_NAME>").getOrCreate()
148+
// Setup ScalarDB Analytics with Spark via helper class
149+
ScalarDbAnalyticsInitializer
150+
.builder()
151+
.spark(spark)
152+
.configPath("/<PATH_TO_YOUR_SCALARDB_PROPERTIES>/config.properties")
153+
.namespace("<YOUR_NAMESPACE_NAME_1>")
154+
.namespace("<YOUR_NAMESPACE_NAME_2>")
155+
.licenseKey("{\"your\":\"license\", \"key\":\"in\", \"json\":\"format\"}")
156+
.licenseCertPath("/<PATH_TO_YOUR_LICENSE>/cert.pem")
157+
.build()
158+
.run()
159+
// Run arbitrary queries
160+
spark.sql("select * from <YOUR_NAMESPACE_NAME_1>.<YOUR_TABLE_NAME>").show()
161+
// Stop SparkSession
162+
spark.stop()
163+
}
164+
}
165+
```
</TabItem>
</Tabs>

Then, build a .jar file by using your preferred build tool, for example with `sbt package` or `./gradlew assemble`.

After building the .jar file, you can submit it to your Spark cluster with `spark-submit`, using the `--packages` option to make the ScalarDB Analytics libraries available on your cluster. Run the following command, replacing `<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION>` with the versions that you're using:

```console
./bin/spark-submit \
  --class "YourApp" \
  --packages com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_WITH_SPARK_VERSION> \
  <YOUR_APP_NAME>.jar
```

For more information about general Spark application development, see the [Apache Spark documentation](https://spark.apache.org/docs/latest/).

Lines changed: 36 additions & 0 deletions

---
tags:
- Community
- Enterprise Standard
- Enterprise Premium
---

# ScalarDB Data Loader API

This document describes how to get started with the ScalarDB Data Loader API.

## Start ScalarDB Data Loader API

* Clone the `scalardb-data-loader` repository:
```console
git clone https://github.com/scalar-labs/scalardb-data-loader.git
```

* Navigate to the `scalardb-data-loader` directory.

* Update the following properties in the `fixtures/test/conf/application.yml` file with appropriate cloud credentials:
  * `storage.jclouds.provider`
  * `storage.jclouds.identity`
  * `storage.jclouds.credential`
  * `storage.jclouds.container`

* Build the `scalardb-data-loader-api` Docker image:
```console
./gradlew :api:docker
```

* Start the `scalardb-data-loader-api`:
```console
cd fixtures/test
docker-compose up
```
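
Once the containers are up, you can confirm that the Data Loader API container is running; this is a generic Docker Compose check, not a step specific to the Data Loader:

```console
docker-compose ps
```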
