Changes to docs/scalardb-analytics/run-analytical-queries.mdx (24 additions, 24 deletions)
This section describes the prerequisites, setting up ScalarDB Analytics in the Spark configuration, and the build configuration for Spark applications.

### Prerequisites

- **ScalarDB Analytics server:** A running instance that manages catalog information and connects to your data sources. The server must be set up with at least one data source registered. For registering data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).
- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).

:::note

Apache Spark is built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. You need to be sure which version you are using so that you can select the correct version of ScalarDB Analytics later. For more details, see [Version compatibility](#version-compatibility).

:::
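
To check which versions you have, you can print the Spark version banner, which also reports the Scala version the distribution was built with (assuming the `spark-submit` script from your Spark installation is on your `PATH`):

```console
spark-submit --version
```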

### Set up ScalarDB Analytics in the Spark configuration

ScalarDB Analytics requires specific Spark configurations to integrate with the ScalarDB Analytics server.

#### Required Spark configurations

When configuring Spark, you must specify a catalog name that matches a catalog created on the ScalarDB Analytics server.

#### Example configuration

The following is a complete example configuration:

```conf
# 1. ScalarDB Analytics package
spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>

# 2. Catalog registration (the implementation class name below is an assumption; verify it for your version)
spark.sql.catalog.myanalytics com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog

# 3. ScalarDB Analytics server connection
spark.sql.catalog.myanalytics.server.host analytics-server.example.com
spark.sql.catalog.myanalytics.server.catalog.port 11051
spark.sql.catalog.myanalytics.server.metering.port 11052
```

Replace the content in the angle brackets as follows:

- `<SPARK_VERSION>`: Your Spark version (for example, `3.5` or `3.4`)
- `<SCALA_VERSION>`: Your Scala version (for example, `2.13` or `2.12`)
- `<SCALARDB_ANALYTICS_VERSION>`: The ScalarDB Analytics version (for example, `3.16.0`)

In this example:

- The catalog name `myanalytics` must match a catalog that exists on your ScalarDB Analytics server.
- The ScalarDB Analytics server is running at `analytics-server.example.com`.
- Tables will be accessed using the format: `myanalytics.<data_source>.<namespace>.<table>`.

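For example, with Spark 3.5, Scala 2.13, and ScalarDB Analytics 3.16.0, the package line resolves to the following:

```conf
spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-3.5_2.13:3.16.0
```
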
:::important

The catalog name in your Spark configuration must match the name of a catalog created on the ScalarDB Analytics server by using the CLI. For example, if you created a catalog named `production` on the server, you must use `production` as the catalog name in your Spark configuration properties (`spark.sql.catalog.production`, `spark.sql.catalog.production.server.host`, and so on).

:::

:::note

Data source configurations are managed by the ScalarDB Analytics server. For information on configuring data sources in the ScalarDB Analytics server, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).

:::

### Build configuration for Spark applications

When developing Spark applications that use ScalarDB Analytics, you can add the dependency to your build configuration. For example, with Gradle:

```kotlin
dependencies {
implementation("com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
}
```

:::note

If you bundle your application in a fat JAR by using plugins like Gradle Shadow or Maven Shade, exclude ScalarDB Analytics from the fat JAR by using configurations such as `provided` or `shadow`, as shown in the sketch after this note.

:::
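
For example, with the Gradle Shadow plugin, one way to keep ScalarDB Analytics out of the fat JAR is to declare it as `compileOnly`, so that it is available at compile time but left out of the bundled runtime classpath. This is a minimal sketch; the package is then supplied at runtime through `spark.jars.packages`:

```kotlin
dependencies {
    // Compile against ScalarDB Analytics without bundling it into the shadow JAR;
    // the Spark cluster supplies it at runtime via spark.jars.packages.
    compileOnly("com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
}
```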

:::note

Depending on your environment, you may not be able to use all the methods mentioned above.

:::

With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog information reference](./reference-data-source.mdx#catalog-information-reference).

<Tabs groupId="spark-application-type" queryString>
<TabItem value="spark-driver" label="Spark driver application">
You can use the standard `SparkSession` class for ScalarDB Analytics.

To read data from tables in ScalarDB Analytics, you can use the `spark.sql` or `spark.read.table` function in the same way as when reading a normal Spark table.

First, you need to set up your Java project. For example, if you are using Gradle, you can add the following to your `build.gradle.kts` file:

```kotlin
dependencies {
implementation("com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
}
```

:::note

The versions of the packages must match the versions of Spark and ScalarDB Analytics that you are using.

:::

You also need to include the Spark Connect client package in your application. For example, if you are using Gradle, you can add the following to your `build.gradle.kts` file:

```kotlin
implementation("org.apache.spark:spark-connect-client-jvm_2.12:3.5.3")
```

Unfortunately, the Spark Thrift JDBC server does not support the Spark features that ScalarDB Analytics requires.

ScalarDB Analytics manages its own catalog, containing data sources, namespaces, tables, and columns. That information is automatically mapped to the Spark catalog. In this section, you will learn how ScalarDB Analytics maps its catalog information to the Spark catalog.

For details about how information in the raw data sources is mapped to the ScalarDB Analytics catalog, see [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).

### Catalog structure mapping

ScalarDB Analytics maps catalog structure from data sources to Spark catalogs. Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables by using the following format:

```console
<CATALOG_NAME>.<DATA_SOURCE_NAME>.<NAMESPACE_NAMES>.<TABLE_NAME>
```
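
For example, a table named `orders` in the `sales` namespace of a data source registered as `mysql_source` under the catalog `myanalytics` (all hypothetical names) would be referenced as:

```console
myanalytics.mysql_source.sales.orders
```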