diff --git a/docs/scalardb-analytics/run-analytical-queries.mdx b/docs/scalardb-analytics/run-analytical-queries.mdx
index 69c4414e..6a84789d 100644
--- a/docs/scalardb-analytics/run-analytical-queries.mdx
+++ b/docs/scalardb-analytics/run-analytical-queries.mdx
@@ -19,18 +19,18 @@ This section describes the prerequisites, setting up ScalarDB Analytics in the S

 ### Prerequisites

-- **ScalarDB Analytics catalog server**: A running instance that manages catalog metadata and connects to your data sources. The server must be set up with at least one data source registered. For setup and data source registration instructions, see [Set up and administer the ScalarDB Analytics catalog server](./administration.mdx).
-- **Apache Spark**: A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).
+- **ScalarDB Analytics server:** A running instance that manages catalog information and connects to your data sources. The server must be set up with at least one data source registered. For registering data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).
+- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).

 :::note

-Apache Spark are built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. You need to be sure which version you are using so that you can select the correct version of ScalarDB Analytics later. You can refer to [Version compatibility](#version-compatibility) for more details.
+Apache Spark is built with either Scala 2.12 or Scala 2.13, and ScalarDB Analytics supports both versions. Be sure to check which Scala version you are using so that you can select the matching version of ScalarDB Analytics later. For more details, see [Version compatibility](#version-compatibility).

 :::

 ### Set up ScalarDB Analytics in the Spark configuration

-ScalarDB Analytics requires specific Spark configurations to integrate with the catalog server.
+ScalarDB Analytics requires specific Spark configurations to integrate with the ScalarDB Analytics server.

 #### Required Spark configurations

@@ -44,7 +44,7 @@ When configuring Spark, you must specify a catalog name that matches the catalog

 #### Example configuration

-Here's a complete example configuration:
+The following is a complete example configuration:

 ```conf
 # 1. ScalarDB Analytics package
@@ -60,27 +60,27 @@ spark.sql.catalog.myanalytics.server.catalog.port 11051
 spark.sql.catalog.myanalytics.server.metering.port 11052
 ```

-Replace the placeholders:
+Replace the placeholders in angle brackets as follows:

-- `<SPARK_VERSION>`: Your Spark version (e.g., `3.5` or `3.4`)
-- `<SCALA_VERSION>`: Your Scala version (e.g., `2.13` or `2.12`)
-- `<SCALARDB_ANALYTICS_VERSION>`: The ScalarDB Analytics version (e.g., `3.16.0`)
+- `<SPARK_VERSION>`: Your Spark version (for example, `3.5` or `3.4`)
+- `<SCALA_VERSION>`: Your Scala version (for example, `2.13` or `2.12`)
+- `<SCALARDB_ANALYTICS_VERSION>`: The ScalarDB Analytics version (for example, `3.16.0`)

 In this example:

-- The catalog name `myanalytics` must match a catalog that exists on your ScalarDB Analytics server
-- The ScalarDB Analytics server is running at `analytics-server.example.com`
-- Tables will be accessed using the format: `myanalytics.<DATA_SOURCE_NAME>.<NAMESPACE_NAME>.<TABLE_NAME>`
+- The catalog name `myanalytics` must match a catalog that exists on your ScalarDB Analytics server.
+- The ScalarDB Analytics server is running at `analytics-server.example.com`.
+- Tables are accessed by using the format `myanalytics.<DATA_SOURCE_NAME>.<NAMESPACE_NAME>.<TABLE_NAME>`, as shown in the sketch below.
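+
+For example, with the configuration above, the following minimal Java sketch queries a table through the `myanalytics` catalog. The data source name (`postgres`), namespace (`public`), and table name (`orders`) are hypothetical placeholders for whatever is registered in your catalog:
+
+```java
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+
+public class MyAnalyticsExample {
+  public static void main(String[] args) {
+    // Assumes Spark was launched with the example configuration above.
+    SparkSession spark = SparkSession.builder().getOrCreate();
+
+    // Table identifier format: <CATALOG_NAME>.<DATA_SOURCE_NAME>.<NAMESPACE_NAME>.<TABLE_NAME>
+    Dataset<Row> df = spark.sql("SELECT * FROM myanalytics.postgres.public.orders LIMIT 10");
+    df.show();
+
+    spark.stop();
+  }
+}
+```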

 :::important

-The catalog name in your Spark configuration must match the name of a catalog created on the ScalarDB Analytics server using the CLI. For example, if you created a catalog named `production` on the server, you must use `production` as the catalog name in your Spark configuration properties (e.g., `spark.sql.catalog.production`, `spark.sql.catalog.production.server.host`, etc.).
+The catalog name in your Spark configuration must match the name of a catalog created on the ScalarDB Analytics server by using the CLI. For example, if you created a catalog named `production` on the server, you must use `production` as the catalog name in your Spark configuration properties (for example, `spark.sql.catalog.production` and `spark.sql.catalog.production.server.host`).

 :::

 :::note

-Data source configurations are managed by the catalog server. For information on configuring data sources in the catalog server, see [Set up and administer the ScalarDB Analytics catalog server](./administration.mdx#configure-data-sources).
+Data source configurations are managed by the ScalarDB Analytics server. For information on configuring data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).

 :::

@@ -88,15 +88,15 @@ Data source configurations are managed by the catalog server. For information on

 When developing Spark applications that use ScalarDB Analytics, you can add the dependency to your build configuration. For example, with Gradle:

-```groovy
+```kotlin
 dependencies {
-    implementation 'com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>'
+    implementation("com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
 }
 ```

 :::note

-If you bundle your application in a fat JAR using plugins like Gradle Shadow or Maven Shade, exclude ScalarDB Analytics from the fat JAR by using configurations such as `provided` or `shadow`.
+If you bundle your application in a fat JAR by using plugins like Gradle Shadow or Maven Shade, exclude ScalarDB Analytics from the fat JAR by using configurations such as `provided` or `shadow`.

 :::

@@ -116,7 +116,7 @@ Depending on your environment, you may not be able to use all the methods mentio

 :::

-With all these methods, you can refer to tables in ScalarDB Analytics using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, refer to [Catalog metadata reference](./administration.mdx#catalog-metadata-reference).
+With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog information reference](./reference-data-source.mdx#catalog-information-reference).

@@ -125,11 +125,11 @@ You can use a commonly used `SparkSession` class for ScalarDB Analytics. Additio

 To read data from tables in ScalarDB Analytics, you can use the `spark.sql` or `spark.read.table` function in the same way as when reading a normal Spark table.

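+The following is a minimal sketch of both read styles, assuming an existing `SparkSession` named `spark` and the hypothetical table `myanalytics.postgres.public.orders` from the earlier example:
+
+```java
+// Read with Spark SQL.
+Dataset<Row> viaSql = spark.sql("SELECT * FROM myanalytics.postgres.public.orders");
+
+// Read the same table with the DataFrame API. In Java, `read` is a method,
+// so Scala's `spark.read.table(...)` becomes `spark.read().table(...)`.
+Dataset<Row> viaTable = spark.read().table("myanalytics.postgres.public.orders");
+
+viaSql.show();
+viaTable.show();
+```
+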
-First, you need to set up your Java project. For example, if you are using Gradle, you can add the following to your `build.gradle` file:
+First, you need to set up your Java project. For example, if you are using Gradle, you can add the following to your `build.gradle.kts` file:

-```groovy
+```kotlin
 dependencies {
-    implementation 'com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>'
+    implementation("com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
 }
 ```

@@ -191,7 +191,7 @@ The versions of the packages must match the versions of Spark and ScalarDB Analy

 :::

-You also need to include the Spark Connect client package in your application. For example, if you are using Gradle, you can add the following to your `build.gradle` file:
+You also need to include the Spark Connect client package in your application. For example, if you are using Gradle, you can add the following to your `build.gradle.kts` file:

 ```kotlin
 implementation("org.apache.spark:spark-connect-client-jvm_2.12:3.5.3")
 ```

@@ -235,11 +235,11 @@ Unfortunately, Spark Thrift JDBC server does not support the Spark features that

 ScalarDB Analytics manages its own catalog, containing data sources, namespaces, tables, and columns. That information is automatically mapped to the Spark catalog. In this section, you will learn how ScalarDB Analytics maps its catalog information to the Spark catalog.

-For details about how information in the raw data sources is mapped to the ScalarDB Analytics catalog, refer to [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).
+For details about how information in the raw data sources is mapped to the ScalarDB Analytics catalog, see [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).

 ### Catalog structure mapping

-ScalarDB Analytics maps catalog structure from data sources to Spark catalogs. Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables using the following format:
+ScalarDB Analytics maps catalog structure from data sources to Spark catalogs. Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables by using the following format:

 ```console
 <CATALOG_NAME>.<DATA_SOURCE_NAME>.<NAMESPACE_NAME>.<TABLE_NAME>