diff --git a/docs/scalardb-analytics/_README.mdx b/docs/scalardb-analytics/_README.mdx
new file mode 100644
index 00000000..fa475f1a
--- /dev/null
+++ b/docs/scalardb-analytics/_README.mdx
@@ -0,0 +1,36 @@
+---
+tags:
+ - Enterprise Option
+displayed_sidebar: docsEnglish
+---
+
+# ScalarDB Analytics
+
+import WarningLicenseKeyContact from '/src/components/en-us/_warning-license-key-contact.mdx';
+
+**ScalarDB Analytics** is the analytical component of ScalarDB. Similar to ScalarDB, it unifies diverse data sources, ranging from RDBMSs like PostgreSQL and MySQL to NoSQL databases such as Cassandra and DynamoDB, into a single logical database. While ScalarDB focuses on operational workloads with strong transactional consistency across multiple databases, ScalarDB Analytics is optimized for analytical workloads. It supports a wide range of queries, including complex joins, aggregations, and window functions. ScalarDB Analytics operates seamlessly on both ScalarDB-managed data sources and non-ScalarDB-managed ones, enabling advanced analytical queries across various datasets.
+
+The current version of ScalarDB Analytics leverages **Apache Spark** as its execution engine. It provides a unified view of ScalarDB-managed and non-ScalarDB-managed data sources by utilizing a Spark custom catalog. Using ScalarDB Analytics, you can treat tables from these data sources as native Spark tables. This allows you to execute arbitrary Spark SQL queries seamlessly. For example, you can join a table stored in Cassandra with a table in PostgreSQL to perform a cross-database analysis with ease.
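+
+For illustration, the following is a minimal sketch of such a cross-database join, assuming Spark has already been configured with a ScalarDB Analytics catalog named `analytics` that contains a ScalarDB data source (`scalardb`, backed by Cassandra) and a PostgreSQL data source (`postgresql`); all namespace and table names are hypothetical:
+
+```java
+// Minimal sketch: a cross-database join through a ScalarDB Analytics catalog.
+// The catalog name ("analytics"), data source names, and table names are hypothetical.
+import org.apache.spark.sql.SparkSession;
+
+public class CrossDatabaseJoin {
+  public static void main(String[] args) {
+    SparkSession spark = SparkSession.builder().appName("CrossDatabaseJoin").getOrCreate();
+
+    // Tables from both databases are addressed with the same identifier format.
+    spark.sql(
+        "SELECT c.customer_id, c.name, SUM(o.amount) AS total " +
+        "FROM analytics.scalardb.sales.orders o " +
+        "JOIN analytics.postgresql.public.customers c ON o.customer_id = c.customer_id " +
+        "GROUP BY c.customer_id, c.name")
+        .show();
+
+    spark.stop();
+  }
+}
+```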
+
+<WarningLicenseKeyContact />
+
+## Further reading
+
+This section provides links to various ScalarDB Analytics–related documentation.
+
+### Getting started
+
+* [Getting Started with ScalarDB Analytics](./quickstart.mdx) - A quick tutorial to set up ScalarDB Analytics and run federated queries
+
+### Key documentation
+
+* [Overview](./overview.mdx) - Understand ScalarDB Analytics architecture and features
+* [Deploy ScalarDB Analytics](./deployment.mdx) - Deploy on Amazon EMR, Databricks, and other platforms
+* [Run Analytical Queries](./run-analytical-queries.mdx) - Execute queries across multiple databases
+* [Administration Guide](./administration.mdx) - Manage catalogs and data sources
+* [Configuration Reference](./configuration.mdx) - Configure Spark and data sources
+
+### Technical details
+
+* [Design Document](./design.mdx) - Deep dive into the technical architecture
+* [Version Compatibility](./run-analytical-queries.mdx#version-compatibility) - Supported Spark and Scala versions
diff --git a/docs/scalardb-analytics/design.mdx b/docs/scalardb-analytics/design.mdx
index 3324887c..e1f99d07 100644
--- a/docs/scalardb-analytics/design.mdx
+++ b/docs/scalardb-analytics/design.mdx
@@ -86,16 +86,294 @@ ScalarDB Analytics supports a wide range of data types across different data sou
- `DURATION`
- `INTERVAL`
-These data types are used across all data sources and provide a unified type system for querying heterogeneous databases.
+### Catalog information mappings by data source
-### Data source integration
+When registering a data source to ScalarDB Analytics, the catalog information of the data source, that is, its namespaces, tables, and columns, is resolved and registered to the universal data catalog. To resolve this catalog information, particular objects on the data source side are mapped to universal data catalog objects. This mapping consists of two parts: catalog-level mappings and data-type mappings. The following sections describe how ScalarDB Analytics maps the catalog-level objects and data types from each data source into the universal data catalog.
-When registering a data source to ScalarDB Analytics, two types of mappings occur:
+#### Catalog-level mappings
-1. **Catalog structure mapping**: The data source's catalog information (namespaces, tables, and columns) is resolved and mapped to the universal data catalog structure
-2. **Data type mapping**: Native data types from each data source are mapped to the universal data types listed above
+The catalog-level mappings are the mappings of the namespace names, table names, and column names from the data sources to the universal data catalog. To see the catalog-level mappings in each data source, select a data source.
-These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog metadata reference](administration.mdx#catalog-metadata-reference) in the administration guide.
+
+
+ The catalog information of ScalarDB is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
+
+ - The ScalarDB namespace is mapped to the namespace. Therefore, the namespace of the ScalarDB data source is always single level, consisting of only the namespace name.
+ - The ScalarDB table is mapped to the table.
+ - The ScalarDB column is mapped to the column.
+
+
+
+
+ The catalog information of PostgreSQL is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
+
+ - The PostgreSQL schema is mapped to the namespace. Therefore, the namespace of the PostgreSQL data source is always single level, consisting of only the schema name.
+ - Only user-defined schemas are mapped to namespaces. The following system schemas are ignored:
+ - `information_schema`
+ - `pg_catalog`
+ - The PostgreSQL table is mapped to the table.
+ - The PostgreSQL column is mapped to the column.
+
+
+
+ The catalog information of MySQL is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
+
+ - The MySQL database is mapped to the namespace. Therefore, the namespace of the MySQL data source is always single level, consisting of only the database name.
+ - Only user-defined databases are mapped to namespaces. The following system databases are ignored:
+ - `mysql`
+ - `sys`
+ - `information_schema`
+ - `performance_schema`
+ - The MySQL table is mapped to the table.
+ - The MySQL column is mapped to the column.
+
+
+
+ The catalog information of Oracle is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
+
+ - The Oracle schema is mapped to the namespace. Therefore, the namespace of the Oracle data source is always single level, consisting of only the schema name.
+ - Only user-defined schemas are mapped to namespaces. The following system schemas are ignored:
+ - `ANONYMOUS`
+ - `APPQOSSYS`
+ - `AUDSYS`
+ - `CTXSYS`
+ - `DBSNMP`
+ - `DGPDB_INT`
+ - `DBSFWUSER`
+ - `DVF`
+ - `DVSYS`
+ - `GGSYS`
+ - `GSMADMIN_INTERNAL`
+ - `GSMCATUSER`
+ - `GSMROOTUSER`
+ - `GSMUSER`
+ - `LBACSYS`
+ - `MDSYS`
+ - `OJVMSYS`
+ - `ORDDATA`
+ - `ORDPLUGINS`
+ - `ORDSYS`
+ - `OUTLN`
+ - `REMOTE_SCHEDULER_AGENT`
+ - `SI_INFORMTN_SCHEMA`
+ - `SYS`
+ - `SYS$UMF`
+ - `SYSBACKUP`
+ - `SYSDG`
+ - `SYSKM`
+ - `SYSRAC`
+ - `SYSTEM`
+ - `WMSYS`
+ - `XDB`
+ - `DIP`
+ - `MDDATA`
+ - `ORACLE_OCM`
+ - `XS$NULL`
+
+
+
+ The catalog information of SQL Server is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
+
+ - The SQL Server database and schema are mapped to the namespace together. Therefore, the namespace of the SQL Server data source is always two-level, consisting of the database name and the schema name.
+ - Only user-defined databases are mapped to namespaces. The following system databases are ignored:
+ - `master`
+ - `model`
+ - `msdb`
+ - `tempdb`
+ - Only user-defined schemas are mapped to namespaces. The following system schemas are ignored:
+ - `sys`
+ - `guest`
+ - `INFORMATION_SCHEMA`
+ - `db_accessadmin`
+ - `db_backupoperator`
+ - `db_datareader`
+ - `db_datawriter`
+ - `db_ddladmin`
+ - `db_denydatareader`
+ - `db_denydatawriter`
+ - `db_owner`
+ - `db_securityadmin`
+ - The SQL Server table is mapped to the table.
+ - The SQL Server column is mapped to the column.
+
+
+
+ Since DynamoDB is schema-less, you need to specify the catalog information explicitly when registering a DynamoDB data source by using JSON in the following format:
+
+ ```json
+ {
+ "namespaces": [
+ {
+ "name": "",
+ "tables": [
+ {
+ "name": "",
+ "columns": [
+ {
+ "name": "",
+ "type": ""
+ },
+ ...
+ ]
+ },
+ ...
+ ]
+ },
+ ...
+ ]
+ }
+ ```
+
+ In the specified JSON, you can use arbitrary namespace names, but the table names must match the table names in DynamoDB, and the column names and types must match the field names and types in DynamoDB.
+
+
+
+
+#### Data-type mappings
+
+The native data types of the underlying data sources are mapped to the data types in ScalarDB Analytics. To see the data-type mappings in each data source, select a data source.
+
+
+
+ | **ScalarDB Data Type** | **ScalarDB Analytics Data Type** |
+ |:------------------------------|:---------------------------------|
+ | `BOOLEAN` | `BOOLEAN` |
+ | `INT` | `INT` |
+ | `BIGINT` | `BIGINT` |
+ | `FLOAT` | `FLOAT` |
+ | `DOUBLE` | `DOUBLE` |
+ | `TEXT` | `TEXT` |
+ | `BLOB` | `BLOB` |
+ | `DATE` | `DATE` |
+ | `TIME` | `TIME` |
+ | `TIMESTAMP` | `TIMESTAMP` |
+ | `TIMESTAMPTZ` | `TIMESTAMPTZ` |
+
+
+ | **PostgreSQL Data Type** | **ScalarDB Analytics Data Type** |
+ |:------------------------------|:---------------------------------|
+ | `integer` | `INT` |
+ | `bigint` | `BIGINT` |
+ | `real` | `FLOAT` |
+ | `double precision` | `DOUBLE` |
+ | `smallserial` | `SMALLINT` |
+ | `serial` | `INT` |
+ | `bigserial` | `BIGINT` |
+ | `char` | `TEXT` |
+ | `varchar` | `TEXT` |
+ | `text` | `TEXT` |
+ | `bpchar` | `TEXT` |
+ | `boolean` | `BOOLEAN` |
+ | `bytea` | `BLOB` |
+ | `date` | `DATE` |
+ | `time` | `TIME` |
+ | `time with time zone` | `TIME` |
+ | `time without time zone` | `TIME` |
+ | `timestamp` | `TIMESTAMP` |
+ | `timestamp with time zone` | `TIMESTAMPTZ` |
+ | `timestamp without time zone` | `TIMESTAMP` |
+
+
+ | **MySQL Data Type** | **ScalarDB Analytics Data Type** |
+ |:-----------------------|:---------------------------------|
+ | `bit` | `BOOLEAN` |
+ | `bit(1)` | `BOOLEAN` |
+ | `bit(x)` if *x >= 2* | `BLOB` |
+ | `tinyint` | `SMALLINT` |
+ | `tinyint(1)` | `BOOLEAN` |
+ | `boolean` | `BOOLEAN` |
+ | `smallint` | `SMALLINT` |
+ | `smallint unsigned` | `INT` |
+ | `mediumint` | `INT` |
+ | `mediumint unsigned` | `INT` |
+ | `int` | `INT` |
+ | `int unsigned` | `BIGINT` |
+ | `bigint` | `BIGINT` |
+ | `float` | `FLOAT` |
+ | `double` | `DOUBLE` |
+ | `real` | `DOUBLE` |
+ | `char` | `TEXT` |
+ | `varchar` | `TEXT` |
+ | `text` | `TEXT` |
+ | `binary` | `BLOB` |
+ | `varbinary` | `BLOB` |
+ | `blob` | `BLOB` |
+ | `date` | `DATE` |
+ | `time` | `TIME` |
+ | `datetime` | `TIMESTAMP` |
+ | `timestamp` | `TIMESTAMPTZ` |
+
+
+ | **Oracle Data Type** | **ScalarDB Analytics Data Type** |
+ |:-----------------------------------|:---------------------------------|
+ | `NUMBER` if *scale = 0* | `BIGINT` |
+ | `NUMBER` if *scale > 0* | `DOUBLE` |
+ | `FLOAT` if *precision ≤ 53* | `DOUBLE` |
+ | `BINARY_FLOAT` | `FLOAT` |
+ | `BINARY_DOUBLE` | `DOUBLE` |
+ | `CHAR` | `TEXT` |
+ | `NCHAR` | `TEXT` |
+ | `VARCHAR2` | `TEXT` |
+ | `NVARCHAR2` | `TEXT` |
+ | `CLOB` | `TEXT` |
+ | `NCLOB` | `TEXT` |
+ | `BLOB` | `BLOB` |
+ | `BOOLEAN` | `BOOLEAN` |
+ | `DATE` | `DATE` |
+ | `TIMESTAMP` | `TIMESTAMPTZ` |
+ | `TIMESTAMP WITH TIME ZONE` | `TIMESTAMPTZ` |
+ | `TIMESTAMP WITH LOCAL TIME ZONE` | `TIMESTAMP` |
+ | `RAW` | `BLOB` |
+
+
+ | **SQL Server Data Type** | **ScalarDB Analytics Data Type** |
+ |:---------------------------|:---------------------------------|
+ | `bit` | `BOOLEAN` |
+ | `tinyint` | `SMALLINT` |
+ | `smallint` | `SMALLINT` |
+ | `int` | `INT` |
+ | `bigint` | `BIGINT` |
+ | `real` | `FLOAT` |
+ | `float` | `DOUBLE` |
+ | `float(n)` if *n ≤ 24* | `FLOAT` |
+ | `float(n)` if *n ≥ 25* | `DOUBLE` |
+ | `binary` | `BLOB` |
+ | `varbinary` | `BLOB` |
+ | `char` | `TEXT` |
+ | `varchar` | `TEXT` |
+ | `nchar` | `TEXT` |
+ | `nvarchar` | `TEXT` |
+ | `ntext` | `TEXT` |
+ | `text` | `TEXT` |
+ | `date` | `DATE` |
+ | `time` | `TIME` |
+ | `datetime` | `TIMESTAMP` |
+ | `datetime2` | `TIMESTAMP` |
+ | `smalldatetime` | `TIMESTAMP` |
+ | `datetimeoffset` | `TIMESTAMPTZ` |
+
+
+ | **DynamoDB Data Type** | **ScalarDB Analytics Data Type** |
+ |:-------------------------|:---------------------------------|
+ | `Number` | `BYTE` |
+ | `Number` | `SMALLINT` |
+ | `Number` | `INT` |
+ | `Number` | `BIGINT` |
+ | `Number` | `FLOAT` |
+ | `Number` | `DOUBLE` |
+ | `Number` | `DECIMAL` |
+ | `String` | `TEXT` |
+ | `Binary` | `BLOB` |
+ | `Boolean` | `BOOLEAN` |
+
+:::warning
+
+It is important to ensure that the field values of `Number` types are parsable as a specified data type for ScalarDB Analytics. For example, if a column that corresponds to a `Number`-type field is specified as an `INT` type, its value must be an integer. If the value is not an integer, an error will occur when running a query.
+
+:::
+
+
+
## Query engine
diff --git a/docs/scalardb-analytics/run-analytical-queries.mdx b/docs/scalardb-analytics/run-analytical-queries.mdx
index 98d990cd..4f4b26aa 100644
--- a/docs/scalardb-analytics/run-analytical-queries.mdx
+++ b/docs/scalardb-analytics/run-analytical-queries.mdx
@@ -4,8 +4,8 @@ tags:
displayed_sidebar: docsEnglish
---
-import Tabs from "@theme/Tabs";
-import TabItem from "@theme/TabItem";
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
# Run Analytical Queries Through ScalarDB Analytics
@@ -19,74 +19,204 @@ This section describes the prerequisites, setting up ScalarDB Analytics in the S
### Prerequisites
-- **ScalarDB Analytics catalog server**: A running instance that manages catalog metadata and connects to your data sources. The server must be set up with at least one data source registered. For setup and data source registration instructions, see [Set up and administer the ScalarDB Analytics catalog server](./administration.mdx).
-- **Apache Spark**: A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).
+ScalarDB Analytics works with Apache Spark 3.4 or later. If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).
:::note
-Apache Spark are built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. You need to be sure which version you are using so that you can select the correct version of ScalarDB Analytics later. You can refer to [Version compatibility](#version-compatibility) for more details.
+Apache Spark is built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. Be sure to check which Scala version you are using so that you can select the correct version of ScalarDB Analytics later. For more details, refer to [Version Compatibility](#version-compatibility).
:::
### Set up ScalarDB Analytics in the Spark configuration
-ScalarDB Analytics requires specific Spark configurations to integrate with the catalog server.
+The following sections describe all available configuration options for ScalarDB Analytics. These configurations control:
-#### Required Spark configurations
+- How ScalarDB Analytics integrates with Spark
+- How data sources are connected and accessed
+- How license information is provided
-To use ScalarDB Analytics with Spark, you need to configure:
+For example configurations in a practical scenario, see [the sample application configuration](../scalardb-samples/scalardb-analytics-spark-sample/README.mdx#scalardb-analytics-configuration).
-1. **ScalarDB Analytics package**: Add the JAR dependency that matches your Spark and Scala versions
-2. **Metering listener**: Register the listener to track resource usage for billing
-3. **Catalog registration**: Register a Spark catalog that connects to your ScalarDB Analytics server
+#### Spark plugin configurations
-When configuring Spark, you must specify a catalog name that matches the catalog created on your ScalarDB Analytics server. This ensures Spark can correctly access the data sources managed by that catalog.
+| Configuration Key | Required | Description |
+|:-----------------|:---------|:------------|
+| `spark.jars.packages` | No | A comma-separated list of Maven coordinates for the required dependencies. You need to include the ScalarDB Analytics package that you are using; otherwise, specify it as a command-line argument when running the Spark application. For details about the Maven coordinates of ScalarDB Analytics, refer to [Add the ScalarDB Analytics dependency](#add-the-scalardb-analytics-dependency). |
+| `spark.sql.extensions` | Yes | Must be set to `com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions`. |
+| `spark.sql.catalog.<CATALOG_NAME>` | Yes | Must be set to `com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog`. |
-#### Example configuration
+You can specify any name for `<CATALOG_NAME>`. Be sure to use the same catalog name throughout your configuration.
-Here's a complete example configuration:
+#### License configurations
-```conf
-# 1. ScalarDB Analytics package
-spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-_:
+| Configuration Key | Required | Description |
+| :--------------------------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------- |
+| `spark.sql.catalog.<CATALOG_NAME>.license.key` | Yes | A JSON string of the license key for ScalarDB Analytics |
+| `spark.sql.catalog.<CATALOG_NAME>.license.cert_pem` | Either `cert_pem` or `cert_path` must be set | A string of the PEM-encoded certificate of the ScalarDB Analytics license |
+| `spark.sql.catalog.<CATALOG_NAME>.license.cert_path` | Either `cert_pem` or `cert_path` must be set | A path to the PEM-encoded certificate of the ScalarDB Analytics license |
-# 2. Metering listener
-spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener
+#### Data source configurations
-# 3. Catalog registration
-spark.sql.catalog.myanalytics com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog
-spark.sql.catalog.myanalytics.server.host analytics-server.example.com
-spark.sql.catalog.myanalytics.server.catalog.port 11051
-spark.sql.catalog.myanalytics.server.metering.port 11052
-```
+ScalarDB Analytics supports multiple types of data sources. Each type requires specific configuration parameters:
-Replace the placeholders:
+
+
-- ``: Your Spark version (e.g., `3.5` or `3.4`)
-- ``: Your Scala version (e.g., `2.13` or `2.12`)
-- ``: The ScalarDB Analytics version (e.g., `3.16.0`)
+:::note
-In this example:
+ScalarDB Analytics supports ScalarDB as a data source. This table describes how to configure ScalarDB as a data source.
-- The catalog name `myanalytics` must match a catalog that exists on your ScalarDB Analytics server
-- The ScalarDB Analytics server is running at `analytics-server.example.com`
-- Tables will be accessed using the format: `myanalytics...`
+:::
-:::important
+| Configuration Key | Required | Description |
+| :---------------------------------------------------------------------------- | :------- | :---------------------------------------------- |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type` | Yes | Always set to `scalardb` |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.config_path` | Yes | The path to the configuration file for ScalarDB |
-The catalog name in your Spark configuration must match the name of a catalog created on the ScalarDB Analytics server using the CLI. For example, if you created a catalog named `production` on the server, you must use `production` as the catalog name in your Spark configuration properties (e.g., `spark.sql.catalog.production`, `spark.sql.catalog.production.server.host`, etc.).
+:::tip
+
+You can use an arbitrary name for `<DATA_SOURCE_NAME>`.
:::
-:::note
+
+
+
+| Configuration Key | Required | Description |
+| :------------------------------------------------------------------------- | :------- | :------------------------------------- |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type` | Yes | Always set to `mysql` |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.host` | Yes | The host name of the MySQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.port` | Yes | The port number of the MySQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.username` | Yes | The username of the MySQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.password` | Yes | The password of the MySQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.database` | No | The name of the database to connect to |
+
+:::tip
+
+You can use an arbitrary name for `<DATA_SOURCE_NAME>`.
+
+:::
+
+
+
+
+| Configuration Key | Required | Description |
+| :------------------------------------------------------------------------- | :------- | :--------------------------------------- |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type` | Yes | Always set to `postgresql` or `postgres` |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.host` | Yes | The host name of the PostgreSQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.port` | Yes | The port number of the PostgreSQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.username` | Yes | The username of the PostgreSQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.password` | Yes | The password of the PostgreSQL server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.database` | Yes | The name of the database to connect to |
+
+:::tip
+
+You can use an arbitrary name for `<DATA_SOURCE_NAME>`.
+
+:::
+
+
+
+
+| Configuration Key | Required | Description |
+| :----------------------------------------------------------------------------- | :------- | :------------------------------------ |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type` | Yes | Always set to `oracle` |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.host` | Yes | The host name of the Oracle server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.port` | Yes | The port number of the Oracle server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.username` | Yes | The username of the Oracle server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.password` | Yes | The password of the Oracle server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.service_name` | Yes | The service name of the Oracle server |
+
+:::tip
+
+You can use an arbitrary name for `<DATA_SOURCE_NAME>`.
+
+:::
+
+
+
+
+| Configuration Key | Required | Description |
+| :------------------------------------------------------------------------- | :------- | :----------------------------------------------------------------------------------------------------- |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type` | Yes | Always set to `sqlserver` or `mssql` |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.host` | Yes | The host name of the SQL Server server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.port` | Yes | The port number of the SQL Server server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.username` | Yes | The username of the SQL Server server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.password` | Yes | The password of the SQL Server server |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.database` | No | The name of the database to connect to |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.secure` | No | Whether to use a secure connection to the SQL Server server. Set to `true` to use a secure connection. |
+
+:::tip
+
+You can use an arbitrary name for `<DATA_SOURCE_NAME>`.
+
+:::
+
+
+
+
+| Configuration Key | Required | Description |
+|:---------------------------------------------------------------------------|:------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type` | Yes | Always set to `dynamodb` |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.region` | Either `region` or `endpoint` must be set | The AWS region of the DynamoDB instance |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.endpoint` | Either `region` or `endpoint` must be set | The AWS endpoint of the DynamoDB instance |
+| `spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.schema` | Yes | A JSON object representing the schema of the catalog. For details on the format, see [Catalog-level mappings](./design.mdx#catalog-level-mappings). |
-Data source configurations are managed by the catalog server. For information on configuring data sources in the catalog server, see [Set up and administer the ScalarDB Analytics catalog server](./administration.mdx#configure-data-sources).
+
+:::tip
+
+You can use an arbitrary name for `<DATA_SOURCE_NAME>`.
:::
-### Build configuration for Spark applications
+
+
+
+#### Example configuration
+
+Below is an example configuration for ScalarDB Analytics that demonstrates how to set up a catalog named `scalardb` with multiple data sources:
+
+```conf
+# Spark plugin configurations
+spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>
+spark.sql.extensions com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions
+spark.sql.catalog.scalardb com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog
+
+# License configurations
+spark.sql.catalog.scalardb.license.key <LICENSE_KEY>
+spark.sql.catalog.scalardb.license.cert_pem <LICENSE_CERT_PEM>
+
+# Data source configurations
+spark.sql.catalog.scalardb.data_source.scalardb.type scalardb
+spark.sql.catalog.scalardb.data_source.scalardb.config_path /path/to/scalardb.properties
+
+spark.sql.catalog.scalardb.data_source.mysql_source.type mysql
+spark.sql.catalog.scalardb.data_source.mysql_source.host localhost
+spark.sql.catalog.scalardb.data_source.mysql_source.port 3306
+spark.sql.catalog.scalardb.data_source.mysql_source.username root
+spark.sql.catalog.scalardb.data_source.mysql_source.password password
+spark.sql.catalog.scalardb.data_source.mysql_source.database mydb
+```
+
+The following describes what you should change the content in the angle brackets to:
+
+- `<LICENSE_KEY>`: The license key for ScalarDB Analytics
+- `<LICENSE_CERT_PEM>`: The PEM-encoded certificate of the ScalarDB Analytics license
+- `<SPARK_VERSION>`: The major and minor version of Spark you are using (such as 3.4)
+- `<SCALA_VERSION>`: The major and minor version of Scala that matches your Spark installation (such as 2.12 or 2.13)
+- `<SCALARDB_ANALYTICS_VERSION>`: The version of ScalarDB Analytics
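+
+As an alternative to listing these properties in a configuration file, the same settings can be supplied programmatically when building a `SparkSession`. The following is a minimal sketch under the same assumptions as the example above; the catalog name `scalardb`, the ScalarDB data source, and the placeholder values are illustrative and must be replaced with your own:
+
+```java
+// Minimal sketch: supplying the ScalarDB Analytics settings programmatically.
+// Assumes the ScalarDB Analytics package is already on the classpath (see the next section),
+// and uses the same illustrative catalog name ("scalardb") as the configuration above.
+import org.apache.spark.sql.SparkSession;
+
+public class AnalyticsSession {
+  public static void main(String[] args) {
+    SparkSession spark = SparkSession.builder()
+        .appName("ScalarDB Analytics example")
+        .config("spark.sql.extensions",
+            "com.scalar.db.analytics.spark.extension.ScalarDbAnalyticsExtensions")
+        .config("spark.sql.catalog.scalardb",
+            "com.scalar.db.analytics.spark.ScalarDbAnalyticsCatalog")
+        .config("spark.sql.catalog.scalardb.license.key", "<LICENSE_KEY>")
+        .config("spark.sql.catalog.scalardb.license.cert_pem", "<LICENSE_CERT_PEM>")
+        .config("spark.sql.catalog.scalardb.data_source.scalardb.type", "scalardb")
+        .config("spark.sql.catalog.scalardb.data_source.scalardb.config_path",
+            "/path/to/scalardb.properties")
+        .getOrCreate();
+
+    // You can now run Spark SQL queries against tables in the "scalardb" catalog, for example:
+    // spark.sql("SELECT * FROM scalardb.scalardb.<NAMESPACE_NAME>.<TABLE_NAME>").show();
+
+    spark.stop();
+  }
+}
+```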
+
+### Add the ScalarDB Analytics dependency
+
+ScalarDB Analytics is hosted in the Maven Central Repository. The name of the package is `scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>`, where:
+
+- `<SPARK_VERSION>`: The major and minor version of Spark you are using (such as 3.4)
+- `<SCALA_VERSION>`: The major and minor version of Scala that matches your Spark installation (such as 2.12 or 2.13)
+- `<SCALARDB_ANALYTICS_VERSION>`: The version of ScalarDB Analytics
+
+For details about version compatibility, refer to [Version Compatibility](#version-compatibility).
-When developing Spark applications that use ScalarDB Analytics, you can add the dependency to your build configuration. For example, with Gradle:
+You can add this dependency by configuring your project's build settings. For example, if you are using Gradle, you can add the following to your `build.gradle` file:
```groovy
dependencies {
@@ -96,7 +226,7 @@ dependencies {
:::note
-If you bundle your application in a fat JAR using plugins like Gradle Shadow or Maven Shade, exclude ScalarDB Analytics from the fat JAR by using configurations such as `provided` or `shadow`.
+If you want to bundle your application in a single fat JAR file by using plugins like the Gradle Shadow plugin or the Maven Shade plugin, you need to exclude ScalarDB Analytics from the fat JAR file by choosing the appropriate configuration, such as `provided` or `shadow`, depending on the plugin you are using.
:::
@@ -116,10 +246,10 @@ Depending on your environment, you may not be able to use all the methods mentio
:::
-With all these methods, you can refer to tables in ScalarDB Analytics using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, refer to [Catalog metadata reference](./administration.mdx#catalog-metadata-reference).
+With all these methods, you can refer to tables in ScalarDB Analytics using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, refer to [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).
-
+
You can use a commonly used `SparkSession` class for ScalarDB Analytics. Additionally, you can use any type of cluster deployment that Spark supports, such as YARN, Kubernetes, standalone, or local mode.
@@ -133,7 +263,7 @@ dependencies {
}
```
-Below is an example of a Spark driver application:
+Below is an example of a Spark Driver application:
```java
import org.apache.spark.sql.SparkSession;
@@ -170,7 +300,7 @@ You can also use other CLI tools that Spark provides, such as `spark-sql` and `s
-You can use [Spark Connect](https://spark.apache.org/spark-connect/) to interact with ScalarDB Analytics. By using Spark Connect, you can access a remote Spark cluster and read data in the same way as a Spark driver application. The following briefly describes how to use Spark Connect.
+You can use [Spark Connect](https://spark.apache.org/spark-connect/) to interact with ScalarDB Analytics. By using Spark Connect, you can access a remote Spark cluster and read data in the same way as a Spark Driver application. The following briefly describes how to use Spark Connect.
First, you need to start a Spark Connect server in the remote Spark cluster by running the following command:
@@ -237,9 +367,13 @@ ScalarDB Analytics manages its own catalog, containing data sources, namespaces,
For details about how information in the raw data sources is mapped to the ScalarDB Analytics catalog, refer to [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).
-### Catalog structure mapping
+### Catalog-level mapping
+
+Each catalog-level object in the ScalarDB Analytics catalog is mapped to a corresponding object in the Spark catalog. The following sections describe how these objects are mapped.
-ScalarDB Analytics maps catalog structure from data sources to Spark catalogs. Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables using the following format:
+#### Data source tables
+
+Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables. The following format is used to represent the identity of the Spark tables that correspond to ScalarDB Analytics tables:
```console
<CATALOG_NAME>.<DATA_SOURCE_NAME>.<NAMESPACE_NAMES>.<TABLE_NAME>
@@ -254,12 +388,39 @@ The following describes what you should change the content in the angle brackets
For example, if you have a ScalarDB catalog named `my_catalog` that contains a data source named `my_data_source` and a schema named `my_schema`, you can refer to the table named `my_table` in that schema as `my_catalog.my_data_source.my_schema.my_table`.
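+
+For instance, the following minimal Java sketch reads that table with Spark SQL. It assumes Spark has already been set up with a ScalarDB Analytics catalog named `my_catalog`, as described in the setup section; the data source, schema, and table names are the hypothetical ones from the example above:
+
+```java
+// Minimal sketch: reading a data source table by its Spark table identifier.
+// Assumes Spark is configured with a ScalarDB Analytics catalog named "my_catalog".
+import org.apache.spark.sql.SparkSession;
+
+public class ReadDataSourceTable {
+  public static void main(String[] args) {
+    SparkSession spark = SparkSession.builder().appName("ReadDataSourceTable").getOrCreate();
+    spark.sql("SELECT * FROM my_catalog.my_data_source.my_schema.my_table").show();
+    spark.stop();
+  }
+}
+```
+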
+#### Views
+
+Views in ScalarDB Analytics are provided as tables in the Spark catalog, not views. The following format is used to represent the identity of the Spark tables that correspond to ScalarDB Analytics views:
+
+```console
+<CATALOG_NAME>.view.<VIEW_NAMESPACE_NAMES>.<VIEW_NAME>
+```
+
+The following describes what you should change the content in the angle brackets to:
+
+- `<CATALOG_NAME>`: The name of the catalog.
+- `<VIEW_NAMESPACE_NAMES>`: The names of the view namespaces. If the view namespace names are multi-level, they are concatenated with a dot (`.`) as the separator.
+- `<VIEW_NAME>`: The name of the view.
+
+For example, if you have a ScalarDB catalog named `my_catalog` and a view namespace named `my_view_namespace`, you can refer to the view named `my_view` in that namespace as `my_catalog.view.my_view_namespace.my_view`.
+
+:::note
+
+`view` is prefixed to avoid conflicts with the data source table identifier.
+
+:::
+
+##### WAL-interpreted views
+
+As explained in [ScalarDB Analytics Design](./design.mdx), ScalarDB Analytics provides a feature called WAL-interpreted views, which are a special type of view. These views are automatically created for tables of ScalarDB data sources to provide a user-friendly view of the data by interpreting the WAL metadata in the tables.
+
+Since the data source name and the namespace names of the original ScalarDB tables are used as the view namespace names for WAL-interpreted views, if you have a ScalarDB table named `my_table` in a namespace named `my_namespace` of a data source named `my_data_source`, you can refer to the WAL-interpreted view of the table as `my_catalog.view.my_data_source.my_namespace.my_table`.
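+
+For illustration, the following minimal Java sketch compares the raw ScalarDB table with its WAL-interpreted view, using the hypothetical identifiers from this example; the view is expected to expose only the user-facing columns, without the WAL metadata:
+
+```java
+// Minimal sketch: comparing a raw ScalarDB table with its WAL-interpreted view.
+// The identifiers reuse the hypothetical example names from this section.
+import org.apache.spark.sql.SparkSession;
+
+public class CompareWalInterpretedView {
+  public static void main(String[] args) {
+    SparkSession spark = SparkSession.builder().appName("CompareWalInterpretedView").getOrCreate();
+
+    // Raw data source table, including ScalarDB's transaction (WAL) metadata columns.
+    spark.table("my_catalog.my_data_source.my_namespace.my_table").printSchema();
+
+    // WAL-interpreted view, which presents only the user-defined columns.
+    spark.table("my_catalog.view.my_data_source.my_namespace.my_table").printSchema();
+
+    spark.stop();
+  }
+}
+```
+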
### Data-type mapping
ScalarDB Analytics maps data types in its catalog to Spark data types. The following table shows how the data types are mapped:
-| ScalarDB data type | Spark data type |
+| ScalarDB Data Type | Spark Data Type |
| :----------------- | :----------------- |
| `BYTE` | `Byte` |
| `SMALLINT` | `Short` |
@@ -287,6 +448,6 @@ Regarding the Java version, ScalarDB Analytics supports Java 8 or later.
The following is a list of Spark and Scala versions supported by each version of ScalarDB Analytics.
| ScalarDB Analytics Version | ScalarDB Version | Spark Versions Supported | Scala Versions Supported | Minimum Java Version |
-| :------------------------- | :--------------- | :----------------------- | :----------------------- | :------------------- |
+|:---------------------------|:-----------------|:-------------------------|:-------------------------|:---------------------|
| 3.16 | 3.16 | 3.5, 3.4 | 2.13, 2.12 | 8 |
| 3.15 | 3.15 | 3.5, 3.4 | 2.13, 2.12 | 8 |