diff --git a/docs/scalardb-analytics/configuration.mdx b/docs/scalardb-analytics/configuration.mdx deleted file mode 100644 index 04245224..00000000 --- a/docs/scalardb-analytics/configuration.mdx +++ /dev/null @@ -1,242 +0,0 @@ ---- -tags: - - Enterprise Option -displayed_sidebar: docsEnglish ---- - -# Configuration reference - -This page provides a comprehensive reference for configuring all components of ScalarDB Analytics. - -## Overview - -ScalarDB Analytics consists of three main components that require configuration: - -1. **ScalarDB Analytics server** - The server that hosts the catalog and metering services -2. **CLI client** - The command-line interface for managing catalogs and data sources -3. **Spark integration** - Configuration for using ScalarDB Analytics with Apache Spark - -## ScalarDB Analytics server configuration - -The server is configured using a standard Java properties file (e.g., `scalardb-analytics-server.properties`) that defines database connections, network settings, licensing, and optional features. - -### Configuration properties - -#### Metadata database configuration - -The server requires a metadata database to store catalog information. - -| Property | Required | Description | Default | Example | -| ---------------------------------------- | -------- | ------------------------------------ | ------- | ----------------------------------------------------- | -| `scalar.db.analytics.server.db.url` | Yes | JDBC URL for the metadata database | - | `jdbc:postgresql://localhost:5432/scalardb_analytics` | -| `scalar.db.analytics.server.db.username` | Yes | Database user for authentication | - | `analytics_user` | -| `scalar.db.analytics.server.db.password` | Yes | Database password for authentication | - | `your_secure_password` | - -#### gRPC server configuration - -Configure the ports for the catalog and metering services. 
- -| Property | Required | Default | Description | Example | -| ------------------------------------------ | -------- | ------- | ----------------------------- | ------- | -| `scalar.db.analytics.server.catalog.port` | No | `11051` | Port for the catalog service | `11051` | -| `scalar.db.analytics.server.metering.port` | No | `11052` | Port for the metering service | `11052` | - -#### TLS configuration - -Enable TLS/SSL for secure communication. - -| Property | Required | Default | Description | Example | -| ------------------------------------------------- | -------- | ------- | ----------------------------------------- | --------------------- | -| `scalar.db.analytics.server.tls.enabled` | No | `false` | Enable TLS/SSL for gRPC endpoints | `true` | -| `scalar.db.analytics.server.tls.cert_chain_path` | Yes\* | - | Path to the server certificate chain file | `/path/to/server.crt` | -| `scalar.db.analytics.server.tls.private_key_path` | Yes\* | - | Path to the server private key file | `/path/to/server.key` | - -\* Required when `tls.enabled` is `true` - -#### License configuration - -Configure your ScalarDB Analytics license. - -| Property | Required | Description | Default | Example | -| -------------------------------------------------------------- | -------- | ---------------------------------------------- | ------- | ------------------------------ | -| `scalar.db.analytics.server.licensing.license_key` | Yes | Your ScalarDB Analytics license key | - | Contact Scalar for license | -| `scalar.db.analytics.server.licensing.license_check_cert_pem` | Yes\* | License verification certificate as PEM string | - | Contact Scalar for certificate | -| `scalar.db.analytics.server.licensing.license_check_cert_path` | Yes\* | Path to license verification certificate file | - | `/path/to/cert.pem` | - -\* Either `license_check_cert_pem` or `license_check_cert_path` must be specified - -#### Metering storage configuration - -Configure storage for metering data. 
- -| Property | Required | Default | Description | Example | -| ------------------------------------------------------------- | -------- | ---------- | ------------------------------------------------------------------------------------------------ | ------------------------------------------ | -| `scalar.db.analytics.server.metering.storage.provider` | Yes | - | Storage provider for metering data (`filesystem`, `aws-s3`, `azureblob`, `google-cloud-storage`) | `filesystem` | -| `scalar.db.analytics.server.metering.storage.containerName` | No | `metering` | Container/bucket name for cloud storage | `my-metering-bucket` | -| `scalar.db.analytics.server.metering.storage.path` | Yes\* | - | Local directory path (for `filesystem` provider only) | `/var/scalardb-analytics/metering` | -| `scalar.db.analytics.server.metering.storage.accessKeyId` | Yes\*\* | - | Access key ID for cloud storage providers | `AKIAIOSFODNN7EXAMPLE` | -| `scalar.db.analytics.server.metering.storage.secretAccessKey` | Yes\*\* | - | Secret access key for cloud storage providers | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` | -| `scalar.db.analytics.server.metering.storage.prefix` | No | - | Optional prefix for all storage paths | `production/` | - -\* Required when provider is `filesystem` -\*\* Required for cloud storage providers (`aws-s3`, `azureblob`, `google-cloud-storage`) - -## CLI client configuration - -The CLI client requires connection settings to communicate with the ScalarDB Analytics server using a Java properties file (e.g., `client.properties`). 
- -### Configuration properties - -#### Server connection configuration - -| Property | Required | Default | Description | Example | -| ------------------------------------------------- | -------- | ------- | ------------------------------------------------------- | ----------------------- | -| `scalar.db.analytics.client.server.host` | Yes | - | Hostname or IP address of the ScalarDB Analytics server | `analytics.example.com` | -| `scalar.db.analytics.client.server.catalog.port` | No | `11051` | Port number for the catalog service | `11051` | -| `scalar.db.analytics.client.server.metering.port` | No | `11052` | Port number for the metering service | `11052` | - -#### TLS configuration - -| Property | Required | Default | Description | Example | -| ---------------------------------------------------------- | -------- | ------- | ----------------------------------------------------------------------- | ------------------- | -| `scalar.db.analytics.client.server.tls.enabled` | No | `false` | Enable TLS/SSL for server connections | `true` | -| `scalar.db.analytics.client.server.tls.ca_root_cert_path` | Yes\* | - | Path to the CA certificate file for verifying server certificates | `/path/to/cert.pem` | -| `scalar.db.analytics.client.server.tls.override_authority` | No | - | Override the server authority for TLS verification (useful for testing) | `test.example.com` | - -\* Required when `tls.enabled` is `true` - -## Spark integration configuration - -To use ScalarDB Analytics with Apache Spark, configure your Spark application by adding the necessary settings to your Spark configuration file (`spark-defaults.conf`). 
- -### Configuration properties - -#### Core Spark configuration - -| Property | Required | Description | Default | Example | -| ---------------------- | -------- | ----------------------------------------------------- | ------- | ------------------------------------------------------------------ | -| `spark.jars.packages` | Yes | Maven coordinates for ScalarDB Analytics dependencies | - | `com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2` | -| `spark.extraListeners` | Yes | Register the ScalarDB Analytics metering listener | - | `com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener` | - -For `spark.jars.packages`, replace: - -- `` with your Spark version (e.g., `3.5`) -- `` with your Scala version (e.g., `2.12`) -- `` with the ScalarDB Analytics version (e.g., `3.16.2`) - -#### Catalog configuration - -| Property | Required | Description | Default | Value | -| ---------------------------------- | -------- | ------------------------------------------------------ | ------- | ---------------------------------------------------------------- | -| `spark.sql.catalog.` | Yes | Register the ScalarDB Analytics catalog implementation | - | `com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog` | - -#### Server connection settings - -| Property | Required | Default | Description | Example | -| ------------------------------------------------------- | -------- | ------- | ------------------------------------------------------- | ----------- | -| `spark.sql.catalog..server.host` | Yes | - | Hostname or IP address of the ScalarDB Analytics server | `localhost` | -| `spark.sql.catalog..server.catalog.port` | No | `11051` | Port number for the catalog service | `11051` | -| `spark.sql.catalog..server.metering.port` | No | `11052` | Port number for the metering service | `11052` | - -#### TLS/SSL settings - -| Property | Required | Default | Description | Example | -| ---------------------------------------------------------------- | -------- | 
------- | ----------------------------------------------------------------- | ------------------- | -| `spark.sql.catalog..server.tls.enabled` | No | `false` | Enable TLS/SSL for server connections | `true` | -| `spark.sql.catalog..server.tls.ca_root_cert_path` | Yes\* | - | Path to the CA certificate file for verifying server certificates | `/path/to/cert.pem` | -| `spark.sql.catalog..server.tls.override_authority` | No | - | Override the server authority for TLS verification | `test.example.com` | - -\* Required when `tls.enabled` is `true` - -Replace `` with your chosen catalog name (e.g., `analytics`). - -## Configuration examples - -### Basic development configuration - -#### Server configuration (`scalardb-analytics-server.properties`) - -```properties -# Metadata database -scalar.db.analytics.server.db.url=jdbc:postgresql://localhost:5432/scalardb_analytics -scalar.db.analytics.server.db.username=dev_user -scalar.db.analytics.server.db.password=dev_password - -# License -scalar.db.analytics.server.licensing.license_key=YOUR_DEV_LICENSE_KEY -scalar.db.analytics.server.licensing.license_check_cert_path=/path/to/license_cert.pem - -# Metering storage (filesystem for development) -scalar.db.analytics.server.metering.storage.provider=filesystem -scalar.db.analytics.server.metering.storage.path=/tmp/scalardb-analytics-metering -``` - -#### Client configuration (`client.properties`) - -```properties -scalar.db.analytics.client.server.host=localhost -``` - -#### Spark configuration (`spark-defaults.conf`) - -```properties -spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2 -spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener -spark.sql.catalog.analytics com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog -spark.sql.catalog.analytics.server.host localhost -``` - -### Production configuration with TLS - -#### Server configuration (`scalardb-analytics-server.properties`) - -```properties -# 
Metadata database -scalar.db.analytics.server.db.url=jdbc:postgresql://db.internal:5432/scalardb_analytics_prod -scalar.db.analytics.server.db.username=analytics_prod -scalar.db.analytics.server.db.password=your_secure_password - -# gRPC ports -scalar.db.analytics.server.catalog.port=11051 -scalar.db.analytics.server.metering.port=11052 - -# TLS -scalar.db.analytics.server.tls.enabled=true -scalar.db.analytics.server.tls.cert_chain_path=/path/to/server.crt -scalar.db.analytics.server.tls.private_key_path=/path/to/server.key - -# License -scalar.db.analytics.server.licensing.license_key=YOUR_LICENSE_KEY -scalar.db.analytics.server.licensing.license_check_cert_pem=-----BEGIN CERTIFICATE-----\nMIID...certificate content...\n-----END CERTIFICATE----- - -# Metering storage (S3) -scalar.db.analytics.server.metering.storage.provider=aws-s3 -scalar.db.analytics.server.metering.storage.containerName=analytics-metering -scalar.db.analytics.server.metering.storage.accessKeyId=AKIAIOSFODNN7EXAMPLE -scalar.db.analytics.server.metering.storage.secretAccessKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY -scalar.db.analytics.server.metering.storage.prefix=prod/ -``` - -#### Client configuration (`client.properties`) - -```properties -scalar.db.analytics.client.server.host=analytics.example.com -scalar.db.analytics.client.server.tls.enabled=true -scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/cert.pem -``` - -#### Spark configuration (`spark-defaults.conf`) - -```properties -spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2 -spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener -spark.sql.catalog.analytics com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog -spark.sql.catalog.analytics.server.host analytics.example.com -spark.sql.catalog.analytics.server.tls.enabled true -spark.sql.catalog.analytics.server.tls.ca_root_cert_path /path/to/cert.pem -``` - -## Next steps - -- [Run analytical 
queries](run-analytical-queries.mdx) - Start running queries with your configuration -- [Deployment guide](deployment.mdx) - Deploy ScalarDB Analytics in production diff --git a/docs/scalardb-analytics/configurations.mdx b/docs/scalardb-analytics/configurations.mdx new file mode 100644 index 00000000..59bdb01f --- /dev/null +++ b/docs/scalardb-analytics/configurations.mdx @@ -0,0 +1,349 @@ +--- +tags: + - Enterprise Option +displayed_sidebar: docsEnglish +--- + +# ScalarDB Analytics Configurations + +This page provides a comprehensive reference for configuring all components of ScalarDB Analytics. + +## Overview + +ScalarDB Analytics consists of three main components that require configuration: + +1. **ScalarDB Analytics server** - The server that hosts the catalog information and metering services +2. **CLI client** - The command-line interface for managing catalogs and data sources +3. **Spark integration** - Configuration for using ScalarDB Analytics with Apache Spark + +## ScalarDB Analytics server configuration + +The server is configured using a standard Java properties file (for example, `scalardb-analytics-server.properties`) that defines database connections, network settings, licensing, and optional features. + +### Metadata database configurations + +Configure the metadata database that stores catalog information. + +#### `db.url` + +- **Field:** `scalar.db.analytics.server.db.url` +- **Description:** The JDBC URL for the metadata database used by ScalarDB Analytics. + +#### `db.username` + +- **Field:** `scalar.db.analytics.server.db.username` +- **Description:** The username for connecting to the metadata database. + +#### `db.password` + +- **Field:** `scalar.db.analytics.server.db.password` +- **Description:** The password for the metadata database user. + +### Server network configuration + +Configure network settings including service ports and TLS/SSL encryption. 
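+
+For illustration, the network-related properties described in this section might appear together in `scalardb-analytics-server.properties` as follows. The port values are the documented defaults, and the certificate paths are placeholders:
+
+```properties
+scalar.db.analytics.server.catalog.port=11051
+scalar.db.analytics.server.metering.port=11052
+scalar.db.analytics.server.tls.enabled=true
+scalar.db.analytics.server.tls.cert_chain_path=/path/to/server.crt
+scalar.db.analytics.server.tls.private_key_path=/path/to/server.key
+```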
+ +#### `catalog.port` + +- **Field:** `scalar.db.analytics.server.catalog.port` +- **Description:** Port for the catalog service. +- **Default value:** `11051` + +#### `metering.port` + +- **Field:** `scalar.db.analytics.server.metering.port` +- **Description:** Port for the metering service. +- **Default value:** `11052` + +#### `tls.enabled` + +- **Field:** `scalar.db.analytics.server.tls.enabled` +- **Description:** Enable TLS/SSL for secure communication. +- **Default value:** `false` + +#### `tls.cert_chain_path` + +- **Field:** `scalar.db.analytics.server.tls.cert_chain_path` +- **Description:** Path to the server certificate chain file. Required when `tls.enabled` is `true`. + +#### `tls.private_key_path` + +- **Field:** `scalar.db.analytics.server.tls.private_key_path` +- **Description:** Path to the server private key file. Required when `tls.enabled` is `true`. + +### License configuration + +Configure your ScalarDB Analytics license. + +#### `licensing.license_key` + +- **Field:** `scalar.db.analytics.server.licensing.license_key` +- **Description:** Your ScalarDB Analytics license key. + +#### `licensing.license_check_cert_pem` + +- **Field:** `scalar.db.analytics.server.licensing.license_check_cert_pem` +- **Description:** License verification certificate as PEM string. Either this or `license_check_cert_path` must be specified. + +#### `licensing.license_check_cert_path` + +- **Field:** `scalar.db.analytics.server.licensing.license_check_cert_path` +- **Description:** Path to license verification certificate file. Either this or `license_check_cert_pem` must be specified. + +### Metering storage configuration + +Configure storage for metering data. + +#### `metering.storage.provider` + +- **Field:** `scalar.db.analytics.server.metering.storage.provider` +- **Description:** Storage provider for metering data (`filesystem`, `aws-s3`, `azureblob`, `google-cloud-storage`). 
+
+#### `metering.storage.containerName`
+
+- **Field:** `scalar.db.analytics.server.metering.storage.containerName`
+- **Description:** Container/bucket name for cloud storage.
+- **Default value:** `metering`
+
+#### `metering.storage.path`
+
+- **Field:** `scalar.db.analytics.server.metering.storage.path`
+- **Description:** Local directory path. Required when provider is `filesystem`.
+
+#### `metering.storage.accessKeyId`
+
+- **Field:** `scalar.db.analytics.server.metering.storage.accessKeyId`
+- **Description:** Access key ID for cloud storage providers. Required for `aws-s3`, `azureblob`, and `google-cloud-storage`.
+
+#### `metering.storage.secretAccessKey`
+
+- **Field:** `scalar.db.analytics.server.metering.storage.secretAccessKey`
+- **Description:** Secret access key for cloud storage providers. Required for `aws-s3`, `azureblob`, and `google-cloud-storage`.
+
+#### `metering.storage.prefix`
+
+- **Field:** `scalar.db.analytics.server.metering.storage.prefix`
+- **Description:** Optional prefix for all storage paths.
+
+## CLI client configuration
+
+The CLI client reads its connection settings for the ScalarDB Analytics server from a Java properties file (for example, `client.properties`).
+
+### Server connection configuration
+
+The following is a list of configurations for connecting to the server.
+
+#### `server.host`
+
+- **Field:** `scalar.db.analytics.client.server.host`
+- **Description:** Hostname or IP address of the ScalarDB Analytics server.
+
+#### `server.catalog.port`
+
+- **Field:** `scalar.db.analytics.client.server.catalog.port`
+- **Description:** Port number for the catalog service.
+- **Default value:** `11051`
+
+#### `server.metering.port`
+
+- **Field:** `scalar.db.analytics.client.server.metering.port`
+- **Description:** Port number for the metering service.
+- **Default value:** `11052`
+
+### TLS configuration
+
+The following is a list of configurations for TLS.
+
+#### `server.tls.enabled`
+
+- **Field:** `scalar.db.analytics.client.server.tls.enabled`
+- **Description:** Enable TLS/SSL for server connections.
+- **Default value:** `false`
+
+#### `server.tls.ca_root_cert_path`
+
+- **Field:** `scalar.db.analytics.client.server.tls.ca_root_cert_path`
+- **Description:** Path to the CA certificate file for verifying server certificates. Required when `tls.enabled` is `true`.
+
+#### `server.tls.override_authority`
+
+- **Field:** `scalar.db.analytics.client.server.tls.override_authority`
+- **Description:** Override the server authority for TLS verification (useful for testing).
+
+## Spark integration configuration
+
+To use ScalarDB Analytics with Apache Spark, configure your Spark application by adding the necessary settings to your Spark configuration file (`spark-defaults.conf`).
+
+### Spark Core configuration
+
+The following is a list of configurations for Spark Core.
+
+#### `spark.jars.packages`
+
+- **Field:** `spark.jars.packages`
+- **Description:** Maven coordinates for ScalarDB Analytics dependencies (for example, `com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2`).
+
+#### `spark.extraListeners`
+
+- **Field:** `spark.extraListeners`
+- **Description:** Register the ScalarDB Analytics metering listener (`com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener`).
+
+### Catalog configuration
+
+The following is a list of configurations for the catalog.
+
+#### `spark.sql.catalog.<CATALOG_NAME>`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>`
+- **Description:** Register the ScalarDB Analytics catalog implementation. Replace `<CATALOG_NAME>` with the exact name of the catalog created in the ScalarDB Analytics server. Use `com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog` as the value.
+
+:::important
+
+The `<CATALOG_NAME>` must match the catalog name created in the ScalarDB Analytics server using the CLI. For example, if you created a catalog named `production` in the server, use `spark.sql.catalog.production`.
+
+:::
+
+### Server connection configurations
+
+The following is a list of configurations for the server connection.
+
+#### `spark.sql.catalog.<CATALOG_NAME>.server.host`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>.server.host`
+- **Description:** Hostname or IP address of the ScalarDB Analytics server.
+
+#### `spark.sql.catalog.<CATALOG_NAME>.server.catalog.port`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>.server.catalog.port`
+- **Description:** Port number for the catalog service.
+- **Default value:** `11051`
+
+#### `spark.sql.catalog.<CATALOG_NAME>.server.metering.port`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>.server.metering.port`
+- **Description:** Port number for the metering service.
+- **Default value:** `11052`
+
+### TLS/SSL configurations
+
+The following is a list of configurations for TLS/SSL.
+
+#### `spark.sql.catalog.<CATALOG_NAME>.server.tls.enabled`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>.server.tls.enabled`
+- **Description:** Enable TLS/SSL for server connections.
+- **Default value:** `false`
+
+#### `spark.sql.catalog.<CATALOG_NAME>.server.tls.ca_root_cert_path`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>.server.tls.ca_root_cert_path`
+- **Description:** Path to the CA certificate file for verifying server certificates. Required when `tls.enabled` is `true`.
+
+#### `spark.sql.catalog.<CATALOG_NAME>.server.tls.override_authority`
+
+- **Field:** `spark.sql.catalog.<CATALOG_NAME>.server.tls.override_authority`
+- **Description:** Override the server authority for TLS verification.
+
+Replace `<CATALOG_NAME>` with your chosen catalog name (for example, `analytics`).
+
+## Configuration examples
+
+This section provides some configuration examples.
+
+### Basic development configuration
+
+The following are examples of configurations for the server, CLI client, and Spark.
+
+#### Server configuration (`scalardb-analytics-server.properties`)
+
+```properties
+# Metadata database
+scalar.db.analytics.server.db.url=jdbc:postgresql://localhost:5432/scalardb_analytics
+scalar.db.analytics.server.db.username=dev_user
+scalar.db.analytics.server.db.password=dev_password
+
+# License
+scalar.db.analytics.server.licensing.license_key=YOUR_DEV_LICENSE_KEY
+scalar.db.analytics.server.licensing.license_check_cert_path=/path/to/license_cert.pem
+
+# Metering storage (filesystem for development)
+scalar.db.analytics.server.metering.storage.provider=filesystem
+scalar.db.analytics.server.metering.storage.path=/tmp/scalardb-analytics-metering
+```
+
+#### CLI client configuration (`client.properties`)
+
+```properties
+scalar.db.analytics.client.server.host=localhost
+```
+
+#### Spark configuration (`spark-defaults.conf`)
+
+```properties
+spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2
+spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener
+spark.sql.catalog.analytics com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog
+spark.sql.catalog.analytics.server.host localhost
+```
+
+### Production configuration with TLS
+
+The following are examples of configurations for the server, CLI client, and Spark in production environments.
+ +#### Server configuration (`scalardb-analytics-server.properties`) + +```properties +# Metadata database +scalar.db.analytics.server.db.url=jdbc:postgresql://db.internal:5432/scalardb_analytics_prod +scalar.db.analytics.server.db.username=analytics_prod +scalar.db.analytics.server.db.password=your_secure_password + +# gRPC ports +scalar.db.analytics.server.catalog.port=11051 +scalar.db.analytics.server.metering.port=11052 + +# TLS +scalar.db.analytics.server.tls.enabled=true +scalar.db.analytics.server.tls.cert_chain_path=/path/to/server.crt +scalar.db.analytics.server.tls.private_key_path=/path/to/server.key + +# License +scalar.db.analytics.server.licensing.license_key=YOUR_LICENSE_KEY +scalar.db.analytics.server.licensing.license_check_cert_pem=-----BEGIN CERTIFICATE-----\nMIID...certificate content...\n-----END CERTIFICATE----- + +# Metering storage (S3) +scalar.db.analytics.server.metering.storage.provider=aws-s3 +scalar.db.analytics.server.metering.storage.containerName=analytics-metering +scalar.db.analytics.server.metering.storage.accessKeyId=AKIAIOSFODNN7EXAMPLE +scalar.db.analytics.server.metering.storage.secretAccessKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY +scalar.db.analytics.server.metering.storage.prefix=prod/ +``` + +#### CLI Client configuration (`client.properties`) + +```properties +scalar.db.analytics.client.server.host=analytics.example.com +scalar.db.analytics.client.server.tls.enabled=true +scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/cert.pem +``` + +#### Spark configuration (`spark-defaults.conf`) + +```properties +spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.2 +spark.extraListeners com.scalar.db.analytics.spark.metering.ScalarDbAnalyticsListener +spark.sql.catalog.analytics com.scalar.db.analytics.spark.catalog.ScalarDBAnalyticsCatalog +spark.sql.catalog.analytics.server.host analytics.example.com +spark.sql.catalog.analytics.server.tls.enabled true 
+spark.sql.catalog.analytics.server.tls.ca_root_cert_path /path/to/cert.pem +``` + +## Next steps + +- [Catalog management](catalog-management.mdx) - Learn how to manage catalogs and data sources +- [Run analytical queries](run-analytical-queries.mdx) - Start running queries with your configuration +- [Deployment guide](deployment.mdx) - Deploy ScalarDB Analytics in production diff --git a/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx b/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx new file mode 100644 index 00000000..35b9a6fd --- /dev/null +++ b/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx @@ -0,0 +1,247 @@ +--- +tags: + - Enterprise Option +displayed_sidebar: docsEnglish +--- + +# Create a ScalarDB Analytics Catalog + +import WarningLicenseKeyContact from "/src/components/en-us/_warning-license-key-contact.mdx"; + +This guide explains how to create a ScalarDB Analytics catalog. The ScalarDB Analytics catalog serves as the central hub that organizes information from various underlying data sources, including database schemas and contact points, enabling you to run analytical queries across these data sources through a unified interface. This information is referred to as catalog information. + + + +## Set up ScalarDB Analytics server + +Catalog information is managed by a component called a ScalarDB Analytics server. So, you first need to set up a ScalarDB Analytics server. The ScalarDB Analytics server also performs several other tasks, such as collecting usage metering information and storing it in a file system or cloud blob storage. + +### Prerequisites + +The ScalarDB Analytics server requires a database to store catalog information. We refer to this database as the **metadata database** throughout this documentation. 
ScalarDB Analytics supports the following databases for the metadata database:
+
+- PostgreSQL
+- MySQL
+- SQL Server
+- Oracle
+
+Create a database and user with appropriate privileges before starting the ScalarDB Analytics server. The specific commands vary by database type.
+
+### Configure the ScalarDB Analytics server
+
+Create a ScalarDB Analytics server configuration file (for example, `scalardb-analytics-server.properties`). The following example uses PostgreSQL as the metadata database:
+
+```properties
+# Metadata database configuration (required)
+scalar.db.analytics.server.db.url=jdbc:postgresql://localhost:5432/scalardb_analytics
+scalar.db.analytics.server.db.username=analytics_user
+scalar.db.analytics.server.db.password=your_secure_password
+
+# gRPC server configuration (optional; the values below are the defaults)
+scalar.db.analytics.server.catalog.port=11051
+scalar.db.analytics.server.metering.port=11052
+
+# TLS configuration (optional but recommended for production)
+scalar.db.analytics.server.tls.enabled=true
+scalar.db.analytics.server.tls.cert_chain_path=/path/to/server.crt
+scalar.db.analytics.server.tls.private_key_path=/path/to/server.key
+
+# License configuration (required)
+scalar.db.analytics.server.licensing.license_key=<YOUR_LICENSE_KEY>
+scalar.db.analytics.server.licensing.license_check_cert_pem=<YOUR_LICENSE_CERT_PEM>
+
+# Metering storage configuration (required)
+scalar.db.analytics.server.metering.storage.provider=filesystem
+scalar.db.analytics.server.metering.storage.path=/var/scalardb-analytics/metering
+```
+
+:::note
+For production deployments, configure metering storage to use object storage (for example, Amazon S3, Google Cloud Storage, or Azure Blob Storage) instead of the local filesystem. For detailed configuration options, see the [Configuration reference](./configurations.mdx).
+:::
+
+### Start the ScalarDB Analytics server
+
+Start the ScalarDB Analytics server with your configuration:
+
+```console
+docker run -d \
+  --name scalardb-analytics-server \
+  -p 11051:11051 \
+  -p 11052:11052 \
+  -v /path/to/scalardb-analytics-server.properties:/scalardb-analytics-server/server.properties \
+  ghcr.io/scalar-labs/scalardb-analytics-server:<VERSION>
+```
+
+Replace `<VERSION>` with the ScalarDB Analytics version you want to use. You can find available versions at the [container registry page](https://github.com/scalar-labs/scalardb-analytics/pkgs/container/scalardb-analytics-server-byol).
+
+The container uses the configuration file at `/scalardb-analytics-server/server.properties` by default.
+
+The ScalarDB Analytics server performs the following during startup:
+
+1. Validate the license
+2. Connect to the metadata database
+3. Start gRPC services on the configured ports
+4. Begin accepting client connections
+
+:::tip
+
+Make a note of your server configuration (hostname and ports); you will need this information later when configuring Spark applications to connect to your catalog.
+
+:::
+
+### Check server health (optional)
+
+If you want to verify that the server is running properly, you can use `grpc-health-probe` (included in the container image):
+
+```console
+# Check catalog service health
+docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051
+
+# Check metering service health
+docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11052
+
+# For TLS-enabled servers
+docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051 -tls -tls-ca-cert=/path/to/ca.crt
+```
+
+## Set up ScalarDB Analytics CLI
+
+ScalarDB Analytics CLI is a command-line tool that communicates with the ScalarDB Analytics server to manage catalogs, register data sources, and perform administrative tasks.
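+
+As a quick orientation before installing, the rest of this guide uses the CLI roughly as follows. The subcommands shown are the ones demonstrated later in this guide; `production` and `postgres-datasource.json` are example names:
+
+```console
+# Create a catalog, register a data source, then inspect the result
+scalardb-analytics-cli catalog create production
+scalardb-analytics-cli data-source register --data-source-json postgres-datasource.json
+scalardb-analytics-cli table list --catalog production
+```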
+
+### Install the CLI
+
+The `scalardb-analytics-cli` tool is available as a container image:
+
+```console
+# Pull the CLI image
+docker pull ghcr.io/scalar-labs/scalardb-analytics-cli:<VERSION>
+```
+
+Replace `<VERSION>` with the ScalarDB Analytics version you want to use. Available versions can be found at the [container registry page](https://github.com/scalar-labs/scalardb-analytics/pkgs/container/scalardb-analytics-cli).
+
+To run CLI commands, you'll need to mount your configuration file into the container:
+
+```console
+# Example: List catalogs
+docker run --rm \
+  -v $(pwd)/client.properties:/config/client.properties:ro \
+  ghcr.io/scalar-labs/scalardb-analytics-cli:<VERSION> \
+  -c /config/client.properties \
+  catalog list
+```
+
+### Configure the client
+
+Create a configuration file named `client.properties` in your current directory:
+
+```properties
+# Server connection
+scalar.db.analytics.client.server.host=localhost
+scalar.db.analytics.client.server.catalog.port=11051
+scalar.db.analytics.client.server.metering.port=11052
+
+# TLS/SSL configuration (if enabled on the server)
+scalar.db.analytics.client.server.tls.enabled=true
+scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/ca.crt
+scalar.db.analytics.client.server.tls.override_authority=analytics.example.com
+```
+
+For detailed configuration options, see the [Configuration reference](./configurations.mdx).
+
+### Set up an alias (optional)
+
+For convenience, you can create an alias to avoid typing the long Docker command each time:
+
+```console
+alias scalardb-analytics-cli='docker run --rm -v $(pwd)/client.properties:/config/client.properties:ro ghcr.io/scalar-labs/scalardb-analytics-cli:<VERSION> -c /config/client.properties'
+```
+
+With this alias, you can run commands more simply:
+
+```console
+scalardb-analytics-cli catalog list
+```
+
+## Create your catalog
+
+This section describes how to create a catalog container, add data sources to your catalog, and verify your catalog.
+
+### Create a catalog container
+
+A catalog serves as a logical container for organizing data sources. Create your first catalog:
+
+```console
+scalardb-analytics-cli catalog create --catalog production
+```
+
+:::important
+
+Remember the catalog name you choose here (for example, `production`). You will need to use this exact name when configuring your Spark applications to connect to this catalog.
+
+:::
+
+Verify that the catalog was created:
+
+```console
+scalardb-analytics-cli catalog list
+```
+
+### Add data sources to your catalog
+
+Create a data source registration file for your database. The following example is for PostgreSQL.
+
+Create `postgres-datasource.json`:
+
+```json
+{
+  "catalog": "production",
+  "name": "postgres_customers",
+  "type": "postgres",
+  "provider": {
+    "host": "postgres.example.com",
+    "port": 5432,
+    "username": "analytics_user",
+    "password": "secure_password",
+    "database": "customers"
+  }
+}
+```
+
+For detailed configuration options and examples for other database types (MySQL, ScalarDB, Oracle, SQL Server, DynamoDB), see the [Data Source Reference](./reference-data-source.mdx).
+
+Register the data source:
+
+```console
+scalardb-analytics-cli data-source register --data-source-json postgres-datasource.json
+```
+
+### Verify your catalog
+
+List all data sources in your catalog:
+
+```console
+scalardb-analytics-cli data-source list --catalog production
+```
+
+List namespaces in your catalog:
+
+```console
+scalardb-analytics-cli namespace list --catalog production
+```
+
+List tables in your catalog:
+
+```console
+scalardb-analytics-cli table list --catalog production
+```
+
+## Next steps
+
+You now have a fully functional ScalarDB Analytics catalog with registered data sources.
+
+To develop analytical applications using this catalog:
+
+1. **Run analytical queries:** See [Run Analytical Queries Through ScalarDB Analytics](./run-analytical-queries.mdx)
+2.
**Add more data sources:** See [Data Source Reference](./reference-data-source.mdx)
+3. **Deploy in public clouds:** See [Deploy ScalarDB Analytics in Public Cloud Environments](./deployment.mdx)
+4. **Explore configuration details:** See [ScalarDB Analytics Configurations](./configurations.mdx)
diff --git a/docs/scalardb-analytics/design.mdx b/docs/scalardb-analytics/design.mdx
index 3324887c..92523b5d 100644
--- a/docs/scalardb-analytics/design.mdx
+++ b/docs/scalardb-analytics/design.mdx
@@ -95,7 +95,7 @@ When registering a data source to ScalarDB Analytics, two types of mappings occu
 1. **Catalog structure mapping**: The data source's catalog information (namespaces, tables, and columns) is resolved and mapped to the universal data catalog structure
 2. **Data type mapping**: Native data types from each data source are mapped to the universal data types listed above
 
-These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog metadata reference](administration.mdx#catalog-metadata-reference) in the administration guide.
+These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).
 ## Query engine
diff --git a/docs/scalardb-analytics/reference-cli-command.mdx b/docs/scalardb-analytics/reference-cli-command.mdx
new file mode 100644
index 00000000..c569d73d
--- /dev/null
+++ b/docs/scalardb-analytics/reference-cli-command.mdx
@@ -0,0 +1,211 @@
+---
+tags:
+  - Enterprise Option
+displayed_sidebar: docsEnglish
+---
+
+# ScalarDB Analytics CLI command reference
+
+The ScalarDB Analytics CLI uses a hierarchical command structure:
+
+```
+scalardb-analytics-cli [options] <resource> <command>
+```
+
+Available resources:
+
+- **catalog:** Top-level containers for organizing data sources
+- **data-source:** External databases registered within catalogs
+- **namespace:** Database-specific organizational units
+- **table:** Database tables within namespaces
+
+## Catalog operations
+
+This section describes how to create a new catalog, list all catalogs, show catalog details, and delete a catalog.
+
+### Create a new catalog
+
+Create a new catalog as follows. Replace `<CATALOG_NAME>` with the name of the catalog you want to create.
+
+```
+scalardb-analytics-cli catalog create --catalog <CATALOG_NAME>
+```
+
+### List all catalogs
+
+Display all existing catalogs in the system:
+
+```
+scalardb-analytics-cli catalog list
+```
+
+### Show catalog details
+
+Display detailed information about a specific catalog. You can specify the catalog either by its name or by its UUID.
+
+To specify the catalog by name:
+
+```
+scalardb-analytics-cli catalog describe --catalog <CATALOG_NAME>
+```
+
+Replace `<CATALOG_NAME>` with the name of the catalog you want to describe.
+
+To specify the catalog by ID:
+
+```
+scalardb-analytics-cli catalog describe --catalog-id <CATALOG_ID>
+```
+
+Replace `<CATALOG_ID>` with the UUID of the catalog you want to describe.
+
+### Delete a catalog
+
+Remove a catalog from the system. The operation fails if the catalog contains data sources unless you use the `--cascade` option to delete all contents.
+
+To delete an empty catalog:
+
+```
+scalardb-analytics-cli catalog delete --catalog <CATALOG_NAME>
+```
+
+Replace `<CATALOG_NAME>` with the name of the catalog you want to delete.
+
+To delete a catalog and all of its contents:
+
+```
+scalardb-analytics-cli catalog delete --catalog <CATALOG_NAME> --cascade
+```
+
+## Data source operations
+
+This section describes how to register a new data source, list all data sources, show data source details, and delete a data source.
+
+### Register a new data source
+
+Add a new data source to a catalog by using a data source registration file:
+
+```
+scalardb-analytics-cli data-source register --data-source-json <PATH_TO_DATA_SOURCE_JSON>
+```
+
+Replace `<PATH_TO_DATA_SOURCE_JSON>` with the path to your data source registration file. The file format is described in the [Data Source Reference](./reference-data-source.mdx).
+
+### List all data sources
+
+Display all data sources within a specific catalog:
+
+```
+scalardb-analytics-cli data-source list --catalog <CATALOG_NAME>
+```
+
+Replace `<CATALOG_NAME>` with the name of the catalog whose data sources you want to list.
+
+### Show data source details
+
+Display detailed information about a specific data source. You can specify the data source either by its name within a catalog or by its UUID.
+
+To specify the data source by catalog and data source name:
+
+```
+scalardb-analytics-cli data-source describe --catalog <CATALOG_NAME> --data-source <DATA_SOURCE_NAME>
+```
+
+Replace the following:
+
+- `<CATALOG_NAME>` with the name of the catalog that contains the data source
+- `<DATA_SOURCE_NAME>` with the name of the data source you want to describe
+
+To specify the data source by ID:
+
+```
+scalardb-analytics-cli data-source describe --data-source-id <DATA_SOURCE_ID>
+```
+
+Replace `<DATA_SOURCE_ID>` with the UUID of the data source you want to describe.
+
+### Delete a data source
+
+Remove a data source from a catalog. The operation fails if the data source contains namespaces unless you use the `--cascade` option to delete all contents.
+
+To delete an empty data source:
+
+```
+scalardb-analytics-cli data-source delete --catalog <CATALOG_NAME> --data-source <DATA_SOURCE_NAME>
+```
+
+Replace the following:
+
+- `<CATALOG_NAME>` with the name of the catalog that contains the data source
+- `<DATA_SOURCE_NAME>` with the name of the data source you want to delete
+
+To delete a data source and all of its contents:
+
+```
+scalardb-analytics-cli data-source delete --catalog <CATALOG_NAME> --data-source <DATA_SOURCE_NAME> --cascade
+```
+
+## Namespace operations
+
+This section describes how to list all namespaces and show namespace details.
+
+### List all namespaces
+
+Display all namespaces within a specific catalog:
+
+```
+scalardb-analytics-cli namespace list --catalog <CATALOG_NAME>
+```
+
+Replace `<CATALOG_NAME>` with the name of the catalog whose namespaces you want to list.
+
+### Show namespace details
+
+Display detailed information about a specific namespace. You can specify the namespace either by its name within a data source or by its UUID. For nested namespaces, use `.` as a separator (for example, `--namespace parent.child`).
+
+To specify the namespace by catalog, data source, and namespace name:
+
+```
+scalardb-analytics-cli namespace describe --catalog <CATALOG_NAME> --data-source <DATA_SOURCE_NAME> --namespace <NAMESPACE_NAME>
+```
+
+Replace the following:
+
+- `<CATALOG_NAME>` with the name of the catalog that contains the data source
+- `<DATA_SOURCE_NAME>` with the name of the data source that contains the namespace
+- `<NAMESPACE_NAME>` with the name of the namespace you want to describe
+
+To specify the namespace by ID:
+
+```
+scalardb-analytics-cli namespace describe --namespace-id <NAMESPACE_ID>
+```
+
+Replace `<NAMESPACE_ID>` with the UUID of the namespace you want to describe.
+
+## Table operations
+
+This section describes how to list all tables and show a table schema.
+
+### List all tables
+
+Display all tables within a specific catalog:
+
+```
+scalardb-analytics-cli table list --catalog <CATALOG_NAME>
+```
+
+Replace `<CATALOG_NAME>` with the name of the catalog whose tables you want to list.
+
+### Show the table schema
+
+Display the schema information, including all columns, for a specific table.
You can specify the table either by its name within a namespace or by its UUID. For nested namespaces, use `.` as a separator (for example, `--namespace parent.child`).
+
+To specify the table by catalog, data source, namespace, and table name:
+
+```
+scalardb-analytics-cli table describe --catalog <CATALOG_NAME> --data-source <DATA_SOURCE_NAME> --namespace <NAMESPACE_NAME> --table <TABLE_NAME>
+```
+
+Replace the following:
+
+- `<CATALOG_NAME>` with the name of the catalog that contains the data source
+- `<DATA_SOURCE_NAME>` with the name of the data source that contains the namespace
+- `<NAMESPACE_NAME>` with the name of the namespace that contains the table
+- `<TABLE_NAME>` with the name of the table you want to describe
+
+To specify the table by ID:
+
+```
+scalardb-analytics-cli table describe --table-id <TABLE_ID>
+```
+
+Replace `<TABLE_ID>` with the UUID of the table you want to describe.
diff --git a/docs/scalardb-analytics/administration.mdx b/docs/scalardb-analytics/reference-data-source.mdx
similarity index 52%
rename from docs/scalardb-analytics/administration.mdx
rename to docs/scalardb-analytics/reference-data-source.mdx
index 783b4319..9d94980c 100644
--- a/docs/scalardb-analytics/administration.mdx
+++ b/docs/scalardb-analytics/reference-data-source.mdx
@@ -7,256 +7,17 @@ displayed_sidebar: docsEnglish
 
 import Tabs from "@theme/Tabs";
 import TabItem from "@theme/TabItem";
 
-# Manage ScalarDB Analytics
+# Data Source Reference
 
 import WarningLicenseKeyContact from "/src/components/en-us/_warning-license-key-contact.mdx";
 
-This guide explains how to set up and manage the ScalarDB Analytics server and its catalogs. The ScalarDB Analytics server is the implementation of the Universal Data Catalog described in the [ScalarDB Analytics Design](design.mdx), providing centralized metadata management for analytical queries across multiple databases.
+This reference guide provides detailed information about data source configuration formats, provider-specific settings, and data type mappings for ScalarDB Analytics.
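A registration file such as the PostgreSQL example earlier can be checked for its required top-level fields before being passed to `data-source register`. The field list follows this reference; the validator itself is a hypothetical helper, not part of the CLI:

```python
# Sketch: validate the overall shape of a data source registration file.
import json

REQUIRED_FIELDS = ("catalog", "name", "type", "provider")

def validate_registration(text: str) -> list:
    """Return a list of problems found in a registration JSON document."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in doc]
    if not isinstance(doc.get("provider", {}), dict):
        problems.append("provider must be an object")
    return problems

example = """
{
  "catalog": "production",
  "name": "postgres_customers",
  "type": "postgres",
  "provider": {
    "host": "postgres.example.com",
    "port": 5432,
    "username": "analytics_user",
    "password": "secure_password",
    "database": "customers"
  }
}
"""

print(validate_registration(example))  # prints [] for a well-formed file
```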
-## Overview +## Data source registration file format -ScalarDB Analytics provides a universal data catalog that enables unified access to multiple databases through a single interface. Based on the Universal Data Catalog architecture described in the [ScalarDB Analytics Design](design.mdx#universal-data-catalog) documentation, the system consists of two main components: - -1. **ScalarDB Analytics server**: A gRPC-based service that manages: - - **Catalog metadata**: Organizes data sources, namespaces, tables, and columns - - **Data source connections**: Maintains connection information and credentials for external databases - - **License validation**: Verifies enterprise licenses - - **Usage metering**: Tracks resource usage for billing purposes - - The server provides two gRPC endpoints: - - Port 11051: Catalog service for metadata operations - - Port 11052: Metering service for usage tracking - -2. **ScalarDB Analytics CLI**: A command-line tool that communicates with the server to manage catalogs, register data sources, and perform administrative tasks - -## Setup - -Before managing catalogs, you need to set up and configure the ScalarDB Analytics server and CLI. - -### Server setup - -#### Prerequisites: Metadata database - -The server requires a database to store catalog metadata and data source connection information. We refer to this database as the **metadata database** throughout this documentation. ScalarDB Analytics supports the following databases for the metadata database: - -- PostgreSQL -- MySQL -- SQL Server -- Oracle - -Create a database and user with appropriate privileges before starting the server. The specific commands vary by database type. - -#### Server configuration - -Create a server configuration file (e.g., `scalardb-analytics-server.properties`). 
The following example uses PostgreSQL as the metadata database: - -```properties -# Metadata database configuration (required) -scalar.db.analytics.server.db.url=jdbc:postgresql://localhost:5432/scalardb_analytics -scalar.db.analytics.server.db.username=analytics_user -scalar.db.analytics.server.db.password=your_secure_password - -# gRPC server configuration (optional) -scalar.db.analytics.server.catalog.port=11051 # default -scalar.db.analytics.server.metering.port=11052 # default - -# TLS configuration (optional but recommended for production) -scalar.db.analytics.server.tls.enabled=true -scalar.db.analytics.server.tls.cert_chain_path=/path/to/server.crt -scalar.db.analytics.server.tls.private_key_path=/path/to/server.key - -# License configuration (required) -scalar.db.analytics.server.licensing.license_key= -scalar.db.analytics.server.licensing.license_check_cert_pem= - -# Metering storage configuration (required) -scalar.db.analytics.server.metering.storage.provider=filesystem -scalar.db.analytics.server.metering.storage.path=/var/scalardb-analytics/metering -``` - -For detailed configuration options, see the [Configuration reference](configuration.mdx). - -#### Starting the server - -Start the ScalarDB Analytics server with your configuration: - -```console -docker run -d \ - --name scalardb-analytics-server \ - -p 11051:11051 \ - -p 11052:11052 \ - -v /path/to/scalardb-analytics-server.properties:/scalardb-analytics-server/server.properties \ - ghcr.io/scalar-labs/scalardb-analytics-server: -``` - -Replace `` with the ScalarDB Analytics version you want to use. You can find available versions at the [Docker registry page](https://github.com/scalar-labs/scalardb-analytics/pkgs/container/scalardb-analytics-server). - -The container uses the configuration file at `/scalardb-analytics-server/server.properties` by default. - -The server will perform the following during startup: - -1. Validate the license -2. Connect to the metadata database -3. 
Start gRPC services on the configured ports -4. Begin accepting client connections - -#### Health checks (optional) - -If you want to verify the server is running properly, you can use grpc-health-probe (included in the Docker container): - -```console -# Check catalog service health -docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051 - -# Check metering service health -docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11052 - -# For TLS-enabled servers -docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051 -tls -tls-ca-cert=/path/to/ca.crt -``` - -### CLI setup - -#### Installing the CLI - -The `scalardb-analytics-cli` tool is available as a Docker image: - -```console -# Pull the CLI image -docker pull ghcr.io/scalar-labs/scalardb-analytics-cli: -``` - -Replace `` with the ScalarDB Analytics version you want to use. Available versions can be found at the [Docker registry page](https://github.com/scalar-labs/scalardb-analytics/pkgs/container/scalardb-analytics-cli). 
- -To run CLI commands, you'll need to mount your configuration file into the container: - -```console -# Example: List catalogs -docker run --rm \ - -v $(pwd)/client.properties:/config/client.properties:ro \ - ghcr.io/scalar-labs/scalardb-analytics-cli: \ - -c /config/client.properties \ - catalog list -``` - -#### Client configuration - -Create a configuration file named `client.properties` in your current directory: - -```properties -# Server connection -scalar.db.analytics.client.server.host=localhost -scalar.db.analytics.client.server.catalog.port=11051 -scalar.db.analytics.client.server.metering.port=11052 - -# TLS/SSL configuration (if enabled on server) -scalar.db.analytics.client.server.tls.enabled=true -scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/ca.crt -scalar.db.analytics.client.server.tls.override_authority=analytics.example.com -``` - -For detailed configuration options, see the [Configuration reference](configuration.mdx). - -#### Setting up an alias (optional) - -For convenience, you can create an alias to avoid typing the long Docker command each time: - -```console -alias scalardb-analytics-cli='docker run --rm -v $(pwd)/client.properties:/config/client.properties:ro ghcr.io/scalar-labs/scalardb-analytics-cli: -c /config/client.properties' -``` - -With this alias, you can run commands more simply: - -```console -scalardb-analytics-cli catalog list -``` - -## CLI command reference - -The ScalarDB Analytics CLI uses a hierarchical command structure: - -``` -scalardb-analytics [options] -``` - -Available resources: - -- **catalog**: Top-level containers for organizing data sources -- **data-source**: External databases registered within catalogs -- **namespace**: Database-specific organizational units (auto-discovered) -- **table**: Data structures within namespaces (auto-discovered) - -Note: In all examples below, we assume you're using the Docker alias created earlier. 
If running Docker commands directly, replace `scalardb-analytics-cli` with the full Docker command. - -### Catalog operations - -```console -# Create a new catalog -scalardb-analytics-cli catalog create --catalog - -# List all catalogs -scalardb-analytics-cli catalog list - -# Show catalog details (by name or ID) -scalardb-analytics-cli catalog describe --catalog -scalardb-analytics-cli catalog describe --catalog-id - -# Delete a catalog (fails if not empty unless --cascade is used) -scalardb-analytics-cli catalog delete --catalog -scalardb-analytics-cli catalog delete --catalog --cascade -``` - -### Data source operations - -```console -# Register a new data source using a JSON definition file -scalardb-analytics-cli data-source register --data-source-json - -# List all data sources in a catalog -scalardb-analytics-cli data-source list --catalog - -# Show data source details (by name or ID) -scalardb-analytics-cli data-source describe --catalog --data-source -scalardb-analytics-cli data-source describe --data-source-id - -# Delete a data source (fails if not empty unless --cascade is used) -scalardb-analytics-cli data-source delete --catalog --data-source -scalardb-analytics-cli data-source delete --catalog --data-source --cascade -``` - -The `register` command requires a JSON definition file. The JSON file format is described in the [Data source configuration](#data-source-configuration) section below. - -### Namespace operations - -```console -# List all namespaces in a data source -scalardb-analytics-cli namespace list --catalog --data-source - -# Show namespace details (by name or ID) -# For nested namespaces, use '.' 
as a separator (e.g., --namespace parent.child) -scalardb-analytics-cli namespace describe --catalog --data-source --namespace -scalardb-analytics-cli namespace describe --namespace-id -``` - -### Table operations - -```console -# List all tables in a namespace -scalardb-analytics-cli table list --catalog --data-source --namespace - -# Show table schema with all columns (by name or ID) -# For nested namespaces, use '.' as a separator (e.g., --namespace parent.child) -scalardb-analytics-cli table describe --catalog --data-source --namespace --table -scalardb-analytics-cli table describe --table-id -``` - -## Data source configuration - -### Data source JSON format - -Data sources are registered using JSON definition files with the following structure: +Data sources are registered to catalogs using the CLI with data source registration files. These files have the following structure. For CLI command details, see [CLI command reference](./reference-cli-command.mdx). ```json { @@ -270,22 +31,25 @@ Data sources are registered using JSON definition files with the following struc } ``` -The `provider` section contains database-specific connection settings that vary based on the `type` field. +The `provider` section contains data source-specific connection settings that vary based on the `type` field. -### Provider configuration by type +## Provider configuration by type The following sections show the provider configuration for each supported database type: -#### Configuration +

Configurations

-| Field | Required | Description | Default | -| ------------ | -------- | --------------------------------------- | ------- | -| `configPath` | Yes | Path to the ScalarDB configuration file | - | +The following configuration is for ScalarDB. -**Example:** +

`configPath`

+ +- **Field:** `configPath` +- **Description:** Path to the ScalarDB configuration file. + +

Example

```json { @@ -301,17 +65,36 @@ The following sections show the provider configuration for each supported databa
-#### Configuration +

Configuration

+ +The following configurations are for PostgreSQL. + +

`host`

+ +- **Field:** `host` +- **Description:** PostgreSQL server hostname. + +

`port`

+ +- **Field:** `port` +- **Description:** Port number. + +

`username`

-| Field | Required | Description | Default | -| ---------- | -------- | --------------------------- | ------- | -| `host` | Yes | PostgreSQL server hostname | - | -| `port` | Yes | Port number | - | -| `username` | Yes | Database user | - | -| `password` | Yes | Database password | - | -| `database` | Yes | Database name to connect to | - | +- **Field:** `username` +- **Description:** Database user. -**Example:** +

`password`

+ +- **Field:** `password` +- **Description:** Database password. + +

`database`

+ +- **Field:** `database` +- **Description:** Database name to connect to. + +

Example

```json { @@ -331,17 +114,37 @@ The following sections show the provider configuration for each supported databa
-#### Configuration +

Configuration

+ +The following configurations are for MySQL. + +

`host`

+ +- **Field:** `host` +- **Description:** MySQL server hostname. + +

`port`

+ +- **Field:** `port` +- **Description:** Port number. + +

`username`

-| Field | Required | Description | Default | -| ---------- | -------- | ----------------------------------------------------------------------- | ------- | -| `host` | Yes | MySQL server hostname | - | -| `port` | Yes | Port number | - | -| `username` | Yes | Database user | - | -| `password` | Yes | Database password | - | -| `database` | No | Specific database to import. If omitted, all databases will be imported | - | +- **Field:** `username` +- **Description:** Database user. -**Example:** +

`password`

+ +- **Field:** `password` +- **Description:** Database password. + +

`database`

+ +- **Field:** `database` +- **Description:** Specific database to import. If omitted, all databases will be imported. +- **Default value:** None (imports all databases) + +

Example

```json { @@ -361,17 +164,36 @@ The following sections show the provider configuration for each supported databa
-#### Configuration +

Configuration

+ +The following configurations are for Oracle. + +

`host`

+ +- **Field:** `host` +- **Description:** Oracle server hostname. + +

`port`

+ +- **Field:** `port` +- **Description:** Port number. + +

`username`

+ +- **Field:** `username` +- **Description:** Database user. -| Field | Required | Description | Default | -| ------------- | -------- | ---------------------- | ------- | -| `host` | Yes | Oracle server hostname | - | -| `port` | Yes | Port number | - | -| `username` | Yes | Database user | - | -| `password` | Yes | Database password | - | -| `serviceName` | Yes | Oracle service name | - | +

`password`

-**Example:** +- **Field:** `password` +- **Description:** Database password. + +

`serviceName`

+ +- **Field:** `serviceName` +- **Description:** Oracle service name. + +

Example

```json { @@ -391,18 +213,43 @@ The following sections show the provider configuration for each supported databa
-#### Configuration +

Configuration

+ +The following configurations are for SQL Server. + +

`host`

+ +- **Field:** `host` +- **Description:** SQL Server hostname. + +

`port`

+ +- **Field:** `port` +- **Description:** Port number. + +

`username`

+ +- **Field:** `username` +- **Description:** Database user. -| Field | Required | Description | Default | -| ---------- | -------- | ------------------------------- | ------- | -| `host` | Yes | SQL Server hostname | - | -| `port` | Yes | Port number | - | -| `username` | Yes | Database user | - | -| `password` | Yes | Database password | - | -| `database` | No | Specific database to connect to | - | -| `secure` | No | Enable encryption | - | +

`password`

-**Example:** +- **Field:** `password` +- **Description:** Database password. + +

`database`

+ +- **Field:** `database` +- **Description:** Specific database to connect to. +- **Default value:** None (connects to default database) + +

`secure`

+ +- **Field:** `secure` +- **Description:** Enable encryption. +- **Default value:** `false` + +

Example

```json { @@ -423,33 +270,84 @@ The following sections show the provider configuration for each supported databa
-#### Configuration +

Configuration

+ +The following configurations are for DynamoDB. + +:::note AWS Credentials + +DynamoDB authentication uses the standard AWS SDK credential provider chain. Credentials can be configured through: + +- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) +- AWS credentials file (`~/.aws/credentials`) +- IAM roles (when running on EC2, ECS, or Lambda) +- AWS SSO or other credential providers supported by the AWS SDK + +For more information, see the [AWS SDK documentation on credential providers](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). + +::: + +

`region`

+ +- **Field:** `region` +- **Description:** AWS region (e.g., us-east-1). Either `region` or `endpoint` must be specified (not both). + +

`endpoint`

+ +- **Field:** `endpoint` +- **Description:** Custom endpoint URL. Either `region` or `endpoint` must be specified (not both). + +

`schema`

+ +- **Field:** `schema` +- **Description:** Complete schema definition. Since DynamoDB is schema-less, you must provide a complete schema definition. + +

Schema structure

+ +The schema field must contain the following structure: + +
`.schema.namespaces[]`
-| Field | Required | Description | Default | -| ---------- | -------- | ---------------------------- | ------- | -| `region` | Yes\* | AWS region (e.g., us-east-1) | - | -| `endpoint` | Yes\* | Custom endpoint URL | - | -| `schema` | Yes | Complete schema definition | - | +- **Field:** `.schema.namespaces[]` +- **Description:** Array of namespace definitions. -\* Either `region` or `endpoint` must be specified (not both). +
`.schema.namespaces[].names[]`
-Since DynamoDB is schema-less, you must provide a complete schema definition. +- **Field:** `.schema.namespaces[].names[]` +- **Description:** Array of namespace names (strings). -##### Schema structure +
`.schema.namespaces[].tables[]`
-| Field | Required | Description | Default | -| -------------------------------------------------- | -------- | -------------------------------------- | ------- | -| `.schema` | Yes | Complete schema definition | - | -| `.schema.namespaces[]` | Yes | Array of namespace definitions | - | -| `.schema.namespaces[].names[]` | Yes | Array of namespace names (strings) | - | -| `.schema.namespaces[].tables[]` | Yes | Array of table definitions | - | -| `.schema.namespaces[].tables[].name` | Yes | Table name | - | -| `.schema.namespaces[].tables[].columns[]` | Yes | Array of column definitions | - | -| `.schema.namespaces[].tables[].columns[].name` | Yes | Column name | - | -| `.schema.namespaces[].tables[].columns[].type` | Yes | Data type | - | -| `.schema.namespaces[].tables[].columns[].nullable` | No | Whether column can contain null values | true | +- **Field:** `.schema.namespaces[].tables[]` +- **Description:** Array of table definitions. -**Example:** +
`.schema.namespaces[].tables[].name`
+ +- **Field:** `.schema.namespaces[].tables[].name` +- **Description:** Table name. + +
`.schema.namespaces[].tables[].columns[]`
+ +- **Field:** `.schema.namespaces[].tables[].columns[]` +- **Description:** Array of column definitions. + +
`.schema.namespaces[].tables[].columns[].name`
+ +- **Field:** `.schema.namespaces[].tables[].columns[].name` +- **Description:** Column name. + +
`.schema.namespaces[].tables[].columns[].type`
+ +- **Field:** `.schema.namespaces[].tables[].columns[].type` +- **Description:** Data type. + +
`.schema.namespaces[].tables[].columns[].nullable`
+ +- **Field:** `.schema.namespaces[].tables[].columns[].nullable` +- **Description:** Whether column can contain null values. +- **Default value:** `true` + +

Example

```json { @@ -487,7 +385,9 @@ Since DynamoDB is schema-less, you must provide a complete schema definition.
-## Catalog metadata reference
+## Catalog information reference
+
+This section describes catalog structure mappings by data source and data type mappings.
 
 ### Catalog structure mappings by data source
 
@@ -636,6 +536,12 @@ The catalog-level mappings are the mappings of the namespace names, table names,
 
 The following sections show how native types from each data source are mapped to ScalarDB Analytics types:
 
+:::warning
+
+Columns with data types that are not included in the mapping tables below will be ignored during data source registration. These columns will not appear in the ScalarDB Analytics catalog and cannot be queried. Information about ignored columns is logged in the ScalarDB Analytics server logs.
+
+:::
+
@@ -769,30 +675,23 @@ The following sections show how native types from each data source are mapped to
 
 | **DynamoDB Data Type** | **ScalarDB Analytics Data Type** |
 | :--------------------- | :------------------------------- |
-| `Number`               | `BYTE`                           |
-| `Number`               | `SMALLINT`                       |
-| `Number`               | `INT`                            |
-| `Number`               | `BIGINT`                         |
-| `Number`               | `FLOAT`                          |
-| `Number`               | `DOUBLE`                         |
-| `Number`               | `DECIMAL`                        |
 | `String`               | `TEXT`                           |
+| `Number`               | `DOUBLE`                         |
 | `Binary`               | `BLOB`                           |
 | `Boolean`              | `BOOLEAN`                        |
+| `Null`                 | `NULL`                           |
+| `String Set`           | `TEXT`                           |
+| `Number Set`           | `TEXT`                           |
+| `Binary Set`           | `TEXT`                           |
+| `List`                 | `TEXT`                           |
+| `Map`                  | `TEXT`                           |
 
-:::warning
+:::note
 
-It is important to ensure that the field values of `Number` types are parsable as a specified data type for ScalarDB Analytics. For example, if a column that corresponds to a `Number`-type field is specified as an `INT` type, its value must be an integer. If the value is not an integer, an error will occur when running a query.
+DynamoDB complex data types (String Set, Number Set, Binary Set, List, Map) are mapped to `TEXT` for compatibility. The actual values are serialized as JSON strings in ScalarDB Analytics queries.
 :::
 
-## Next steps
-
-After setting up the ScalarDB Analytics server and managing catalogs:
-
-1. Configure Spark or other query engines to use your catalogs - see [Configuration Reference](configuration.mdx)
-2. Start running analytical queries - see [Run Analytical Queries](run-analytical-queries.mdx)
-3. Set up production deployment - see [Deployment Guide](deployment.mdx)
diff --git a/sidebars.js b/sidebars.js
index 3d803ff5..5a55b1e3 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -328,6 +328,11 @@ const sidebars = {
         id: 'develop-run-analytical-queries-overview',
       },
       items: [
+        {
+          type: 'doc',
+          id: 'scalardb-analytics/create-scalardb-analytics-catalog',
+          label: 'Create a Catalog',
+        },
         {
           type: 'doc',
           id: 'scalardb-analytics/run-analytical-queries',
@@ -507,13 +512,18 @@ const sidebars = {
       },
       {
         type: 'doc',
-        id: 'scalardb-analytics/administration',
-        label: 'Manage ScalarDB Analytics',
+        id: 'scalardb-analytics/configurations',
+        label: 'Configurations',
       },
       {
         type: 'doc',
-        id: 'scalardb-analytics/configuration',
-        label: 'Configuration Reference',
+        id: 'scalardb-analytics/reference-data-source',
+        label: 'Data Source Reference',
+      },
+      {
+        type: 'doc',
+        id: 'scalardb-analytics/reference-cli-command',
+        label: 'CLI Command Reference',
       },
     ],
   },
@@ -1464,12 +1474,7 @@ const sidebars = {
       },
       {
         type: 'doc',
-        id: 'scalardb-analytics/administration',
-        label: 'ScalarDB Analytics を管理',
-      },
-      {
-        type: 'doc',
-        id: 'scalardb-analytics/configuration',
+        id: 'scalardb-analytics/configurations',
         label: '設定リファレンス',
       },
     ],