title: Amazon Titan Embeddings
summary: Learn how to use Amazon Titan embedding models in TiDB Cloud.
aliases: /tidbcloud/vector-search-auto-embedding-amazon-titan/

Amazon Titan Embeddings

This document describes how to use Amazon Titan embedding models with Auto Embedding in TiDB Cloud to perform semantic searches with text queries.

Note:

Auto Embedding is available only on {{{ .starter }}} clusters hosted on AWS.

Available models

TiDB Cloud provides the following Amazon Titan embedding model natively. No API key is required.

Amazon Titan Text Embedding V2 model

  • Name: tidbcloud_free/amazon/titan-embed-text-v2
  • Dimensions: 1024 (default), 512, 256
  • Distance metrics: Cosine, L2
  • Languages: English (100+ languages in preview)
  • Typical use cases: RAG, document search, reranking, and classification
  • Maximum input text tokens: 8,192
  • Maximum input text characters: 50,000
  • Price: Free
  • Hosted by TiDB Cloud: ✅
  • Bring Your Own Key: ❌

For more information about this model, see the Amazon Bedrock documentation.

SQL usage example

The following example shows how to use the Amazon Titan embedding model with Auto Embedding.

CREATE TABLE sample (
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(1024) GENERATED ALWAYS AS (EMBED_TEXT(
                "tidbcloud_free/amazon/titan-embed-text-v2",
                `content`
              )) STORED
);


INSERT INTO sample
    (`id`, `content`)
VALUES
    (1, "Java: Object-oriented language for cross-platform development."),
    (2, "Java coffee: Bold Indonesian beans with low acidity."),
    (3, "Java island: Densely populated, home to Jakarta."),
    (4, "Java's syntax is used in Android apps."),
    (5, "Dark roast Java beans enhance espresso blends.");


SELECT `id`, `content` FROM sample
ORDER BY
  VEC_EMBED_COSINE_DISTANCE(
    embedding,
    "How to start learning Java programming?"
  )
LIMIT 2;

Result:

+------+----------------------------------------------------------------+
| id   | content                                                        |
+------+----------------------------------------------------------------+
|    1 | Java: Object-oriented language for cross-platform development. |
|    4 | Java's syntax is used in Android apps.                         |
+------+----------------------------------------------------------------+
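To make the ORDER BY clause more concrete, the following is a minimal Python sketch of cosine distance (1 minus cosine similarity), the quantity that the query above sorts by in ascending order. The three-dimensional vectors and the `nearest` helper are invented for illustration only; real Titan embeddings have 1024, 512, or 256 dimensions and are computed by TiDB Cloud, not by this code.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query, limit):
    """Return the ids of the `limit` rows closest to `query` (smallest distance first)."""
    return sorted(rows, key=lambda row_id: cosine_distance(rows[row_id], query))[:limit]


# Hypothetical toy embeddings keyed by row id (not real model output).
rows = {
    1: [0.9, 0.1, 0.0],
    2: [0.1, 0.9, 0.1],
    4: [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedded query text

top2 = nearest(rows, query, 2)
print(top2)
```

A smaller distance means a closer semantic match, which is why the SQL query sorts in ascending order and keeps the first rows.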

Options

You can specify the following options via the additional_json_options parameter of the EMBED_TEXT() function:

  • normalize (optional): whether to normalize the output embedding. Defaults to true.
  • dimensions (optional): the number of dimensions of the output embedding. Supported values: 1024 (default), 512, and 256.
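If you generate table definitions from application code, a small helper can validate these options before they reach SQL. The following Python sketch is one possible approach; `build_titan_options` and `SUPPORTED_DIMENSIONS` are hypothetical names, not part of any TiDB Cloud client library, and the helper only builds the JSON string that would be passed as additional_json_options.

```python
import json

# Dimension values listed as supported for titan-embed-text-v2 in this document.
SUPPORTED_DIMENSIONS = (1024, 512, 256)


def build_titan_options(dimensions=1024, normalize=True):
    """Build the JSON options string for EMBED_TEXT() (hypothetical helper)."""
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {SUPPORTED_DIMENSIONS}, got {dimensions!r}"
        )
    if not isinstance(normalize, bool):
        raise TypeError("normalize must be a bool")
    return json.dumps({"dimensions": dimensions, "normalize": normalize})


options = build_titan_options(dimensions=256, normalize=False)
print(options)
```

When you choose a non-default dimension, declare the vector column with the same size, for example VECTOR(256) for the options built here.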

Example: Use an alternative dimension

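-- If the table from the previous example still exists, drop it first.
DROP TABLE IF EXISTS sample;

CREATE TABLE sample (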
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(512) GENERATED ALWAYS AS (EMBED_TEXT(
                "tidbcloud_free/amazon/titan-embed-text-v2",
                `content`,
                '{"dimensions": 512}'
              )) STORED
);


INSERT INTO sample
    (`id`, `content`)
VALUES
    (1, "Java: Object-oriented language for cross-platform development."),
    (2, "Java coffee: Bold Indonesian beans with low acidity."),
    (3, "Java island: Densely populated, home to Jakarta."),
    (4, "Java's syntax is used in Android apps."),
    (5, "Dark roast Java beans enhance espresso blends.");


SELECT `id`, `content` FROM sample
ORDER BY
  VEC_EMBED_COSINE_DISTANCE(
    embedding,
    "How to start learning Java programming?"
  )
LIMIT 2;

Result:

+------+----------------------------------------------------------------+
| id   | content                                                        |
+------+----------------------------------------------------------------+
|    1 | Java: Object-oriented language for cross-platform development. |
|    4 | Java's syntax is used in Android apps.                         |
+------+----------------------------------------------------------------+

See also