title: Gemini Embeddings
summary: Learn how to use Google Gemini embedding models in TiDB Cloud.
aliases: /tidbcloud/vector-search-auto-embedding-gemini/

Gemini Embeddings

This document describes how to use Gemini embedding models with Auto Embedding in TiDB Cloud to perform semantic searches with text queries.

Note:

Auto Embedding is only available on {{{ .starter }}} clusters hosted on AWS.

Available models

All Gemini models are available for use with the gemini/ prefix if you bring your own Gemini API key (BYOK). For example:

gemini-embedding-001

Name: gemini/gemini-embedding-001
Dimensions: 128–3072 (default: 3072)
Distance metric: Cosine, L2
Maximum input text tokens: 2,048
Price: Charged by Google
Hosted by TiDB Cloud: ❌
Bring Your Own Key: ✅
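The model supports both Cosine and L2 distance. The plain-Python sketch below is independent of TiDB and uses hypothetical helper names. It computes both metrics and shows that, for unit-length vectors, the squared L2 distance equals twice the cosine distance. Because of this, the two metrics rank neighbors identically when embeddings are normalized.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a: list[float], b: list[float]) -> float:
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Made-up 3-dimensional vectors standing in for real embeddings.
a = normalize([0.2, 0.9, 0.1])
b = normalize([0.3, 0.7, 0.4])

# For unit vectors: L2^2 == 2 * cosine distance.
print(l2_distance(a, b) ** 2, 2 * cosine_distance(a, b))
```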

For a full list of available models, see the Gemini documentation.

Usage example

This example shows how to create a vector table, insert documents, and run similarity search using Google Gemini embedding models.

Step 1: Connect to the database

from pytidb import TiDBClient

tidb_client = TiDBClient.connect(
    host="{gateway-region}.prod.aws.tidbcloud.com",
    port=4000,
    username="{prefix}.root",
    password="{password}",
    database="{database}",
    ensure_db=True,
)

mysql -h {gateway-region}.prod.aws.tidbcloud.com \
    -P 4000 \
    -u {prefix}.root \
    -p{password} \
    -D {database}

Step 2: Configure the API key

Create an API key in Google AI Studio, and then bring your own key (BYOK) to use the embedding service.

Configure the API key for the Google Gemini embedding provider using the TiDB Client:

tidb_client.configure_embedding_provider(
    provider="google_gemini",
    api_key="{your-google-api-key}",
)
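To avoid hardcoding the key in source files, you can read it from an environment variable before passing it to the client. This is a minimal sketch. The variable name GEMINI_API_KEY and the helper get_gemini_api_key() are assumptions for illustration and are not part of pytidb.

```python
import os

def get_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from an environment variable.

    GEMINI_API_KEY is an assumed name; use whatever your deployment sets.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Gemini API key.")
    return key

# Usage with the client from Step 1 (not executed here):
# tidb_client.configure_embedding_provider(
#     provider="google_gemini",
#     api_key=get_gemini_api_key(),
# )
```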

Set the API key for the Google Gemini embedding provider using SQL:

SET @@GLOBAL.TIDB_EXP_EMBED_GEMINI_API_KEY = "{your-google-api-key}";

Step 3: Create a vector table

Create a table with a vector field that uses the gemini-embedding-001 model to generate 3072-dimensional vectors (default):

from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction
from pytidb.datatype import TEXT

class Document(TableModel):
    __tablename__ = "sample_documents"
    id: int = Field(primary_key=True)
    content: str = Field(sa_type=TEXT)
    embedding: list[float] = EmbeddingFunction(
        model_name="gemini/gemini-embedding-001"
    ).VectorField(source_field="content")

table = tidb_client.create_table(schema=Document, if_exists="overwrite")

CREATE TABLE sample_documents (
    `id`        INT PRIMARY KEY,
    `content`   TEXT,
    `embedding` VECTOR(3072) GENERATED ALWAYS AS (EMBED_TEXT(
        "gemini/gemini-embedding-001",
        `content`
    )) STORED
);

Step 4: Insert data into the table

Use the table.insert() or table.bulk_insert() API to add data:

documents = [
    Document(id=1, content="Java: Object-oriented language for cross-platform development."),
    Document(id=2, content="Java coffee: Bold Indonesian beans with low acidity."),
    Document(id=3, content="Java island: Densely populated, home to Jakarta."),
    Document(id=4, content="Java's syntax is used in Android apps."),
    Document(id=5, content="Dark roast Java beans enhance espresso blends."),
]
table.bulk_insert(documents)

Insert data using the INSERT INTO statement:

INSERT INTO sample_documents (id, content)
VALUES
    (1, "Java: Object-oriented language for cross-platform development."),
    (2, "Java coffee: Bold Indonesian beans with low acidity."),
    (3, "Java island: Densely populated, home to Jakarta."),
    (4, "Java's syntax is used in Android apps."),
    (5, "Dark roast Java beans enhance espresso blends.");
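The model accepts at most 2,048 input tokens per text. Exact counts require Gemini's own tokenizer, which is not shown here. The sketch below is only a rough pre-check before inserting data. It assumes about four characters per token, a common rule of thumb for English text. The helper likely_over_limit() and the heuristic constant are hypothetical.

```python
MAX_INPUT_TOKENS = 2048  # from the model table above
CHARS_PER_TOKEN = 4      # rough heuristic, not an exact tokenizer

def likely_over_limit(text: str) -> bool:
    """Return True if text probably exceeds the model's input token limit."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens > MAX_INPUT_TOKENS

contents = [
    "Java: Object-oriented language for cross-platform development.",
    "x" * 10_000,  # an artificially long document (about 2,500 estimated tokens)
]
too_long = [i for i, text in enumerate(contents) if likely_over_limit(text)]
print(too_long)  # → [1]
```

Texts flagged this way are candidates for splitting into smaller chunks before insertion.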

Step 5: Search for similar documents

Use the table.search() API to perform vector search:

results = table.search("How to start learning Java programming?") \
    .limit(2) \
    .to_list()
print(results)

Use the VEC_EMBED_COSINE_DISTANCE function to perform vector search based on cosine distance:

SELECT
    `id`,
    `content`,
    VEC_EMBED_COSINE_DISTANCE(embedding, "How to start learning Java programming?") AS _distance
FROM sample_documents
ORDER BY _distance ASC
LIMIT 2;
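Conceptually, this query compares the embedding of the query text with each stored embedding, sorts rows by ascending cosine distance, and keeps the top two. The plain-Python sketch below mimics only that ORDER BY ... LIMIT step. It does not call TiDB, and the 3-dimensional vectors are invented for illustration rather than taken from real Gemini embeddings.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def top_k(query_vec, rows, k=2):
    """Return the k (id, distance) pairs closest to query_vec, like ORDER BY _distance ASC LIMIT k."""
    scored = [(row_id, cosine_distance(vec, query_vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Invented vectors: axis 0 ~ "programming", axis 1 ~ "coffee", axis 2 ~ "geography".
rows = [
    (1, [0.9, 0.1, 0.0]),  # Java the language
    (2, [0.1, 0.9, 0.1]),  # Java coffee
    (3, [0.0, 0.1, 0.9]),  # Java island
    (4, [0.8, 0.0, 0.2]),  # Java syntax in Android
    (5, [0.2, 0.8, 0.0]),  # Java beans for espresso
]
query = [1.0, 0.0, 0.0]    # stands in for "How to start learning Java programming?"
print([row_id for row_id, _ in top_k(query, rows)])  # → [1, 4]
```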

Custom embedding dimensions

The gemini-embedding-001 model supports flexible dimensions through Matryoshka Representation Learning (MRL). You can specify the desired dimensions in your embedding function:

# Inside a TableModel class such as Document, for 1536 dimensions
embedding: list[float] = EmbeddingFunction(
    model_name="gemini/gemini-embedding-001",
    dimensions=1536
).VectorField(source_field="content")

# For 768 dimensions
embedding: list[float] = EmbeddingFunction(
    model_name="gemini/gemini-embedding-001",
    dimensions=768
).VectorField(source_field="content")

-- Column definition inside CREATE TABLE, for 1536 dimensions
`embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(
    "gemini/gemini-embedding-001",
    `content`,
    '{"embedding_config": {"output_dimensionality": 1536}}'
)) STORED

-- For 768 dimensions
`embedding` VECTOR(768) GENERATED ALWAYS AS (EMBED_TEXT(
    "gemini/gemini-embedding-001",
    `content`,
    '{"embedding_config": {"output_dimensionality": 768}}'
)) STORED
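MRL trains the model so that the leading values of a full embedding carry most of its information, which is why a shorter prefix remains useful. The sketch below illustrates that idea only. It is not how TiDB or the Gemini API produces reduced-dimension embeddings. It keeps the first n values and rescales them to unit length. Whether the vectors you receive at reduced dimensions are already normalized is worth checking in the Gemini documentation. The helper name is hypothetical.

```python
import math

def truncate_and_normalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values (an MRL-style prefix) and rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]

full = [3.0, 4.0, 12.0, 0.5]  # made-up stand-in for a full-length embedding
short = truncate_and_normalize(full, 2)
print(short)  # → [0.6, 0.8]
```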

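Choose dimensions based on your accuracy, performance, and storage requirements. Higher dimensions can improve accuracy but require more storage and compute resources. Whichever size you choose, the VECTOR(n) size in the column definition must match the requested output dimension.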
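The following back-of-the-envelope sketch puts the storage trade-off in numbers. It assumes that each vector dimension is stored as a 4-byte single-precision float and ignores per-row and index overhead. Under that assumption, raw vector storage scales linearly with the number of dimensions. The row count is a hypothetical example.

```python
BYTES_PER_DIMENSION = 4  # assumption: single-precision (float32) storage

def raw_vector_bytes(dimensions: int, rows: int) -> int:
    """Raw bytes for `rows` vectors of `dimensions` floats, excluding any overhead."""
    return dimensions * BYTES_PER_DIMENSION * rows

rows = 1_000_000  # hypothetical table size
for dims in (3072, 1536, 768):
    gib = raw_vector_bytes(dims, rows) / 2**30
    print(f"{dims:>4} dims: {gib:.2f} GiB")
```

Halving the dimension halves the raw vector storage, so moving from 3072 to 768 dimensions cuts it to a quarter.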

Options

All Gemini options are supported via the additional_json_options parameter of the EMBED_TEXT() function.

Example: Specify a task type to improve embedding quality for your use case

CREATE TABLE sample (
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(3072) GENERATED ALWAYS AS (EMBED_TEXT(
                "gemini/gemini-embedding-001",
                `content`,
                '{"task_type": "SEMANTIC_SIMILARITY"}'
              )) STORED
);
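Hand-written JSON strings are easy to get wrong through mismatched quotes or misspelled enum values. A small helper can build the additional_json_options string with json.dumps and reject unknown task types. The set below reflects the task types listed in the Gemini API documentation at the time of writing and may be incomplete or change over time. The function name task_type_options() is hypothetical.

```python
import json

# Task types listed in the Gemini API documentation; verify before relying on this set.
GEMINI_TASK_TYPES = {
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "RETRIEVAL_DOCUMENT",
    "RETRIEVAL_QUERY",
    "CODE_RETRIEVAL_QUERY",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
}

def task_type_options(task_type: str) -> str:
    """Return a JSON string for the additional_json_options argument of EMBED_TEXT()."""
    if task_type not in GEMINI_TASK_TYPES:
        raise ValueError(f"Unknown Gemini task type: {task_type!r}")
    return json.dumps({"task_type": task_type})

print(task_type_options("SEMANTIC_SIMILARITY"))  # → {"task_type": "SEMANTIC_SIMILARITY"}
```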

Example: Use an alternative dimension

CREATE TABLE sample (
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(768) GENERATED ALWAYS AS (EMBED_TEXT(
                "gemini/gemini-embedding-001",
                `content`,
                '{"output_dimensionality": 768}'
              )) STORED
);
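Because the VECTOR(n) size must equal the output dimension, it can help to derive both from a single value. The hypothetical helper below builds the generated-column definition used in this example. It checks that the requested dimension falls within the 128 to 3072 range listed for gemini-embedding-001 and uses the flat {"output_dimensionality": n} form shown above. It only returns a SQL string and does not execute anything.

```python
import json

MIN_DIMS, MAX_DIMS = 128, 3072  # range listed for gemini-embedding-001

def embedding_column_sql(dims: int, model: str = "gemini/gemini-embedding-001") -> str:
    """Build a generated VECTOR column whose size matches output_dimensionality."""
    if not MIN_DIMS <= dims <= MAX_DIMS:
        raise ValueError(f"dims must be between {MIN_DIMS} and {MAX_DIMS}, got {dims}")
    options = json.dumps({"output_dimensionality": dims})
    return (
        f"`embedding` VECTOR({dims}) GENERATED ALWAYS AS (EMBED_TEXT(\n"
        f'    "{model}",\n'
        f"    `content`,\n"
        f"    '{options}'\n"
        f")) STORED"
    )

print(embedding_column_sql(768))
```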

For all available options, see the Gemini documentation.
