Commit abcce8e

Astra DB collection automatic management behavior (#506)
1 parent 39923ad commit abcce8e

4 files changed

+22
-4
lines changed

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 - `<name>` (_required_) - A unique name for this connector.
 - `<token>` (_required_) - The application token for the database.
 - `<api-endpoint>` (_required_) - The database’s associated API endpoint.
-- `<collection-name>` (_required_) - The name of the collection in the namespace.
+- `<collection-name>` - The name of the collection in the namespace. If no value is provided, see the beginning of this article for the behavior at run time.
 - `<keyspace>` - The name of the keyspace in the collection. The default is `default_keyspace` if not otherwise specified.
 - `<batch-size>` - The maximum number of records to send per batch. The default is `20` if not otherwise specified.
 - `flatten_metadata` - Set to `true` to flatten the metadata into each record. Specifically, when flattened, the metadata key values are brought to the top level of the element, and the `metadata` key itself is removed. By default, the metadata is not flattened (`false`).
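
The `flatten_metadata` description above can be illustrated with a minimal sketch. It is not the connector's actual implementation: the function name, the sample element, and the choice to let metadata values overwrite same-named top-level keys are all assumptions made for illustration.

```python
from typing import Any


def flatten_metadata(element: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `element` with its metadata keys moved to the top level.

    Hypothetical helper: the real connector may resolve key collisions differently.
    """
    # Copy every top-level key except "metadata" itself.
    flattened = {key: value for key, value in element.items() if key != "metadata"}
    # Promote the metadata key-value pairs to the top level.
    # Assumption: on a name collision, the metadata value wins.
    flattened.update(element.get("metadata") or {})
    return flattened


# Hypothetical element, shaped like a record with a nested "metadata" object.
element = {
    "type": "Title",
    "text": "Quarterly report",
    "metadata": {"filename": "report.pdf", "page_number": 1},
}
flat = flatten_metadata(element)
```

In this sketch, `flat` has `filename` and `page_number` next to `type` and `text`, and no `metadata` key. The input element is left unmodified.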

snippets/general-shared-text/astradb-cli-api.mdx

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ These environment variables:
 - `ASTRA_DB_API_ENDPOINT` - The API endpoint for the Astra DB database, represented by `--api-endpoint` (CLI) or `api_endpoint` (Python). To get the endpoint, see the **Database Details > API Endpoint** value on your database's **Overview** tab.
 - `ASTRA_DB_APPLICATION_TOKEN` - The database application token value for the database, represented by `--token` (CLI) or `token` (Python). To get the token, see the **Database Details > Application Tokens** box on your database's **Overview** tab.
 - `ASTRA_DB_KEYSPACE` - The name of the keyspace for the database, represented by `--keyspace` (CLI) or `keyspace` (Python).
-- `ASTRA_DB_COLLECTION` - The name of the collection for the keyspace, represented by `--collection-name` (CLI) or `collection_name` (Python).
+- `ASTRA_DB_COLLECTION` - The name of the collection for the keyspace, represented by `--collection-name` (CLI) or `collection_name` (Python). If no value is provided, see the beginning of this article for the behavior at run time.

 Additional settings include:

snippets/general-shared-text/astradb-platform.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Fill in the following fields:

 - **Name** (_required_): A unique name for this connector.
-- **Collection Name** (_required_): The name of the collection in the namespace.
+- **Collection Name**: The name of the collection in the namespace. If no value is provided, see the beginning of this article for the behavior at run time.
 - **Keyspace** (_required_): The name of the keyspace in the collection.
 - **Batch Size**: The maximum number of records per batch. The default is `20` if not otherwise specified.
 - **Flatten Metadata**: Check this box to flatten the metadata into each record.

snippets/general-shared-text/astradb.mdx

Lines changed: 19 additions & 1 deletion
@@ -12,4 +12,22 @@ allowfullscreen
 - A database in the Astra account. [Create a database in an account](https://docs.datastax.com/en/astra-db-classic/databases/manage-create.html).
 - An application token for the database. [Create a database application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html).
 - A namespace in the database. [Create a namespace in a database](https://docs.datastax.com/en/astra-db-serverless/databases/manage-namespaces.html#create-namespace).
-- A collection in the namespace. [Create a collection in a namespace](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
+- A collection in the namespace. [Create a collection in a namespace](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
+
+An existing collection is not required. At run time, the collection behavior is as follows:
+
+For the [Unstructured Platform](/platform/overview):
+
+- If an existing collection name is specified, and Unstructured generates embeddings,
+but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
+You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+- If a collection name is not specified, Unstructured creates a new collection in your namespace. If Unstructured generates embeddings,
+the new collection's name will be `u<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
+If Unstructured does not generate embeddings, the new collection's name will be `u<short-workflow-id>`.
+
+For [Unstructured Ingest](/ingestion/overview):
+
+- If an existing collection name is specified, and Unstructured generates embeddings,
+but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
+You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+- If a collection name is not specified, Unstructured creates a new collection in your namespace. The new collection's name will be `unstructuredautocreated`.
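
The naming rules described in this hunk can be summarized with a short sketch. It only restates the documented behavior and is not Unstructured's implementation: the function name, parameter names, and the `"platform"` / `"ingest"` labels are hypothetical, and the sample workflow ID and model name are made up.

```python
from typing import Optional


def auto_collection_name(
    target: str,
    short_workflow_id: Optional[str] = None,
    short_embedding_model_name: Optional[str] = None,
    number_of_dimensions: Optional[int] = None,
) -> str:
    """Return the collection name used when no collection name is specified.

    `target` is "platform" or "ingest" (labels invented for this sketch).
    """
    if target == "ingest":
        # Unstructured Ingest always uses the same fixed name.
        return "unstructuredautocreated"
    if target != "platform":
        raise ValueError(f"unknown target: {target!r}")
    if not short_workflow_id:
        raise ValueError("short_workflow_id is required for the platform")
    if short_embedding_model_name and number_of_dimensions is not None:
        # Embeddings are generated: the name includes the model and dimensions.
        return f"u{short_workflow_id}_{short_embedding_model_name}_{number_of_dimensions}"
    # No embeddings are generated: only the workflow ID is used.
    return f"u{short_workflow_id}"


def dimensions_match(generated: int, existing_collection: int) -> bool:
    """Mirror the documented failure condition for an existing collection."""
    return generated == existing_collection


# Made-up values, for illustration only.
with_embeddings = auto_collection_name("platform", "abc123", "examplemodel", 1536)
without_embeddings = auto_collection_name("platform", "abc123")
ingest_name = auto_collection_name("ingest")
```

When an existing collection name is specified, a run is expected to fail whenever `dimensions_match` would return `False`; the fix is to align the embedding settings on either side and try the run again.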
