
Commit d4df85a

[Hold] Weaviate destination connector: collection management behavior updates (#473)
Co-authored-by: Maria Khalusova <[email protected]>
1 parent 130ade3 commit d4df85a


4 files changed: 48 additions, 6 deletions

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 - `<name>` (_required_) - A unique name for this connector.
 - `<host-url>` (_required_) - The URL of the Weaviate database cluster.
-- `<class-name>` (_required_) - The name of the target collection within the cluster.
+- `<class-name>` - The name of the target collection within the cluster. If no value is provided, see the beginning of this article
+for the behavior at run time.
 - `<api-key>` (_required_) - The API key provided by Weaviate to access the cluster.
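With this change, `<class-name>` is optional. According to the weaviate.mdx change later in this commit, the run-time behavior depends on whether a collection name is provided:

- If a name is provided and Unstructured generates embeddings whose dimensions do not match that collection's embedding settings, the run fails.
- If no name is provided, Unstructured creates a new collection. On the Unstructured Platform, it is named `U<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`, or `U<short-workflow-id>` when no embeddings are generated. In Unstructured Ingest, it is named `Elements`.

The following Python sketch restates those documented rules. It is an illustration only, not Unstructured's implementation. The names `resolve_collection` and `DimensionMismatchError` are hypothetical, and the short workflow ID and short model name are stand-in strings.

```python
from typing import Optional


class DimensionMismatchError(RuntimeError):
    """Hypothetical error for embeddings that don't fit an existing collection."""


def resolve_collection(
    product: str,
    collection: Optional[str],
    generates_embeddings: bool,
    generated_dims: Optional[int] = None,
    existing_dims: Optional[int] = None,
    short_workflow_id: str = "",
    short_model_name: str = "",
) -> str:
    """Return the target collection name, following the documented rules.

    product: "platform" (Unstructured Platform) or "ingest" (Unstructured Ingest).
    collection: the user-supplied collection name, or None if not specified.
    existing_dims: embedding dimensions configured on the existing collection.
    """
    if collection:
        # An existing collection was named: generated embeddings must match
        # its embedding settings, otherwise the run fails (both products).
        if generates_embeddings and existing_dims is not None and generated_dims != existing_dims:
            raise DimensionMismatchError(
                f"{collection} expects {existing_dims} dimensions, got {generated_dims}"
            )
        return collection
    if product == "ingest":
        # Unstructured Ingest creates a new collection named Elements.
        return "Elements"
    if product == "platform":
        # Unstructured Platform derives the name from the workflow and model.
        if generates_embeddings:
            return f"U{short_workflow_id}_{short_model_name}_{generated_dims}"
        return f"U{short_workflow_id}"
    raise ValueError(f"unknown product: {product!r}")
```

In this sketch, a dimension mismatch raises an error rather than being silently accepted. This mirrors the documented outcome: the run fails until either the embedding settings or the collection's settings are changed to match.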

snippets/general-shared-text/weaviate-cli-api.mdx

Lines changed: 4 additions & 2 deletions
@@ -14,7 +14,8 @@ The following environment variables:
 - For Embedded Weaviate:
 
 - `WEAVIATE_HOST` - The connection URL to the instance, represented by `--hostname` (CLI) or `hostname` (Python).
-- `WEAVIATE_COLLECTION` - The name of the target collection in the instance, represented by `--collection` (CLI) or `collection` (Python).
+- `WEAVIATE_COLLECTION` - The name of the target collection in the instance, represented by `--collection` (CLI) or `collection` (Python).
+If no value is provided, see the beginning of this article for the behavior at run time.
 
 - For Weaviate Cloud:
 
@@ -23,4 +24,5 @@ The following environment variables:
 
 <Note>For the CLI, the `--api-key` option here is part of the `weaviate-cloud` command. For Python, the `api_key` parameter here is part of the `CloudWeaviateAccessConfig` object.</Note>
 
-- `WEAVIATE_COLLECTION` - The name of the target collection in the database, represented by `--collection` (CLI) or `collection` (Python).
+- `WEAVIATE_COLLECTION` - The name of the target collection in the database, represented by `--collection` (CLI) or `collection` (Python).
+If no value is provided, see the beginning of this article for the behavior at run time.
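Because `WEAVIATE_COLLECTION` is now optional, a script that builds connector options from the environment should forward `--collection` only when a value is actually set. The following minimal sketch covers the Embedded Weaviate variables. It assumes only the `--hostname` and `--collection` option names given in the snippet above. The `WeaviateDestinationSettings` class and the `load_settings` and `build_cli_args` helpers are hypothetical and are not part of Unstructured Ingest.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class WeaviateDestinationSettings:
    """Hypothetical holder for the Embedded Weaviate settings described above."""

    host: str
    # None means no collection was provided; the run-time behavior applies.
    collection: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> WeaviateDestinationSettings:
    """Read WEAVIATE_HOST (required) and WEAVIATE_COLLECTION (optional)."""
    host = env.get("WEAVIATE_HOST")
    if not host:
        raise KeyError("WEAVIATE_HOST must be set")
    # Treat an unset or blank WEAVIATE_COLLECTION as "not provided".
    collection = env.get("WEAVIATE_COLLECTION", "").strip() or None
    return WeaviateDestinationSettings(host=host, collection=collection)


def build_cli_args(settings: WeaviateDestinationSettings) -> List[str]:
    """Always emit --hostname; emit --collection only when a value was given."""
    args = ["--hostname", settings.host]
    if settings.collection is not None:
        args += ["--collection", settings.collection]
    return args
```

For example, `build_cli_args(load_settings({"WEAVIATE_HOST": "http://localhost:8080"}))` yields only the `--hostname` pair. In that case, the connector falls back to its documented run-time collection behavior.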

snippets/general-shared-text/weaviate-platform.mdx

Lines changed: 2 additions & 1 deletion
@@ -2,5 +2,6 @@ Fill in the following fields:
 
 - **Name** (_required_): A unique name for the connector.
 - **Cluster URL** (_required_): The URL of the Weaviate database cluster.
-- **Collection Name** (_required_): The name of the target collection within the cluster.
+- **Collection Name**: The name of the target collection within the cluster. If no value is provided, see the beginning of this article
+for the behavior at run time.
 - **API Key** (_required_): The API key provided by Weaviate to access the cluster.
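The weaviate.mdx change below documents a Weaviate GraphQL query for viewing the embeddings in a newly created collection. This is needed because the embeddings property does not appear in the Weaviate Cloud **Collections** user interface. The following minimal sketch assembles that same query and prepares the HTTP request using only the Python standard library. It assumes that the cluster accepts GraphQL at `<host-url>/v1/graphql` and that the API key is passed as a bearer token. The helpers `build_vector_query` and `make_graphql_request` are hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def build_vector_query(collection_name: str, property_names: Iterable[str]) -> str:
    """Build the Get query from the docs, requesting each object's vector."""
    lines = [
        "{",
        "  Get {",
        f"    {collection_name} {{",
        "      _additional {",
        "        vector",
        "      }",
    ]
    # One line per additional property to return, such as text or record_id.
    lines += [f"      {name}" for name in property_names]
    lines += ["    }", "  }", "}"]
    return "\n".join(lines)


def make_graphql_request(host_url: str, api_key: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the cluster's GraphQL endpoint."""
    return urllib.request.Request(
        url=host_url.rstrip("/") + "/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` should return JSON in which each object's embedding appears under `data`, then `Get`, then the collection name, then `_additional`, then `vector`. Verify this response shape against your cluster before relying on it.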

snippets/general-shared-text/weaviate.mdx

Lines changed: 40 additions & 2 deletions
@@ -19,9 +19,47 @@
 - A Weaviate database instance. The following information assumes that you have a Weaviate Cloud (WCD) account with a Weaviate database cluster in that account.
 [Create a WCD account](https://weaviate.io/developers/wcs/quickstart#create-a-wcd-account). [Create a database cluster](https://weaviate.io/developers/wcs/quickstart#create-a-weaviate-cluster). For other database options, [learn more](https://weaviate.io/developers/weaviate/installation).
 - The URL and API key for the database cluster. [Get the URL and API key](https://weaviate.io/developers/wcs/quickstart#explore-the-details-panel).
-- The name of the target collection in the database. [Create a collection](https://weaviate.io/developers/wcs/tools/collections-tool).
+- The name of the target collection in the database. [Create a collection](https://weaviate.io/developers/wcs/tools/collections-tool).
+
+An existing collection is not required. At runtime, the collection behavior is as follows:
 
-Weaviate requires the collection to have a data schema before you add data. At minimum, this schema must contain the `record_id` property, as follows:
+For the [Unstructured Platform](/platform/overview):
+
+- If an existing collection name is specified, and Unstructured generates embeddings,
+but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
+You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+- If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. If Unstructured generates embeddings,
+the new collection's name will be `U<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
+If Unstructured does not generate embeddings, the new collection's name will be `U<short-workflow-id>`.
+
+For [Unstructured Ingest](/ingestion/overview):
+
+- If an existing collection name is specified, and Unstructured generates embeddings,
+but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
+You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+- If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Elements`.
+
+If Unstructured creates a new collection and generates embeddings, you will not see an embeddings property in tools such as the Weaviate Cloud
+**Collections** user interface. To view the generated embeddings, you can run a Weaviate GraphQL query such as the following. In this query, replace `<collection-name>` with
+the name of the new collection, and replace `<property-name>` with the name of each additional available property that
+you want to return results for, such as `text`, `type`, `element_id`, `record_id`, and so on. The embeddings will be
+returned in the `vector` property.
+
+```text
+{
+  Get {
+    <collection-name> {
+      _additional {
+        vector
+      }
+      <property-name>
+      <property-name>
+    }
+  }
+}
+```
+
+Weaviate requires an existing collection to have a data schema before you add data. At minimum, this schema must contain the `record_id` property, as follows:
 
 ```json
 {
