 - `<name>` (_required_) - A unique name for this connector.
 - `<token>` (_required_) - The application token for the database.
-- `<api-endpoint>` (_required_) - The database’s associated API endpoint.
-- `<collection-name>` - The name of the collection in the namespace. If no value is provided, see the beginning of this article for the behavior at run time.
+- `<api-endpoint>` (_required_) - The database's associated API endpoint.
+- `<collection-name>` - The name of the collection in the keyspace. If no value is provided, see the beginning of this article for the behavior at run time.
 - `<keyspace>` - The name of the keyspace in the collection. The default is `default_keyspace` if not otherwise specified.
 - `<batch-size>` - The maximum number of records to send per batch. The default is `20` if not otherwise specified.
 - `flatten_metadata` - Set to `true` to flatten the metadata into each record. Specifically, when flattened, the metadata key values are brought to the top level of the element, and the `metadata` key itself is removed. By default, the metadata is not flattened (`false`).
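
The `flatten_metadata` behavior described above can be sketched in Python as follows. This is an illustrative sketch only, not Unstructured's implementation: the function name, the sample record, and the collision rule (a metadata value overwrites a top-level key of the same name) are assumptions made for the example, and only one level of metadata is lifted.

```python
from typing import Any


def flatten_metadata(element: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `element` with its metadata keys lifted to the top level."""
    # Copy every top-level key except `metadata`, which is removed when flattening.
    flattened = {key: value for key, value in element.items() if key != "metadata"}
    # Bring each metadata key/value pair up to the top level of the element.
    # Assumption: on a name clash, the metadata value wins.
    flattened.update(element.get("metadata") or {})
    return flattened


# Hypothetical element, for illustration only.
record = {
    "element_id": "abc123",
    "text": "Hello, world.",
    "metadata": {"filename": "report.pdf", "page_number": 1},
}
print(flatten_metadata(record))
```

With the flag left at its default (`false`), the record would be sent unchanged, with `filename` and `page_number` still nested under `metadata`.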
snippets/general-shared-text/astradb-platform.mdx
Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Fill in the following fields:

 - **Name** (_required_): A unique name for this connector.
-- **Collection Name**: The name of the collection in the namespace. If no value is provided, see the beginning of this article for the behavior at run time.
+- **Collection Name**: The name of the collection in the keyspace. If no value is provided, see the beginning of this article for the behavior at run time.
 - **Keyspace** (_required_): The name of the keyspace in the collection.
 - **Batch Size**: The maximum number of records per batch. The default is `20` if not otherwise specified.
 - **Flatten Metadata**: Check this box to flatten the metadata into each record.
-- An Astra account. [Create or sign in to an Astra account](https://astra.datastax.com/).
-- A database in the Astra account. [Create a database in an account](https://docs.datastax.com/en/astra-db-classic/databases/manage-create.html).
-- An application token for the database. [Create a database application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html).
-- A namespace in the database. [Create a namespace in a database](https://docs.datastax.com/en/astra-db-serverless/databases/manage-namespaces.html#create-namespace).
-- A collection in the namespace. [Create a collection in a namespace](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
+- An IBM Cloud account or DataStax account.

-An existing collection is not required. At runtime, the collection behavior is as follows:
+- For an IBM Cloud account, [sign up](https://cloud.ibm.com/registration) for an IBMid, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your IBMid.
+- For a DataStax account, [sign up](https://astra.datastax.com/signup) for a DataStax account, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your DataStax account.
+
+- An Astra DB database in the DataStax account. To create a database:
+
+a. After you sign in to DataStax, click **Create database**.<br/>
+b. Click the **Serverless (vector)** tile, if it is not already selected.<br/>
+c. For **Database name**, enter some unique name for the database.<br/>
+d. Select a **Provider** and a **Region**, and then click **Create database**.<br/>
 For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview):

-- If an existing collection name is specified, and Unstructured generates embeddings,
-but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
-You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
-- If a collection name is not specified, Unstructured creates a new collection in your namespace. If Unstructured generates embeddings,
-the new collections's name will be `u<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
-If Unstructured does not generate embeddings, the new collections's name will be `u<short-workflow-id`.
+- An existing collection is not required. At runtime, the collection behavior is as follows:
+
+- If an existing collection name is specified, and Unstructured generates embeddings,
+but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
+You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+- If a collection name is not specified, Unstructured creates a new collection in your keyspace. If Unstructured generates embeddings,
+the new collection's name will be `u<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
+If Unstructured does not generate embeddings, the new collection's name will be `u<short-workflow-id>`.
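
The run-time collection behavior for the Unstructured UI and API, as described in the added lines above, can be sketched in Python as follows. This is a hedged illustration of the documented rules, not Unstructured's code: the function and parameter names are hypothetical, the short workflow ID and short embedding model name are taken as ready-made inputs because the snippet does not say how they are derived, and the auto-created name without embeddings is assumed to be `u<short-workflow-id>` with the closing angle bracket.

```python
from typing import Optional


def resolve_collection_name(
    collection_name: Optional[str],
    short_workflow_id: str,
    short_model_name: Optional[str] = None,
    generated_dimensions: Optional[int] = None,
    existing_dimensions: Optional[int] = None,
) -> str:
    """Mirror the documented rules for choosing (or rejecting) a collection."""
    generates_embeddings = short_model_name is not None and generated_dimensions is not None

    if collection_name:
        # Existing collection: the generated dimensions must match its settings.
        if (
            generates_embeddings
            and existing_dimensions is not None
            and generated_dimensions != existing_dimensions
        ):
            raise ValueError(
                f"Embedding dimensions {generated_dimensions} do not match the "
                f"existing collection's {existing_dimensions}; the run would fail."
            )
        return collection_name

    # No collection name given: a new collection is created in the keyspace.
    if generates_embeddings:
        return f"u{short_workflow_id}_{short_model_name}_{generated_dimensions}"
    return f"u{short_workflow_id}"
```

For example, with the hypothetical inputs `short_workflow_id="1a2b3c"`, `short_model_name="minilm"`, and `generated_dimensions=384`, and no collection name, the sketch returns `u1a2b3c_minilm_384`.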

 For [Unstructured Ingest](/open-source/ingestion/overview):

-- If an existing collection name is specified, and Unstructured generates embeddings,
-but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
-You must change your Unstructured embedding settings or your existing collections's embedding settings to match, and try the run again.
-- If a collection name is not specified, Unstructured creates a new collection in your Pinecone account. The new collection's name will be `unstructuredautocreated`.
+- For the source connector only, an existing collection is required.
+- For the destination connector only, an existing collection is not required. At runtime, the collection behavior is as follows:
+
+- If an existing collection name is specified, and Unstructured generates embeddings,
+but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
+You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+- If a collection name is not specified, Unstructured creates a new collection in your keyspace. The new collection's name will be `unstructuredautocreated`.
+
+To create a collection yourself:
+
+a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br/>
+b. On the **Data Explorer** tab, in the **Keyspace** list, select the name of the target keyspace.<br/>
+c. In the **Collections** list, select **Create collection**.<br/>
+d. Enter some **Collection name**.<br/>
+e. Turn on **Vector-enabled collection**, if it is not already turned on.<br/>
+f. For **Embedding generation method**, select **Bring my own**.<br/>
+g. For **Dimensions**, enter the number of dimensions for the embedding model that you plan to use.<br/>
+h. For **Similarity metric**, select **Cosine**.<br/>
snippets/general-shared-text/milvus.mdx
Lines changed: 59 additions & 28 deletions
@@ -1,7 +1,61 @@
-- For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Zilliz Cloud, and Milvus on IBM watsonx.data) are supported.
+- For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Milvus on IBM watsonx.data, or Zilliz Cloud) are supported.
 - For [Unstructured Ingest](/open-source/ingestion/overview), Milvus local and cloud-based instances are supported.

-The following video shows how to fulfill the minimum set of requirements for Milvus cloud-based instances, demonstrating Milvus on IBM watsonx.data:
+- An [IBM Cloud account](https://cloud.ibm.com/registration).
+- An IBM watsonx.data [Lite plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)
+or [Enterprise plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started_1) within your IBM Cloud account.
+
+- If you are provisioning a Lite plan, be sure to choose the **Generative AI** use case when prompted, as this is the only use case offered that includes Milvus.
+
+- A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).
+
+- If you are creating a Milvus service instance within a watsonx.data Lite plan, when you are prompted to choose a Milvus instance size, you can only select **Lite**. Because the Lite
+Milvus instance size is recommended only for 384 dimensions, you should also use an embedding model that uses 384 dimensions only.
+- If you are creating a Milvus service instance within a watsonx.data Enterprise plan, you can choose any available Milvus instance size. However, all Milvus instance sizes other than
+**Custom** are recommended only for 384 dimensions, which means you should use an embedding model that uses 384 dimensions only.
+The **Custom** Milvus instance size is recommended for any number of dimensions.
+
+- The URI of the instance, which takes the format of `https://`, followed by the instance's **GRPC host**, followed by a colon and the **GRPC port**.
+This takes the format of `https://<host>:<port>`. To get this information, do the following:
+
+a. Sign in to your IBM Cloud account.<br/>
+b. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br/>
+c. Expand **Databases**, and then click the name of the target **watsonx.data** plan.<br/>
+d. Click **Open web console**.<br/>
+e. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the title bar.<br/>
+f. Click the target Milvus service instance.<br/>
+g. On the **Details** tab, under **Type**, click **View connect details**.<br/>
+h. Under **Service details**, expand **GRPC**, and note the value of **GRPC host** and **GRPC port**.<br/>
+
+- The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
+- The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.
+- The username and password to access the instance.
+
+- The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
+- The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key. To create an IBM Cloud user API key:
+
+a. Sign in to your IBM Cloud account.<br/>
+b. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.<br/>
+c. On the sidebar, under **Manage identities**, click **API keys**. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br/>
+d. Click **Create**.<br/>
+e. Enter some **Name** for the API key.<br/>
+f. Optionally, enter some **Description** for the API key.<br/>
+g. For **Leaked action**, leave **Disable the leaked key** selected.<br/>
+h. For **Session management**, leave **No** selected.<br/>
+i. Click **Create**.<br/>
+j. Click **Download** (or **Copy**), and then download the API key to a secure location (or paste the copied API key into a secure location). You won't be able to access this API key from this dialog again. If you lose this API key, you can create a new one (and you should then delete the old one).<br/>
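
The connection details gathered in the steps above (GRPC host, GRPC port, database name, and IBM Cloud user API key) can be combined as in the following Python sketch. It only assembles values and makes no network call. The function name and the sample host, port, key, and database values are hypothetical placeholders; the dictionary keys are chosen to mirror the `uri`, `user`, `password`, and `db_name` keyword arguments of `MilvusClient` in the `pymilvus` package, on the assumption that you connect with that SDK.

```python
def watsonx_milvus_connection(
    grpc_host: str, grpc_port: int, api_key: str, db_name: str
) -> dict[str, str]:
    """Assemble connection settings for Milvus on IBM watsonx.data."""
    return {
        # URI format from the requirements: https://<host>:<port>
        "uri": f"https://{grpc_host}:{grpc_port}",
        # The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
        "user": "ibmlhapikey",
        # The password is the IBM Cloud user API key.
        "password": api_key,
        "db_name": db_name,
    }


# Hypothetical placeholder values, for illustration only.
settings = watsonx_milvus_connection(
    grpc_host="example-host.example.com",
    grpc_port=30439,
    api_key="<ibm-cloud-user-api-key>",
    db_name="default",
)
print(settings["uri"])
```

If `pymilvus` is installed, these settings could then be passed along as `MilvusClient(**settings)`. In practice, read the API key from an environment variable or a secrets store rather than hard-coding it.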

 - For Zilliz Cloud, you will need:

@@ -54,31 +108,6 @@ The following video shows how to fulfill the minimum set of requirements for Mil
 The number of dimensions for the `embeddings` field must match the number of dimensions for the embedding model that you plan to use.
-- An [IBM Cloud account](https://cloud.ibm.com/registration).
-- The [IBM watsonx.data subscription plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started).
-- A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).
-- The URI of the instance, which takes the format of `https://`, followed by instance's **GRPC host**, followed by a colon and the **GRPC port**.
-This takes the format of `https://<host>:<port>`.
-[Get the instance's GRPC host and GRPC port](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-conn-to-milvus).
-- The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
-- The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.
-- The username and password to access the instance.
-The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
-The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key.
-[Get the user API key](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).
-
 - For Milvus local, you will need:

 - A [Milvus instance](https://milvus.io/docs/install-overview.md).
@@ -89,7 +118,9 @@ The following video shows how to fulfill the minimum set of requirements for Mil
 - The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.

 All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
-schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows. This example code demonstrates the use of the
+schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
+
+This example code demonstrates the use of the
 [Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this schema,
 targeting Milvus on IBM watsonx.data. For the `MilvusClient` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation: