Commit e4359d6

Updates to Milvus and IBM watsonx.data connectors (#748)
1 parent 2d718c2 commit e4359d6

2 files changed: +60 -35 lines changed

snippets/general-shared-text/ibm-watsonxdata.mdx

Lines changed: 2 additions & 2 deletions
@@ -122,7 +122,7 @@
  a. Check the box labelled **Associate Catalog**.<br/>
  b. For **Catalog type**, select **Apache Iceberg**.<br/>
  c. Enter some **Catalog name**.<br/>
- b. Click **Associate**.<br/>
+ d. Click **Associate**.<br/>

  If you select **Register my own**, you must provide the following settings:

@@ -137,7 +137,7 @@
  a. Check the box labelled **Associate Catalog**.<br/>
  b. For **Catalog type**, select **Apache Iceberg**.<br/>
  c. Enter some **Catalog name**.<br/>
- b. Click **Associate**.<br/>
+ d. Click **Associate**.<br/>

  10. On the sidebar, click **Infrastructure manager**. Make sure the catalog is associated with the appropriate engines. If it is not, rest your mouse
  on an unassociated target engine, click the **Manage associations** icon, check the box next to the target catalog's name, and then

snippets/general-shared-text/milvus.mdx

Lines changed: 58 additions & 33 deletions
@@ -37,18 +37,23 @@ The following video shows how to fulfill the minimum set of requirements for Mil
  - The name of the [database](https://docs.zilliz.com/docs/database#create-database) in the instance.
  - The name of the [collection](https://docs.zilliz.com/docs/manage-collections-console#create-collection) in the database.

- The collection must have a a defined schema before Unstructured can write to the collection. The minimum viable
- schema for Unstructured contains only the fields `element_id`, `embeddings`, and `record_id`, as follows:
+ The collection must have a defined schema before Unstructured can write to the collection. The minimum viable
+ schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows:

  | Field Name | Field Type | Max Length | Dimension |
  |---|---|---|---|
  | `element_id` (primary key field) | **VARCHAR** | `200` | -- |
- | `embeddings` (vector field) | **FLOAT_VECTOR** | -- | `3072` |
+ | `embeddings` (vector field) | **FLOAT_VECTOR** | -- | `384` |
  | `record_id` | **VARCHAR** | `200` | -- |
+ | `text` | **VARCHAR** | `65536` | -- |

  In the **Create Index** area for the collection, next to **Vector Fields**, click **Edit Index**. Make sure that for the
  `embeddings` field, the **Field Type** is set to **FLOAT_VECTOR** and the **Metric Type** is set to **Cosine**.

+ <Warning>
+ The number of dimensions for the `embeddings` field must match the number of dimensions for the embedding model that you plan to use.
+ </Warning>
+
  - For Milvus on IBM watsonx.data, you will need:

  <iframe
@@ -69,7 +74,7 @@ The following video shows how to fulfill the minimum set of requirements for Mil
  [Get the instance's GRPC host and GRPC port](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-conn-to-milvus).
  - The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
  - The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.
- - The uername and password to access the instance.
+ - The username and password to access the instance.
  The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
  The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key.
  [Get the user API key](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).
@@ -84,75 +89,95 @@ The following video shows how to fulfill the minimum set of requirements for Mil
  - The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.

  All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
- schema for Unstructured contains only the fields `element_id`, `embeddings`, and `record_id`, as follows. Adding a `text` field is optional but highly recommended.This example code demonstrates the use of the
+ schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows. This example code demonstrates the use of the
  [Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this schema,
- targeting Milvus on IBM watsonx.data. For the `connections.connect` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:
+ targeting Milvus on IBM watsonx.data. For the `MilvusClient` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:

  ```python Python
  import os
+
  from pymilvus import (
-     connections,
+     MilvusClient,
      FieldSchema,
      DataType,
-     CollectionSchema,
-     Collection,
+     CollectionSchema
  )

- connections.connect(
-     alias="default",
-     host=os.getenv("MILVUS_GRPC_HOST"),
-     port=os.getenv("MILVUS_GRPC_PORT"),
-     user=os.getenv("MILVUS_USER"),
-     password=os.getenv("MILVUS_PASSWORD"),
-     secure=True
+ DATABASE_NAME = "default"
+ COLLECTION_NAME = "my_collection"
+
+ client = MilvusClient(
+     uri="https://" +
+         os.getenv("MILVUS_USER") +
+         ":" +
+         os.getenv("MILVUS_PASSWORD") +
+         "@" +
+         os.getenv("MILVUS_GRPC_HOST") +
+         ":" +
+         os.getenv("MILVUS_GRPC_PORT"),
+     db_name=DATABASE_NAME
  )

- primary_key = FieldSchema(
+ primary_key_field = FieldSchema(
      name="element_id",
      dtype=DataType.VARCHAR,
      is_primary=True,
      max_length=200
  )

- vector = FieldSchema(
+ # IMPORTANT: The number of dimensions for the "embeddings" field
+ # must match the number of dimensions for the embedding model
+ # that you plan to use.
+ embeddings_field = FieldSchema(
      name="embeddings",
      dtype=DataType.FLOAT_VECTOR,
-     dim=3072
+     dim=384
  )

- record_id = FieldSchema(
+ record_id_field = FieldSchema(
      name="record_id",
      dtype=DataType.VARCHAR,
      max_length=200
  )

- text = FieldSchema(
+ text_field = FieldSchema(
      name="text",
      dtype=DataType.VARCHAR,
-     max_length=65536
+     max_length=65535
  )

  schema = CollectionSchema(
-     fields=[primary_key, vector, record_id, text],
-     enable_dynamic_field=True
+     fields=[
+         primary_key_field,
+         embeddings_field,
+         record_id_field,
+         text_field
+     ]
  )

- collection = Collection(
-     name="my_collection",
+ client.create_collection(
+     collection_name=COLLECTION_NAME,
      schema=schema,
-     using="default"
+     using=DATABASE_NAME
  )

- index_params = {
-     "metric_type": "L2",
-     "index_type": "IVF_FLAT",
-     "params": {"nlist": 1024}
- }
+ index_params = client.prepare_index_params()

- collection.create_index(
+ index_params.add_index(
      field_name="embeddings",
+     metric_type="COSINE",
+     index_type="IVF_FLAT",
+     params={"nlist": 1024}
+ )
+
+ client.create_index(
+     collection_name=COLLECTION_NAME,
      index_params=index_params
  )
+
+ client.load_collection(
+     collection_name=COLLECTION_NAME
+ )
  ```

  Other approaches, such as [creating collections instantly](https://milvus.io/docs/create-collection-instantly.md) or
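
The updated example builds the connection URI by concatenating four `os.getenv` results. If any of those variables is unset, `os.getenv` returns `None` and the concatenation fails with a `TypeError` that does not say which variable is missing. The following is a minimal sketch of one way to guard against that, assuming the same four environment variable names used in the example. The helper name `build_milvus_uri` is hypothetical and is not part of pymilvus or of this commit.

```python
import os

# Environment variable names taken from the example in the diff above.
REQUIRED_VARS = (
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_GRPC_HOST",
    "MILVUS_GRPC_PORT",
)


def build_milvus_uri(env=None):
    """Return an https URI in the user:password@host:port form used above.

    Hypothetical helper: `env` is any mapping of variable names to values
    and defaults to the process environment.
    """
    env = os.environ if env is None else env

    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(
            "Missing environment variables: " + ", ".join(missing)
        )

    return "https://{user}:{password}@{host}:{port}".format(
        user=env["MILVUS_USER"],
        password=env["MILVUS_PASSWORD"],
        host=env["MILVUS_GRPC_HOST"],
        port=env["MILVUS_GRPC_PORT"],
    )
```

Under these assumptions, the result could be passed as `uri=build_milvus_uri()` in the `MilvusClient` call, leaving the rest of the example unchanged.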
