```diff
-The following video shows how to fulfill the minimum set of Milvus requirements, demonstrating Milvus on IBM watsonx.data:
+- For the [Unstructured Platform](/platform/overview), only Milvus cloud-based instances (such as Zilliz Cloud and Milvus on IBM watsonx.data) are supported.
+- For [Unstructured Ingest](/ingestion/overview), Milvus local and cloud-based instances are supported.
+
+The following video shows how to fulfill the minimum set of requirements for Milvus cloud-based instances, demonstrating Milvus on IBM watsonx.data:
```
```diff
-- A [Milvus instance](https://milvus.io/docs/install-overview.md).
-- The [URI](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of the instance.
-- The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.
-- The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
-- The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database.
-
-Milvus requires the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
-schema for Unstructured contains only the fields `element_id`, `embeddings`, and `record_id`, as follows. This example code demonstrates the use of the
-[Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this minimum viable schema,
-targeting Milvus on IBM watsonx.data:
-
-```python Python
-import os
-from pymilvus import (
-    connections,
-    FieldSchema,
-    DataType,
-    CollectionSchema,
-    Collection,
-)
-
-connections.connect(
-    alias="default",
-    host=os.getenv("MILVUS_GRPC_HOST"),
-    port=os.getenv("MILVUS_GRPC_PORT"),
-    user=os.getenv("MILVUS_USER"),
-    password=os.getenv("MILVUS_PASSWORD"),
-    secure=True
-)
-
-primary_key = FieldSchema(
-    name="element_id",
-    dtype=DataType.VARCHAR,
-    is_primary=True,
-    max_length=200
-)
-
-vector = FieldSchema(
-    name="embeddings",
-    dtype=DataType.FLOAT_VECTOR,
-    dim=3072
-)
-
-record_id = FieldSchema(
-    name="record_id",
-    dtype=DataType.VARCHAR,
-    max_length=200
-)
-
-schema = CollectionSchema(
-    fields=[primary_key, vector, record_id],
-    enable_dynamic_field=True
-)
-
-collection = Collection(
-    name="my_collection",
-    schema=schema,
-    using="default"
-)
-
-index_params = {
-    "metric_type": "L2",
-    "index_type": "IVF_FLAT",
-    "params": {"nlist": 1024}
-}
-
-collection.create_index(
-    field_name="embeddings",
-    index_params=index_params
-)
-```
-
-Other approaches, such as [creating collections instantly](https://milvus.io/docs/create-collection-instantly.md) or
-[setting nullable and default fields](https://milvus.io/docs/nullable-and-default.md), have not
-been fully evaluated by Unstructured and might produce unexpected results.
-
-Unstructured cannot provide a schema that is guaranteed to work in all
-circumstances. This is because these schemas will vary based on your source files' types; how you
-want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.
+- For Zilliz Cloud, you will need:
+
+  - A [Zilliz Cloud account](https://cloud.zilliz.com/signup).
+  - A [Zilliz Cloud cluster](https://docs.zilliz.com/docs/create-cluster).
+  - The URI of the cluster, also known as the cluster's _public endpoint_, which takes a format such as
```
```diff
+  - An [IBM Cloud account](https://cloud.ibm.com/registration).
+  - The [IBM watsonx.data subscription plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started).
+  - A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).
+  - The URI of the instance, which takes the format of `https://`, followed by the instance's **GRPC host**, followed by a colon and the **GRPC port**.
+    This takes the format of `https://<host>:<port>`.
+    [Get the instance's GRPC host and GRPC port](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-conn-to-milvus).
+  - The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
+  - The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.
+  - The username and password to access the instance.
+    The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
+    The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key.
+    [Get the user API key](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).
+
+- For Milvus local, you will need:
+
+  - A [Milvus instance](https://milvus.io/docs/install-overview.md).
+  - The [URI](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of the instance.
+  - The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
+  - The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database.
+    Note the collection requirements at the end of this section.
+  - The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.
+
```
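Review note: the IBM watsonx.data requirements above describe the URI as `https://`, then the GRPC host, then a colon and the GRPC port. A minimal sketch of that composition follows. The helper name `build_watsonx_milvus_uri`, the constant name, and the tolerance for a pasted scheme or trailing slash are assumptions made for illustration; they are not part of this PR or of any Unstructured or Milvus API.

```python
from typing import Union

# Per the requirements above, the username for Milvus on IBM watsonx.data
# is always this fixed value; the password is an IBM Cloud user API key.
WATSONX_MILVUS_USERNAME = "ibmlhapikey"


def build_watsonx_milvus_uri(grpc_host: str, grpc_port: Union[int, str]) -> str:
    """Compose `https://<host>:<port>` from a GRPC host and GRPC port.

    Hypothetical helper: it only formats the string described in the docs.
    """
    host = grpc_host.strip()
    # Assumption: tolerate a host that was pasted with a scheme attached.
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
    host = host.rstrip("/")
    if not host:
        raise ValueError("GRPC host must not be empty")

    port = int(grpc_port)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"GRPC port out of range: {port}")

    return f"https://{host}:{port}"
```

For example, calling `build_watsonx_milvus_uri("example.host", "30439")` with these placeholder values returns `https://example.host:30439`.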
```diff
+All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
+schema for Unstructured contains only the fields `element_id`, `embeddings`, and `record_id`, as follows. This example code demonstrates the use of the
+[Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this minimum viable schema,
+targeting Milvus on IBM watsonx.data. For the `connections.connect` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:
+
+```python Python
+import os
+from pymilvus import (
+    connections,
+    FieldSchema,
+    DataType,
+    CollectionSchema,
+    Collection,
+)
+
+connections.connect(
+    alias="default",
+    host=os.getenv("MILVUS_GRPC_HOST"),
+    port=os.getenv("MILVUS_GRPC_PORT"),
+    user=os.getenv("MILVUS_USER"),
+    password=os.getenv("MILVUS_PASSWORD"),
+    secure=True
+)
+
+primary_key = FieldSchema(
+    name="element_id",
+    dtype=DataType.VARCHAR,
+    is_primary=True,
+    max_length=200
+)
+
+vector = FieldSchema(
+    name="embeddings",
+    dtype=DataType.FLOAT_VECTOR,
+    dim=3072
+)
+
+record_id = FieldSchema(
+    name="record_id",
+    dtype=DataType.VARCHAR,
+    max_length=200
+)
+
+schema = CollectionSchema(
+    fields=[primary_key, vector, record_id],
+    enable_dynamic_field=True
+)
+
+collection = Collection(
+    name="my_collection",
+    schema=schema,
+    using="default"
+)
+
+index_params = {
+    "metric_type": "L2",
+    "index_type": "IVF_FLAT",
+    "params": {"nlist": 1024}
+}
+
+collection.create_index(
+    field_name="embeddings",
+    index_params=index_params
+)
+```
+
+Other approaches, such as [creating collections instantly](https://milvus.io/docs/create-collection-instantly.md) or
+[setting nullable and default fields](https://milvus.io/docs/nullable-and-default.md), have not
+been fully evaluated by Unstructured and might produce unexpected results.
+
+Unstructured cannot provide a schema that is guaranteed to work in all
+circumstances. This is because these schemas will vary based on your source files' types; how you
+want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.
```
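Review note: because the schema depends on the embedding model and other pipeline choices, it can help to check a record against the example schema's limits before writing it. The sketch below is plain Python and does not call Milvus. The function name `check_record` and the constants are hypothetical; the values `3072` and `200` are copied from the example schema above, and measuring string length in UTF-8 bytes is a conservative assumption rather than something this PR states.

```python
from typing import Any, Dict, List

EMBEDDING_DIM = 3072       # matches dim=3072 on the `embeddings` field
MAX_VARCHAR_LENGTH = 200   # matches max_length=200 on the VARCHAR fields


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none).

    Only the three fields of the minimum schema are checked. Extra keys
    are ignored, since the example schema sets enable_dynamic_field=True.
    """
    problems: List[str] = []

    for field in ("element_id", "record_id"):
        value = record.get(field)
        if not isinstance(value, str):
            problems.append(f"{field} must be a string")
        # Assumption: count UTF-8 bytes, the stricter of the two readings.
        elif len(value.encode("utf-8")) > MAX_VARCHAR_LENGTH:
            problems.append(f"{field} exceeds {MAX_VARCHAR_LENGTH} bytes")

    embeddings = record.get("embeddings")
    if not isinstance(embeddings, (list, tuple)):
        problems.append("embeddings must be a list of numbers")
    elif len(embeddings) != EMBEDDING_DIM:
        problems.append(
            f"embeddings has {len(embeddings)} values, expected {EMBEDDING_DIM}"
        )
    elif not all(isinstance(x, (int, float)) for x in embeddings):
        problems.append("embeddings must contain only numbers")

    return problems
```

If the embedding model produces vectors of a different size, `EMBEDDING_DIM` and the `dim` value in the collection schema would both need to change together.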