Commit 7c6fc28

docs: Add documentation for Milvus/Zilliz database integration (#1203)
This integration is similar to the Pinecone and Qdrant integrations. I'm writing a tutorial on how to use Apify with Milvus/Zilliz, and it would be good to reference our documentation in the tutorial. Just a note: Milvus is an open-source vector database, and Zilliz offers a managed solution based on Milvus. @TC-MO, could you please review the English? I haven't included any screenshots, as I believe the description should suffice to get started. Also, maintaining and updating screenshots is a bit painful. Additionally, I haven't added this integration to the [cards](https://docs.apify.com/platform/integrations#data-pipelines-etls-and-aillm-tools). Do you think it should be included there?
1 parent d404b12 commit 7c6fc28

3 files changed

+141
-0
lines changed

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
---
title: Milvus integration
description: Learn how to integrate Apify with Milvus (Zilliz) to save data scraped from websites into the Milvus vector database.
sidebar_label: Milvus
sidebar_position: 4
slug: /integrations/milvus
toc_min_heading_level: 2
toc_max_heading_level: 4
---

**Learn how to integrate Apify with Milvus (Zilliz) to save data scraped from websites into the Milvus vector database.**

---

[Milvus](https://milvus.io/) is an open-source vector database optimized for similarity searches on large datasets of high-dimensional vectors.
Its focus on efficient vector similarity search makes it possible to build powerful and scalable retrieval systems.

The Apify integration for Milvus lets you export results from Apify Actors and dataset items into a Milvus collection.
It can also connect to a managed Milvus instance on [Zilliz Cloud](https://cloud.zilliz.com).

## Prerequisites

Before you begin, ensure that you have the following:

- A Milvus database URL and API token. Optionally, you can use a username and password. You can run Milvus on Docker or Kubernetes, but this example uses the hosted Milvus service at [Zilliz Cloud](https://cloud.zilliz.com).
- An [OpenAI API key](https://openai.com/index/openai-api/) to compute text embeddings.
- An [Apify API token](https://docs.apify.com/platform/integrations/api#api-token) to access [Apify Actors](https://apify.com/store).

### How to set up a Milvus database

1. Sign up or log in to your Zilliz account and create a new cluster.

1. Download the generated credentials: username and password.

Once the cluster is ready and you have the URL, API key, and credentials, you can set up the integration with Apify.

### Integration Methods

You can integrate Apify with Milvus using either the Apify Console or the Apify Python SDK.

:::note Website Content Crawler usage

These examples use the Website Content Crawler Actor, which performs deep website crawling, cleans HTML by removing modals and navigation elements, and converts the content into Markdown.

:::

#### Apify Console

1. Set up the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor in the [Apify Console](https://console.apify.com). For details, see the guide on setting up a [website content crawl for your project](https://blog.apify.com/talk-to-your-website-with-large-language-models/).

1. After setting up the crawler, go to the **Integrations** section, select **Connect Actor or Task**, and search for the Milvus integration.

1. Select when to trigger this integration (typically when a run succeeds) and fill in all the required fields. If the collection doesn't exist yet, it will be created automatically. You can learn more about the input parameters in the [Milvus integration input schema](https://apify.com/apify/milvus-integration/input-schema).

- For a detailed explanation of the input parameters, including dataset settings, incremental updates, and examples, see the [Milvus integration description](https://apify.com/apify/milvus-integration).

- For an explanation of how to combine Actors to accomplish more complex tasks, refer to the guide on [Actor-to-Actor](https://blog.apify.com/connecting-scrapers-apify-integration/) integrations.

#### Python

Another way to interact with Milvus is from a Python script, using the Apify API client for Python (the `apify-client` package). For building Actors in Python, see the [Apify Python SDK](https://docs.apify.com/sdk/python/).

1. Install the `apify-client` package, which provides the `ApifyClient` class used in the following steps:

    ```bash
    pip install apify-client
    ```

1. Create a Python script, import the client, and define your credentials:

    ```python
    from apify_client import ApifyClient

    # Replace the placeholders with your own credentials.
    APIFY_API_TOKEN = "YOUR-APIFY-TOKEN"
    OPENAI_API_KEY = "YOUR-OPENAI-API-KEY"

    MILVUS_COLLECTION_NAME = "YOUR-MILVUS-COLLECTION-NAME"
    MILVUS_URL = "YOUR-MILVUS-URL"
    MILVUS_API_KEY = "YOUR-MILVUS-API-KEY"
    MILVUS_USER = "YOUR-MILVUS-USER"
    MILVUS_PASSWORD = "YOUR-MILVUS-PASSWORD"

    # Client used to call Actors through the Apify API.
    client = ApifyClient(APIFY_API_TOKEN)
    ```

1. Call the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor to crawl the Milvus and Zilliz websites and extract text content from the web pages:

    ```python
    # Crawl at most 10 pages, starting from the Milvus and Zilliz home pages.
    actor_call = client.actor("apify/website-content-crawler").call(
        run_input={"maxCrawlPages": 10, "startUrls": [{"url": "https://milvus.io/"}, {"url": "https://zilliz.com/"}]}
    )
    ```

1. Call Apify's Milvus integration and store all data in the Milvus vector database:

    ```python
    milvus_integration_inputs = {
        "milvusUrl": MILVUS_URL,
        "milvusApiKey": MILVUS_API_KEY,
        "milvusCollectionName": MILVUS_COLLECTION_NAME,
        "milvusUser": MILVUS_USER,
        "milvusPassword": MILVUS_PASSWORD,
        "datasetFields": ["text"],
        # Dataset produced by the crawler run in the previous step.
        "datasetId": actor_call["defaultDatasetId"],
        "deltaUpdatesPrimaryDatasetFields": ["url"],
        "expiredObjectDeletionPeriodDays": 30,
        "embeddingsApiKey": OPENAI_API_KEY,
        "embeddingsProvider": "OpenAI",
    }
    actor_call = client.actor("apify/milvus-integration").call(run_input=milvus_integration_inputs)
    ```
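
The script above hardcodes its credentials. As a minimal sketch, assuming you keep them in environment variables instead, the same run input could be assembled by a small helper. The `build_milvus_input` function and the environment variable names are hypothetical and not part of the Apify client. Sending only one authentication method (API key, or username and password) is also an assumption based on the prerequisites, so check the integration's input schema before relying on it.

```python
import os


def build_milvus_input(dataset_id, env=os.environ):
    """Assemble the run input for apify/milvus-integration.

    The environment variable names used here are illustrative assumptions.
    """
    required = ["MILVUS_URL", "MILVUS_COLLECTION_NAME", "OPENAI_API_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    run_input = {
        "milvusUrl": env["MILVUS_URL"],
        "milvusCollectionName": env["MILVUS_COLLECTION_NAME"],
        "datasetFields": ["text"],
        "datasetId": dataset_id,
        "deltaUpdatesPrimaryDatasetFields": ["url"],
        "expiredObjectDeletionPeriodDays": 30,
        "embeddingsApiKey": env["OPENAI_API_KEY"],
        "embeddingsProvider": "OpenAI",
    }

    # Prefer the API key; otherwise fall back to username and password.
    if env.get("MILVUS_API_KEY"):
        run_input["milvusApiKey"] = env["MILVUS_API_KEY"]
    elif env.get("MILVUS_USER") and env.get("MILVUS_PASSWORD"):
        run_input["milvusUser"] = env["MILVUS_USER"]
        run_input["milvusPassword"] = env["MILVUS_PASSWORD"]
    else:
        raise ValueError("Set MILVUS_API_KEY, or MILVUS_USER and MILVUS_PASSWORD")

    return run_input
```

With this helper, the last step could pass `run_input=build_milvus_input(actor_call["defaultDatasetId"])` to the integration call, which keeps secrets out of the script itself.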

Congratulations! You've successfully integrated Apify with Milvus, and the scraped data is now stored in your Milvus database.
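
To confirm that the data arrived, you can check the run status and the collection's row count. The following is a rough sketch, not part of the integration itself: the helper names are hypothetical, and it assumes the `pymilvus` package, whose `MilvusClient` offers `has_collection` and `get_collection_stats`. The reported row count may lag behind recent inserts.

```python
def run_succeeded(run):
    """Return True if an Actor run dictionary reports the SUCCEEDED status."""
    return bool(run) and run.get("status") == "SUCCEEDED"


def collection_row_count(milvus_client, collection_name):
    """Return the row count of a collection, or None if it does not exist.

    `milvus_client` is expected to behave like pymilvus.MilvusClient.
    """
    if not milvus_client.has_collection(collection_name):
        return None
    stats = milvus_client.get_collection_stats(collection_name)
    return int(stats["row_count"])
```

For example, after `pip install pymilvus`, you could create `MilvusClient(uri=MILVUS_URL, token=MILVUS_API_KEY)` and pass it to `collection_row_count` together with `MILVUS_COLLECTION_NAME`, and pass the result of the integration call to `run_succeeded`.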

## Additional Resources

- [Apify Milvus Integration](https://apify.com/apify/milvus-integration)
- [Milvus documentation](https://milvus.io/docs)

sources/platform/integrations/index.mdx

Lines changed: 6 additions & 0 deletions
@@ -179,6 +179,12 @@ If you are working on an AI/LLM-related project, we recommend you look into the
 imageUrl="/img/platform/integrations/qdrant.svg"
 smallImage
 />
+<Card
+title="Milvus"
+to="./integrations/milvus"
+imageUrl="/img/platform/integrations/milvus.svg"
+smallImage
+/>
 </CardGrid>
 
 ## Other Actors
Lines changed: 15 additions & 0 deletions