---
title: Milvus integration
description: Learn how to integrate Apify with Milvus (Zilliz) to save data scraped from websites into the Milvus vector database.
sidebar_label: Milvus
sidebar_position: 4
slug: /integrations/milvus
toc_min_heading_level: 2
toc_max_heading_level: 4
---

**Learn how to integrate Apify with Milvus (Zilliz) to save data scraped from websites into the Milvus vector database.**

---

[Milvus](https://milvus.io/) is an open-source vector database optimized for similarity search over large datasets of high-dimensional vectors.
This focus on efficient vector similarity search makes it well suited for building powerful, scalable retrieval systems.

The Apify integration for Milvus lets you export results from Apify Actors and datasets into a Milvus collection.
It also works with managed Milvus instances on [Zilliz Cloud](https://cloud.zilliz.com).

## Prerequisites

Before you begin, make sure you have the following:

- A Milvus database URL and API token. Optionally, you can also provide a username and password. You can run Milvus yourself on Docker or Kubernetes, but this example uses the hosted Milvus service on [Zilliz Cloud](https://cloud.zilliz.com).
- An [OpenAI API key](https://openai.com/index/openai-api/) to compute text embeddings.
- An [Apify API token](https://docs.apify.com/platform/integrations/api#api-token) to access [Apify Actors](https://apify.com/store).
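
You may prefer not to hardcode these credentials in your scripts. One option is to keep them in environment variables and load them at startup. The following minimal sketch uses only Python's standard library. The variable names are our own assumption, not something Apify or Milvus requires, so rename them to suit your setup.

```python
import os

# Hypothetical environment variable names (our own convention); rename as needed.
REQUIRED_VARS = ("APIFY_API_TOKEN", "OPENAI_API_KEY", "MILVUS_URL", "MILVUS_API_KEY")


def load_credentials(names=REQUIRED_VARS):
    """Return a dict of credentials, failing fast if any variable is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

For example, calling `credentials = load_credentials()` and then `ApifyClient(credentials["APIFY_API_TOKEN"])` creates a client without putting the token in your source code.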

### How to set up a Milvus database

1. Sign up or log in to your Zilliz account and create a new cluster.

1. Download the generated credentials (username and password).

1. Copy the cluster URL (public endpoint) and API key from the cluster details page in the Zilliz Cloud console.

Once the cluster is ready and you have the URL, API key, and credentials, you can set up the integration with Apify.
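
Before wiring up Apify, you may want to confirm that the URL and credentials work. The sketch below assumes the `pymilvus` package (`pip install pymilvus`, not otherwise used in this guide) and its `MilvusClient`. That client's `token` argument accepts either an API key or a `username:password` pair. The helper names are hypothetical.

```python
def build_milvus_token(api_key=None, user=None, password=None):
    """Pick a token format MilvusClient accepts: an API key, or "user:password"."""
    if api_key:
        return api_key
    if user and password:
        return f"{user}:{password}"
    raise ValueError("Provide either an API key or both a username and password.")


def list_milvus_collections(uri, token):
    """Connect to the cluster and return its collection names (requires pymilvus)."""
    # Imported lazily so build_milvus_token() works even without pymilvus installed.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri, token=token)
    return client.list_collections()
```

For example, `list_milvus_collections("YOUR-MILVUS-URL", build_milvus_token(api_key="YOUR-MILVUS-API-KEY"))` either returns the cluster's collection names or raises an error if the URL or token is wrong.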

### Integration methods

You can integrate Apify with Milvus using either the Apify Console or the [Apify API client for Python](https://docs.apify.com/api/client/python/).

:::note Website Content Crawler usage

These examples use the Website Content Crawler Actor, which crawls websites in depth, cleans the HTML by removing elements such as modals and navigation, and converts the content into Markdown.

:::

#### Apify Console

1. Set up the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor in the [Apify Console](https://console.apify.com). For a walkthrough, see this guide on [how to set up a website content crawl for your project](https://blog.apify.com/talk-to-your-website-with-large-language-models/).

1. After setting up the crawler, open the **Integrations** tab, select **Connect Actor or Task**, and search for the Milvus integration.

1. Choose when to trigger the integration (typically when a run succeeds) and fill in all required fields. If the collection doesn't exist yet, the integration creates it automatically. You can learn more about the input parameters in the [Milvus integration input schema](https://apify.com/apify/milvus-integration/input-schema).

- For a detailed explanation of the input parameters, including dataset settings, incremental updates, and examples, see the [Milvus integration description](https://apify.com/apify/milvus-integration).

- To learn how to combine Actors to accomplish more complex tasks, see the guide on [Actor-to-Actor](https://blog.apify.com/connecting-scrapers-apify-integration/) integrations.

#### Python

You can also run the integration programmatically with the [Apify API client for Python](https://docs.apify.com/api/client/python/).

1. Install the Apify API client for Python by running the following command:

   ```bash
   pip install apify-client
   ```

1. Create a Python script, import the Apify client, and define your credentials:

   ```python
   from apify_client import ApifyClient

   # Apify and OpenAI credentials
   APIFY_API_TOKEN = "YOUR-APIFY-TOKEN"
   OPENAI_API_KEY = "YOUR-OPENAI-API-KEY"

   # Milvus (Zilliz Cloud) connection details
   MILVUS_COLLECTION_NAME = "YOUR-MILVUS-COLLECTION-NAME"
   MILVUS_URL = "YOUR-MILVUS-URL"
   MILVUS_API_KEY = "YOUR-MILVUS-API-KEY"
   MILVUS_USER = "YOUR-MILVUS-USER"
   MILVUS_PASSWORD = "YOUR-MILVUS-PASSWORD"

   # Client used to run Actors on the Apify platform
   client = ApifyClient(APIFY_API_TOKEN)
   ```

1. Call the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor to crawl the Milvus and Zilliz websites and extract text content from the web pages:

   ```python
   # Crawl up to 10 pages, starting from the Milvus and Zilliz websites
   actor_call = client.actor("apify/website-content-crawler").call(
       run_input={
           "maxCrawlPages": 10,
           "startUrls": [{"url": "https://milvus.io/"}, {"url": "https://zilliz.com/"}],
       }
   )
   ```

1. Call Apify's Milvus integration to compute embeddings and store the crawled data in your Milvus collection:

   ```python
   milvus_integration_inputs = {
       "milvusUrl": MILVUS_URL,
       "milvusApiKey": MILVUS_API_KEY,
       "milvusCollectionName": MILVUS_COLLECTION_NAME,
       "milvusUser": MILVUS_USER,
       "milvusPassword": MILVUS_PASSWORD,
       "datasetFields": ["text"],  # dataset fields to embed and store
       "datasetId": actor_call["defaultDatasetId"],  # dataset produced by the crawler run
       "deltaUpdatesPrimaryDatasetFields": ["url"],  # identifies items across runs for incremental updates
       "expiredObjectDeletionPeriodDays": 30,
       "embeddingsApiKey": OPENAI_API_KEY,
       "embeddingsProvider": "OpenAI",
   }
   milvus_call = client.actor("apify/milvus-integration").call(run_input=milvus_integration_inputs)
   ```
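
The script above does not check whether the Actor runs actually succeeded. A small guard such as the hypothetical `ensure_run_succeeded` helper below can surface failures early when applied to each run object. It relies only on the `status` and `id` fields of the run dict returned by `call()`. The client types that return value as optional, so the helper also guards against `None`.

```python
def ensure_run_succeeded(run, actor_name="Actor"):
    """Return the run if it finished with status SUCCEEDED; otherwise raise."""
    if run is None:
        raise RuntimeError(f"{actor_name} run did not finish (no run object returned).")
    status = run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"{actor_name} run {run.get('id')} ended with status {status}.")
    return run
```

For example, you can wrap the integration call as `ensure_run_succeeded(client.actor("apify/milvus-integration").call(run_input=milvus_integration_inputs), "Milvus integration")`. Apply the same check to the crawler run before reading its `defaultDatasetId`.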

Congratulations! You've successfully integrated Apify with Milvus, and the scraped data is now stored in your Milvus database.
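
With the data in Milvus, you can run a similarity search against it. The sketch below shows one way to do that outside Apify, and it rests on several assumptions:

- You have the `pymilvus` and `openai` packages installed.
- Queries are embedded with the OpenAI `text-embedding-3-small` model.
- A collection field named `text` holds the crawled content.
- Results follow the documented `MilvusClient.search()` shape: one list of hits per query, each hit with `distance` and `entity` keys.

Check the model and field names against your integration settings and collection schema before relying on the sketch. `search_collection` and `format_hits` are hypothetical helper names.

```python
def format_hits(results, text_field="text", max_chars=80):
    """Flatten search results (one list of hits per query vector) into readable lines."""
    lines = []
    for hits in results:
        for hit in hits:
            text = hit["entity"].get(text_field, "")
            lines.append(f"{hit['distance']:.4f}  {text[:max_chars]}")
    return lines


def search_collection(query, milvus_url, milvus_token, collection_name, openai_api_key,
                      model="text-embedding-3-small", text_field="text", limit=3):
    """Embed `query` with OpenAI and return the closest stored documents from Milvus.

    The model must match the one used to embed your data; "text-embedding-3-small"
    is an assumption. `milvus_token` is an API key or a "user:password" string.
    """
    # Imported lazily so format_hits() can be used without these packages.
    from openai import OpenAI
    from pymilvus import MilvusClient

    response = OpenAI(api_key=openai_api_key).embeddings.create(model=model, input=query)
    embedding = response.data[0].embedding

    milvus = MilvusClient(uri=milvus_url, token=milvus_token)
    results = milvus.search(
        collection_name=collection_name,
        data=[embedding],
        limit=limit,
        output_fields=[text_field],  # assumed name of the field holding the crawled text
    )
    return format_hits(results, text_field=text_field)
```

For example, `for line in search_collection("What is Milvus?", MILVUS_URL, MILVUS_API_KEY, MILVUS_COLLECTION_NAME, OPENAI_API_KEY): print(line)` prints one line per match. Depending on the collection's metric type, a higher or a lower distance indicates a closer match.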

## Additional resources

- [Apify Milvus integration](https://apify.com/apify/milvus-integration)
- [Milvus documentation](https://milvus.io/docs)