Commit 4a6006e

Merge pull request #394 from Bennu-Li/recserving
Recserving
2 parents fec3bde + cf2b6d6 commit 4a6006e

26 files changed (+432, -132 lines)

recserving/milvus_tool/README.md

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
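# README

Milvus is an open-source similarity search engine for feature vectors. This tool is a service built on Milvus that provides vector storage and recall. You can use it for the recall stage of a recommender system.

Milvus tutorials are available on the official site: https://milvus.io/cn/

Milvus source code: https://github.com/milvus-io/milvus

## Directory structure

The directory structure of this tool, with a brief description of each file:

```
├── README.md         # Documentation
├── config.py         # Parameter configuration
├── milvus_insert.py  # Vector insertion script
├── milvus_recall.py  # Vector recall script
```
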
## Environment requirements

**Operating system**

CentOS: 7.5 or later

Ubuntu LTS: 18.04 or later

**Hardware**

CPU: Intel Sandy Bridge or later

> The CPU must support at least one of the following instruction sets: SSE4.2, AVX, AVX2, AVX512

Memory: 8 GB or more (depending on the size of the vector data)

**Software**

Python: 3.6 or later

Docker: 19.03 or later

Milvus: 1.0.0

## Install and start Milvus

This guide installs the [CPU version of Milvus 1.0.0](https://milvus.io/cn/docs/v1.0.0/milvus_docker-cpu.md). You can also install the GPU version; see [Milvus 1.0.0 GPU installation](https://milvus.io/cn/docs/v1.0.0/milvus_docker-gpu.md).

**Pull the CPU version of the Milvus image**

```shell
$ sudo docker pull milvusdb/milvus:1.0.0-cpu-d030521-1ea92e
```

**Download the configuration file**

```shell
$ mkdir -p /home/$USER/milvus/conf
$ cd /home/$USER/milvus/conf
$ wget https://raw.githubusercontent.com/milvus-io/milvus/v1.0.0/core/conf/demo/server_config.yaml
```

> Milvus settings can be specified through this configuration file.

**Start the Milvus Docker container**

```shell
$ sudo docker run -d --name milvus_cpu_1.0.0 \
-p 19530:19530 \
-p 19121:19121 \
-v /home/$USER/milvus/db:/var/lib/milvus/db \
-v /home/$USER/milvus/conf:/var/lib/milvus/conf \
-v /home/$USER/milvus/logs:/var/lib/milvus/logs \
-v /home/$USER/milvus/wal:/var/lib/milvus/wal \
milvusdb/milvus:1.0.0-cpu-d030521-1ea92e
```

**Check that Milvus is running**

```shell
$ sudo docker logs milvus_cpu_1.0.0
```

> Use the command above to check whether the Milvus service started normally.

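In addition to reading the container logs, a quick check from Python can confirm that the published port accepts connections. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that Milvus itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 19530 is the port published by the `docker run` command above.
reachable = port_is_open("127.0.0.1", 19530)
print("Milvus port reachable:", reachable)
```

If this prints `False` right after starting the container, the server may still be initializing; the `docker logs` output remains the authoritative source.
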
**Install the Milvus Python SDK**

```shell
$ pip install pymilvus==1.0.1
```

## Usage

The scripts in this tool provide two functions: vector insertion and vector recall. Before using them, edit the configuration file `config.py` to match your environment.

| Parameter | Description | Reference value |
| ---------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| MILVUS_HOST | IP address of the machine running the Milvus service | 127.0.0.1 |
| MILVUS_PORT | Port on which the Milvus service listens | 19530 |
| collection_param | Parameters of the collection created in Milvus.<br />`dimension`: vector dimension<br />`index_file_size`: size of the data files stored in Milvus<br />`metric_type`: how vector similarity is computed | collection_param = {<br /> 'dimension': 128,<br /> 'index_file_size': 2048,<br /> 'metric_type': MetricType.L2} |
| index_type | Type of index built on the Milvus collection | IndexType.IVF_FLAT |
| index_param | Parameters for building the index; different index types need different parameters | {'nlist': 1000} |
| top_k | Number of vectors recalled per query | 100 |
| search_param | Search parameters used when querying Milvus; they affect query performance and recall rate | {'nprobe': 20} |

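To make `metric_type` and `top_k` concrete, the sketch below computes an exact top-k search under the L2 metric in plain Python. It is a reference for what a recall query returns, not how Milvus computes it: with `IVF_FLAT`, vectors are grouped into `nlist` clusters and a query scans only `nprobe` of them, so results are approximate. The function names are hypothetical, and the sketch uses squared distance; whether the server reports squared or rooted values is not stated here, but the ranking is the same either way.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(query, stored, k):
    """Return the k nearest (id, distance) pairs, nearest first.

    `stored` maps an integer ID to its vector.
    """
    scored = ((vec_id, squared_l2(query, vec)) for vec_id, vec in stored.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


stored = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [3.0, 4.0]}
nearest = brute_force_top_k([0.0, 0.0], stored, k=2)
print(nearest)
```

Comparing such an exact result against the indexed search on a small sample is one way to judge whether a given `nprobe` gives an acceptable recall rate.
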
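### Vector insertion

The `milvus_insert.py` script provides vector insertion. Before using it, set the corresponding parameters in `config.py`. Call it as follows:

```python
from milvus_tool.milvus_insert import VecToMilvus

client = VecToMilvus()
status, ids = client.insert(collection_name=collection_name, vectors=embeddings, ids=ids, partition_tag=partition_name)
```

> Parameters of the `insert` method:
>
> **collection_name**: Name of the Milvus collection to insert the vectors into. Before importing data, the script checks whether the collection exists; if it does not, the script creates it using the collection parameters set in `config.py`.
>
> **vectors**: Vectors to insert into the Milvus collection, as a two-dimensional list. For example, [[2.1, 3.2, 10.3, 5.5], [3.3, 4.2, 6.5, 6.3]] inserts two four-dimensional vectors.
>
> **ids**: IDs that correspond one-to-one with the vectors, as a one-dimensional list. For example, [1, 2] assigns IDs 1 and 2 to the two vectors above. This argument is optional; if it is omitted, Milvus assigns IDs automatically.
>
> **partition_tag**: Name of the partition to insert the vectors into. Milvus can split a collection into several partitions by tag. This argument is optional; if it is omitted, the vectors are inserted directly into the collection.

**Return value**: The call returns `status` and `ids`. `status` reports whether the insertion succeeded or failed. `ids` is a one-dimensional list of the IDs of the inserted vectors.

For a complete example, see movie_recommender/to_milvus.py in this project.
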
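The shape requirements described above (a two-dimensional list of vectors, an optional one-dimensional list of IDs of the same length) can be checked before calling `insert`. The helper below is a hypothetical sketch, not part of this tool; `dimension` is assumed to match the `dimension` value in `collection_param`.

```python
def validate_insert_args(vectors, dimension, ids=None):
    """Raise ValueError if the arguments do not match the documented shapes."""
    if not vectors:
        raise ValueError("vectors must contain at least one vector")
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(
                "vector {} has dimension {}, expected {}".format(i, len(vec), dimension)
            )
    if ids is not None and len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")


# The two four-dimensional vectors from the example above, with IDs 1 and 2.
vectors = [[2.1, 3.2, 10.3, 5.5], [3.3, 4.2, 6.5, 6.3]]
validate_insert_args(vectors, dimension=4, ids=[1, 2])
```

Failing early with a clear message is usually easier to debug than an error returned by the server after the request has been sent.
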
### Vector recall

The `milvus_recall.py` script provides vector recall. Before using it, set the corresponding parameters in `config.py`. Call it as follows:

```python
from milvus_tool.milvus_recall import RecallByMilvus

milvus_client = RecallByMilvus()
status, results = milvus_client.search(collection_name=collection_name, vectors=query_records, partition_tag=partition_name)
```

> **collection_name**: Name of the collection to query.
>
> **vectors**: Vectors to query, in the same two-dimensional list format used for insertion.
>
> **partition_tag**: Tag of the partition to query. This argument is optional; if it is omitted, the whole collection is searched.

**Return value**: The call returns `status` and `results`. `status` reports whether the query succeeded or failed. `results` holds the query results in the following format:

```
# Example result for a query with two vectors and top_k = 3
[
[ (id:000, distance:0.0),
  (id:020, distance:0.17),
  (id:111, distance:0.60) ],
[ (id:100, distance:0.0),
  (id:032, distance:0.69),
  (id:244, distance:1.051) ]
]
```

For a complete example, see movie_recommender/recall.py in this project.
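
Downstream code in a recommender usually wants plain IDs and distances rather than the SDK's result object. The sketch below converts results into lists of `(id, distance)` tuples. It assumes that `results` is iterable per query and that each hit exposes `id` and `distance` attributes, as the printed format above suggests; `Hit` is a hypothetical stand-in so the example runs without a Milvus server.

```python
from collections import namedtuple

# Stand-in for one search hit; an assumption made so the sketch runs
# without a Milvus server.
Hit = namedtuple("Hit", ["id", "distance"])


def to_id_distance_pairs(results):
    """Convert per-query hits into plain lists of (id, distance) tuples."""
    return [[(hit.id, hit.distance) for hit in hits] for hits in results]


fake_results = [
    [Hit(0, 0.0), Hit(20, 0.17), Hit(111, 0.60)],
    [Hit(100, 0.0), Hit(32, 0.69), Hit(244, 1.051)],
]
pairs = to_id_distance_pairs(fake_results)

# ID of the nearest neighbor for each query vector.
best_ids = [hits[0][0] for hits in pairs if hits]
```
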

recserving/milvus_tool/config.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
import os
from milvus import MetricType, IndexType

MILVUS_HOST = '127.0.0.1'
MILVUS_PORT = 19530

collection_param = {
    'dimension': 128,
    'index_file_size': 2048,
    'metric_type': MetricType.L2
}

index_type = IndexType.IVF_FLAT
index_param = {'nlist': 1000}

top_k = 100
search_param = {'nprobe': 20}

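`config.py` imports `os` but does not use it. One possible use, shown here only as a sketch, is to let environment variables override the host and port so the same file works across machines. The environment variable names `MILVUS_HOST` and `MILVUS_PORT` and the helper `load_milvus_endpoint` are assumptions for illustration, not something this commit defines.

```python
import os


def load_milvus_endpoint(environ=os.environ):
    """Return (host, port), preferring environment variables over defaults."""
    host = environ.get("MILVUS_HOST", "127.0.0.1")
    port = int(environ.get("MILVUS_PORT", "19530"))
    return host, port


MILVUS_HOST, MILVUS_PORT = load_milvus_endpoint()
```

With nothing set in the environment, this yields the same values as the reference configuration.
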
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
from milvus import *

# from milvus_tool.config import MILVUS_HOST, MILVUS_PORT, collection_param, index_type, index_param
from config import MILVUS_HOST, MILVUS_PORT, collection_param, index_type, index_param


class VecToMilvus():
    # Every method catches exceptions, prints them, and then returns None.

    def __init__(self):
        self.client = Milvus(host=MILVUS_HOST, port=MILVUS_PORT)

    def has_collection(self, collection_name):
        try:
            status, ok = self.client.has_collection(collection_name)
            return ok
        except Exception as e:
            print("Milvus has_collection error:", e)

    def creat_collection(self, collection_name):
        try:
            # Reuse the shared collection parameters from config.py.
            collection_param['collection_name'] = collection_name
            status = self.client.create_collection(collection_param)
            print(status)
            return status
        except Exception as e:
            print("Milvus create collection error:", e)

    def create_index(self, collection_name):
        try:
            status = self.client.create_index(collection_name, index_type, index_param)
            print(status)
            return status
        except Exception as e:
            print("Milvus create index error:", e)

    def has_partition(self, collection_name, partition_tag):
        try:
            status, ok = self.client.has_partition(collection_name, partition_tag)
            return ok
        except Exception as e:
            print("Milvus has partition error: ", e)

    def create_partition(self, collection_name, partition_tag):
        try:
            status = self.client.create_partition(collection_name, partition_tag)
            print('create partition {} successfully'.format(partition_tag))
            return status
        except Exception as e:
            print('Milvus create partition error: ', e)

    def insert(self, vectors, collection_name, ids=None, partition_tag=None):
        try:
            # Create the collection and its index on first use.
            if not self.has_collection(collection_name):
                self.creat_collection(collection_name)
                self.create_index(collection_name)
                print('collection info: {}'.format(self.client.get_collection_info(collection_name)[1]))
            # Create the partition if a tag was given and it does not exist yet.
            if (partition_tag is not None) and (not self.has_partition(collection_name, partition_tag)):
                self.create_partition(collection_name, partition_tag)
            status, ids = self.client.insert(collection_name=collection_name, records=vectors, ids=ids,
                                             partition_tag=partition_tag)
            self.client.flush([collection_name])
            print('Insert {} entities, there are {} entities after insert data.'.format(len(ids), self.client.count_entities(collection_name)[1]))
            return status, ids
        except Exception as e:
            print("Milvus insert error:", e)


if __name__ == '__main__':
    import random

    client = VecToMilvus()
    collection_name = 'test1'
    partition_tag = 'partition_1'
    ids = [random.randint(0, 1000) for _ in range(100)]
    embeddings = [[random.random() for _ in range(128)] for _ in range(100)]
    status, ids = client.insert(collection_name=collection_name, vectors=embeddings, ids=ids, partition_tag=partition_tag)
    print(status)
    # print(ids)
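
Because `VecToMilvus.insert` catches exceptions and only prints them, it returns `None` on failure, and a caller that writes `status, ids = client.insert(...)` then fails with an unpacking `TypeError` that hides the real cause. The wrapper below is a minimal sketch of one way to surface that case; `insert_or_raise` and `FailingClient` are hypothetical names used for illustration.

```python
def insert_or_raise(client, **kwargs):
    """Call client.insert and raise if it signalled failure by returning None."""
    outcome = client.insert(**kwargs)
    if outcome is None:
        raise RuntimeError("insert failed; see the printed Milvus error")
    status, ids = outcome
    return status, ids


class FailingClient:
    """Hypothetical stand-in that mimics the wrapper's behavior on error."""

    def insert(self, **kwargs):
        return None


try:
    insert_or_raise(FailingClient(), collection_name="test1", vectors=[[0.0]])
    failed = False
except RuntimeError:
    failed = True
```

An alternative design is to let the exception propagate from `insert` itself, which keeps the original traceback.
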
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
from milvus import *

# from milvus_tool.config import MILVUS_HOST, MILVUS_PORT, top_k, search_param
from config import MILVUS_HOST, MILVUS_PORT, top_k, search_param


class RecallByMilvus():
    def __init__(self):
        self.client = Milvus(host=MILVUS_HOST, port=MILVUS_PORT)

    def search(self, vectors, collection_name, partition_tag=None):
        # On error the exception is printed and the method returns None.
        try:
            status, results = self.client.search(collection_name=collection_name, query_records=vectors, top_k=top_k,
                                                 params=search_param, partition_tag=partition_tag)
            # print(status)
            return status, results
        except Exception as e:
            print('Milvus recall error: ', e)


if __name__ == '__main__':
    import random
    client = RecallByMilvus()
    collection_name = 'test1'
    partition_tag = 'partition_3'
    embeddings = [[random.random() for _ in range(128)] for _ in range(2)]
    status, results = client.search(collection_name=collection_name, vectors=embeddings, partition_tag=partition_tag)
    print(status)
    print(results)

tools/recserving/movie_recommender/README.md renamed to recserving/movie_recommender/README.md

Lines changed: 13 additions & 4 deletions
@@ -32,7 +32,7 @@
 ## Runtime environment
 Linux terminal
 Paddle version: 1.8+
-Python version: 2.7/3.6
+Python version: 3.6
 
 ## Design
 
@@ -120,17 +120,26 @@ https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md
 wget http://download.redis.io/releases/redis-stable.tar.gz --no-check-certificate
 tar -xf redis-stable.tar.gz && cd redis-stable/src && make && ./redis-server &
 ```
-Milvus currently has to be started remotely with Docker; start it on the host machine
+Milvus currently has to be started remotely with Docker; start it on the host machine
 
 ```
-sudo docker run -d --name milvus_cpu_0.11.0 \
+# Download the configuration file
+mkdir -p /home/$USER/milvus/conf
+cd /home/$USER/milvus/conf
+wget https://raw.githubusercontent.com/milvus-io/milvus/v1.0.0/core/conf/demo/server_config.yaml
+
+# Start the Milvus service
+sudo docker run -d --name milvus_cpu_1.0.0 \
 -p 19530:19530 \
 -p 19121:19121 \
 -v /home/$USER/milvus/db:/var/lib/milvus/db \
 -v /home/$USER/milvus/conf:/var/lib/milvus/conf \
 -v /home/$USER/milvus/logs:/var/lib/milvus/logs \
 -v /home/$USER/milvus/wal:/var/lib/milvus/wal \
-milvusdb/milvus:0.11.0-cpu-d101620-4c44c0
+milvusdb/milvus:1.0.0-cpu-d030521-1ea92e
+
+# Install the Milvus Python SDK
+pip install pymilvus==1.0.1
 ```
 
 3. Run the relevant commands
File renamed without changes.
File renamed without changes.
