
Commit 1c498bc

Add Mac system support of pipelines (#2885)

* Add Mac system support of pipelines
* Add cpu requirements
* Add mac es faq and change to a small dataset of dureader
* adjust semantic search example

1 parent 123c4c7 commit 1c498bc

File tree

6 files changed: +49 -11 lines changed

applications/experimental/pipelines/examples/semantic-search/README.md

Lines changed: 12 additions & 5 deletions

@@ -59,7 +59,7 @@ python ./rest_api/setup.py install
 python ./ui/setup.py install
 ```
 ### 3.2 Data description
-The data for the semantic search database comes from the [DuReader-Robust dataset](https://github.com/baidu/DuReader/tree/master/DuReader-Robust) and contains 46,972 passages in total.
+The data for the semantic search database comes from the [DuReader-Robust dataset](https://github.com/baidu/DuReader/tree/master/DuReader-Robust), which contains 46,972 passages in total; the 1,417 passages of its dev set are selected to build the semantic search system.
 
 ### 3.3 Try the semantic search system in one step
 We provide a ready-made code example that builds a semantic search system on the [DuReader-Robust dataset](https://github.com/baidu/DuReader/tree/master/DuReader-Robust); you can quickly try out the system with the following command.

@@ -78,7 +78,7 @@ python examples/semantic-search/semantic_search_example.py --device cpu
 The Web-based semantic search system consists of three main components: 1. an ANN service based on ElasticSearch; 2. a model service built with RestAPI; 3. a WebUI built with Streamlit. Below we set up these three services in turn to form the complete visual semantic search system.
 
 #### 3.4.1 Start the ANN service
-1. Download and unpack [elasticsearch-8.1.2](https://www.elastic.co/cn/downloads/elasticsearch) following the official documentation.
+1. Download and unpack [elasticsearch-8.3.2](https://www.elastic.co/cn/downloads/elasticsearch) following the official documentation.
 2. Start the ES service
 ```bash
 ./bin/elasticsearch

@@ -93,7 +93,7 @@ curl http://localhost:9200/_aliases?pretty=true
 ```
 # Build the ANN index, using the DuReader-Robust dataset as an example
 python utils/offline_ann.py --index_name dureader_robust_query_encoder \
-    --doc_dir data/dureader_robust_processed
+    --doc_dir data/dureader_dev
 ```
 #### 3.4.3 Start the RestAPI model service
 ```bash

@@ -138,12 +138,19 @@ elasticsearch must be run as a non-root user; you can do the following:
 
 ```
 adduser est
-chown est:est -R ${HOME}/elasticsearch-8.1.2/
-cd ${HOME}/elasticsearch-8.1.2/
+chown est:est -R ${HOME}/elasticsearch-8.3.2/
+cd ${HOME}/elasticsearch-8.3.2/
 su est
 ./bin/elasticsearch
 ```
 
+#### Installing elasticsearch on Mac OS fails with `flood stage disk watermark [95%] exceeded on.... all indices on this node will be marked read-only`
+
+By default, elasticsearch switches all indices to read-only once disk usage reaches 95%. Free up some disk space and restart, or set the following in `config/elasticsearch.yml`:
+```
+cluster.routing.allocation.disk.threshold_enabled: false
+```
+
 ## Reference
 [1] Y. Sun et al., "[ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation](https://arxiv.org/pdf/2107.02137.pdf)," arXiv:2107.02137 [cs], Jul. 2021, Accessed: Jan. 17, 2022. [Online]. Available: http://arxiv.org/abs/2107.02137
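The flood-stage FAQ above triggers at 95% disk usage. As a quick self-check before restarting ES, the current usage fraction can be read with Python's standard library; a minimal sketch (the 0.95 threshold matches elasticsearch's default flood-stage watermark; the helper names are illustrative, not part of pipelines):

```python
import shutil

def disk_usage_fraction(path="/"):
    """Fraction of the disk holding `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def flood_stage_exceeded(path="/", watermark=0.95):
    """True if elasticsearch's default flood-stage watermark would trip."""
    return disk_usage_fraction(path) >= watermark
```

If this returns True, freeing disk space is the safer remedy; disabling `cluster.routing.allocation.disk.threshold_enabled` only suppresses the check.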

applications/experimental/pipelines/examples/semantic-search/semantic_search_example.py

Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ def semantic_search_tutorial():
         )
     else:
         doc_dir = "data/dureader_robust_processed"
-        dureader_data = "https://paddlenlp.bj.bcebos.com/applications/dureader_robust_processed.zip"
+        dureader_data = "https://paddlenlp.bj.bcebos.com/applications/dureader_dev.zip"
 
     fetch_archive_from_http(url=dureader_data, output_dir=doc_dir)
     dicts = convert_files_to_dicts(dir_path=doc_dir, split_paragraphs=True)
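`fetch_archive_from_http` and `convert_files_to_dicts` are pipelines helpers. As a rough illustration of what the second step does (a sketch under assumptions, not the library's implementation), splitting downloaded text files into per-paragraph dicts might look like:

```python
from pathlib import Path

def files_to_dicts(dir_path, split_paragraphs=True):
    """Read each .txt file in dir_path and emit one dict per paragraph.

    Illustrative stand-in for pipelines' convert_files_to_dicts; assumes
    UTF-8 text files with blank-line-separated paragraphs.
    """
    dicts = []
    for path in sorted(Path(dir_path).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if split_paragraphs:
            parts = [p.strip() for p in text.split("\n\n") if p.strip()]
        else:
            parts = [text]
        for para in parts:
            dicts.append({"content": para, "meta": {"name": path.name}})
    return dicts
```

Each dict carries the passage text plus source-file metadata, which is the shape the downstream document store expects.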

applications/experimental/pipelines/pipelines/nodes/reader/ernie_dureader.py

Lines changed: 1 addition & 1 deletion

@@ -547,7 +547,7 @@ def logits_to_preds(
         start_end_matrix[invalid_indices[0][:], invalid_indices[1][:],
                          invalid_indices[2][:]] = -999
         start_end_matrix = paddle.to_tensor(start_end_matrix,
-                                            place=paddle.CUDAPlace(0))
+                                            place=self.devices[0])
 
         # Sort the candidate answers by their score. Sorting happens on the flattened matrix.
         # flat_sorted_indices.shape: (batch_size, max_seq_len^2, 1)
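The one-line change above stops hardcoding `paddle.CUDAPlace(0)` and instead places the tensor on whichever device the reader was initialized with, which is what lets the reader run on CPU-only Macs. A minimal sketch of that device-selection idea, using plain strings and a hypothetical `resolve_devices` helper rather than paddle's API:

```python
def resolve_devices(use_gpu, num_gpus=1):
    """Return the list of devices a model should run on.

    Hypothetical helper mirroring the idea behind self.devices: decide the
    device list once at init time, then place every tensor on devices[0]
    instead of hardcoding gpu:0.
    """
    if use_gpu and num_gpus > 0:
        return [f"gpu:{i}" for i in range(num_gpus)]
    return ["cpu"]
```

With this shape, `place=devices[0]` works unchanged whether the process has GPUs or not, which the hardcoded `CUDAPlace(0)` did not.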
applications/experimental/pipelines/requirements-cpu.txt

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+paddlepaddle
+paddlenlp
+paddleocr
+requests
+pydantic
+mmh3
+more_itertools
+elasticsearch>=7.7,<=7.10
+sqlalchemy>=1.4.2,<2
+sqlalchemy_utils
+langdetect
+python-docx
+nltk
+pdfplumber
+faiss-cpu>=1.7.2
+opencv-python
+opencv-contrib-python-headless
+python-multipart
+st-annotated-text
+streamlit==1.9.0
+fastapi
+uvicorn
+markdown
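Several of the pins above carry version specifiers (e.g. `elasticsearch>=7.7,<=7.10`). A small sketch of splitting such a line into a package name and its constraints, assuming only the simple `name` / `name<op>version[,...]` forms used in this file (real tooling should use `packaging.requirements` instead):

```python
import re

def parse_requirement(line):
    """Split 'name>=1.0,<2' into ('name', ['>=1.0', '<2']).

    Simplified sketch: handles only the plain specifier forms seen in
    requirements-cpu.txt, not extras, markers, or URLs.
    """
    m = re.match(r"^([A-Za-z0-9_.\-]+)(.*)$", line.strip())
    name, rest = m.group(1), m.group(2)
    specs = [s for s in rest.split(",") if s] if rest else []
    return name, specs
```

This is the kind of split pip performs before checking each constraint against installed versions.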

applications/experimental/pipelines/setup.py

Lines changed: 9 additions & 2 deletions

@@ -15,11 +15,18 @@
 import setuptools
 import sys
 import pipelines
+import platform
 
 long_description = "PIPELINES: An End to End Natural Language Proceessing Development Kit Based on ERNIE"
 
-with open("requirements.txt") as fin:
-    REQUIRED_PACKAGES = fin.read()
+if platform.system().lower() == 'windows':
+    pass
+elif platform.system().lower() == "darwin":
+    with open("requirements-cpu.txt") as fin:
+        REQUIRED_PACKAGES = fin.read()
+elif platform.system().lower() == 'linux':
+    with open("requirements.txt") as fin:
+        REQUIRED_PACKAGES = fin.read()
 
 setuptools.setup(name="pipelines",
                  version=pipelines.__version__,
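The platform dispatch above can be factored into a small, testable helper; this is a sketch with a hypothetical `requirements_file_for` function, not the committed code. Note that in the diff, Windows leaves `REQUIRED_PACKAGES` undefined, which would raise a `NameError` if `setup()` later references it; returning an explicit `None` avoids that failure mode.

```python
def requirements_file_for(system):
    """Map platform.system() output to the requirements file the diff selects.

    Hypothetical helper: encodes the same branches as the setup.py change,
    with an explicit None for Windows and unknown platforms.
    """
    system = system.lower()
    if system == "windows":
        return None  # the diff installs no pinned requirements on Windows
    if system == "darwin":
        return "requirements-cpu.txt"  # macOS gets the CPU-only stack
    if system == "linux":
        return "requirements.txt"
    return None  # unknown platforms: no requirements file
```

Callers can then guard on `None` instead of relying on a variable that may never have been assigned.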

applications/experimental/pipelines/ui/webapp_semantic_search.py

Lines changed: 3 additions & 2 deletions

@@ -28,8 +28,9 @@
 
 # Adjust to a question that you would like users to see in the search bar when they load the UI:
 DEFAULT_QUESTION_AT_STARTUP = os.getenv("DEFAULT_QUESTION_AT_STARTUP",
-                                        "燃气热水器哪个牌子好?")
-DEFAULT_ANSWER_AT_STARTUP = os.getenv("DEFAULT_ANSWER_AT_STARTUP", "北京")
+                                        "衡量酒水的价格的因素有哪些?")
+DEFAULT_ANSWER_AT_STARTUP = os.getenv("DEFAULT_ANSWER_AT_STARTUP",
+                                      "酒水的血统,存储的时间等")
 
 # Sliders
 DEFAULT_DOCS_FROM_RETRIEVER = int(os.getenv("DEFAULT_DOCS_FROM_RETRIEVER",
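`DEFAULT_DOCS_FROM_RETRIEVER` above wraps `os.getenv` in a bare `int()`, which raises `ValueError` at import time if the environment variable holds a non-numeric value. A hedged sketch of a more forgiving variant (hypothetical `int_env` helper, not part of the UI code):

```python
import os

def int_env(name, default):
    """Read an integer from the environment, falling back on missing or bad values."""
    raw = os.getenv(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default  # malformed value: keep the built-in default
```

This keeps the UI bootable even when a deployment sets the variable to something non-numeric.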
