
Commit 2aa8d22

joshuayao and pre-commit-ci[bot] authored and committed
Create text2query microservice for text2sql and text2cypher (opea-project#1931)
* Create text2query microservice for text2sql and text2cypher.

Signed-off-by: Yi Yao <yi.a.yao@intel.com>

---------

Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: sunzhonghua2004 <137033036@qq.com>
1 parent 3ba2456 commit 2aa8d22

31 files changed: +27287 -0 lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@
 /comps/text2graph/ sharath.raghava@intel.com letong.han@intel.com
 /comps/text2image/ xinyu.ye@intel.com liang1.lv@intel.com
 /comps/text2kg/ siddhi.velankar@intel.com letong.han@intel.com
+/comps/text2query/ yogesh.pandey@intel.com jean1.yu@intel.com yi.a.yao@intel.com
 /comps/text2sql/ yogesh.pandey@intel.com qing.yao@intel.com
 /comps/third_parties/ liang1.lv@intel.com letong.han@intel.com
 /comps/tts/ sihan.chen@intel.com letong.han@intel.com
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# this file should be run in the root of the repo
+services:
+  text2query-sql:
+    build:
+      dockerfile: comps/text2query/src/Dockerfile
+    image: ${REGISTRY:-opea}/text2query-sql:${TAG:-latest}
+
+  text2query-cypher:
+    build:
+      dockerfile: comps/text2query/src/Dockerfile.cypher.intel_hpu
+    image: ${REGISTRY:-opea}/text2query-cypher:${TAG:-latest}
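
The image names above rely on Compose variable interpolation with a fallback. In Docker Compose, `${VAR:-default}` uses the default when the variable is unset or empty, whereas `${VAR-default}` uses it only when the variable is unset. The following is a minimal Python sketch of those two rules, not Compose's own implementation; the `interpolate` helper and the `registry.example.com` value are hypothetical and exist only for illustration.

```python
import re

# Matches ${VAR}, ${VAR:-default} and ${VAR-default}. Other Compose forms
# (for example ${VAR:?error}) are deliberately left out of this sketch.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:?-)(?P<default>[^}]*))?\}"
)


def interpolate(text: str, env: dict) -> str:
    """Expand a small subset of Compose-style variable references."""

    def substitute(match: re.Match) -> str:
        name, op, default = match.group("name", "op", "default")
        value = env.get(name)
        if op == ":-":
            # Default applies when the variable is unset OR empty.
            return value if value else default
        if op == "-":
            # Default applies only when the variable is unset.
            return value if value is not None else default
        # Plain ${VAR}: an unset variable expands to an empty string here.
        return value or ""

    return _PATTERN.sub(substitute, text)


image = "${REGISTRY:-opea}/text2query-sql:${TAG:-latest}"
print(interpolate(image, {}))
print(interpolate(image, {"REGISTRY": "registry.example.com", "TAG": "1.5"}))
```

With an empty environment the sketch resolves the image to `opea/text2query-sql:latest`; setting `REGISTRY` and `TAG` overrides both parts.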

comps/cores/mega/constants.py

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ class ServiceType(Enum):
     LANGUAGE_DETECTION = 24
     PROMPT_TEMPLATE = 25
     PROMPT_REGISTRY = 26
+    TEXT2QUERY = 27


 class MegaServiceEndpoint(Enum):

comps/cores/proto/api_protocol.py

Lines changed: 10 additions & 0 deletions
@@ -1057,3 +1057,13 @@ class FineTuningJobCheckpoint(BaseModel):

 class RouteEndpointDoc(BaseModel):
     url: str = Field(..., description="URL of the chosen inference endpoint")
+
+
+class Text2QueryRequest(BaseModel):
+    query: Optional[str] = None
+    conn_type: Optional[str] = "sql"
+    conn_url: Optional[str] = None
+    conn_user: Optional[str] = None
+    conn_password: Optional[str] = None
+    conn_dialect: Optional[str] = "postgresql"
+    options: Dict = {}
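
To make the defaults of the new request model concrete, here is a minimal sketch that mirrors its fields with a standard-library dataclass, so it runs without pydantic installed. `Text2QueryRequestSketch` is a hypothetical stand-in, not the class added by this commit, and the question and `conn_url` values are invented for illustration; the diff does not specify the expected `conn_url` format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Text2QueryRequestSketch:
    """Stdlib stand-in with the same field names and defaults as Text2QueryRequest."""

    query: Optional[str] = None
    conn_type: Optional[str] = "sql"
    conn_url: Optional[str] = None
    conn_user: Optional[str] = None
    conn_password: Optional[str] = None
    conn_dialect: Optional[str] = "postgresql"
    # A dataclass rejects a bare {} as a mutable default, so this stand-in
    # uses default_factory to give every instance its own dict.
    options: Dict[str, Any] = field(default_factory=dict)


request = Text2QueryRequestSketch(
    query="How many albums are in the database?",  # hypothetical question
    conn_url="db.example.com:5432/chinook",  # hypothetical value and format
    conn_user="postgres",
    conn_password="testpwd",
)

# Fields that were not set keep the defaults declared on the model.
payload = json.dumps(asdict(request))
print(payload)
```

Omitting `conn_type` and `conn_dialect` leaves them at `"sql"` and `"postgresql"`, so under these defaults a caller targeting a different backend would have to set them explicitly.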

comps/text2cypher/src/README.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+# ⚠️ Deprecation Notice: `text2cypher`
+
+**This microservice is no longer actively maintained.**
+
+As of OPEA v1.5, we are deprecating the `text2cypher` microservice. Please use the `text2query` microservice instead. We will remove `text2cypher` in OPEA v1.7.
+
 # 🛢 Text-to-Cypher Microservice

 The microservice enables a wide range of use cases, making it a versatile tool for businesses, researchers, and individuals alike. Users can generate queries based on natural language questions, enabling them to quickly retrieve relevant data from graph databases. This service executes locally on Intel Gaudi.

comps/text2query/README.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# 🛢 Text-to-Query Microservice
+
+The text2query microservice is a specialized, independent service that translates natural language questions into structured query languages. It acts as an interpreter: users ask a question in plain language and receive a formal query in return, which can then be executed against a relational or graph database. The service bridges the gap between human communication and machine-readable database commands.
+
+## 🛠️ Features
+
+- **Implement SQL Query based on input text**: Transform user-provided natural language into SQL queries, then execute them to retrieve data from SQL databases.
+- **Implement Cypher Query based on input text**: Transform user-provided natural language into Cypher queries, then execute them to retrieve data from a Neo4j graph database.
+
+## ⚙️ Supported Implementations
+
+The Text2Query Microservice supports multiple implementation options to suit different databases. Each implementation includes its own configuration and setup instructions:
+
+| Implementation     | Description                                                     | Supported Hardware | Documentation                  |
+| ------------------ | --------------------------------------------------------------- | ------------------ | ------------------------------ |
+| **Text-to-SQL**    | Transforming user-provided natural language into SQL queries    | Xeon, Gaudi        | [README](src/README_sql.md)    |
+| **Text-to-Cypher** | Transforming user-provided natural language into Cypher queries | Gaudi              | [README](src/README_cypher.md) |
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+include:
+  - ../../../third_parties/tgi/deployment/docker_compose/compose.yaml
+  - ../../../third_parties/neo4j/deployment/docker_compose/compose.yaml
+
+services:
+  postgres:
+    image: postgres:latest
+    container_name: postgres-container
+    restart: always
+    environment:
+      - POSTGRES_USER=${POSTGRES_USER-postgres}
+      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD-testpwd}
+      - POSTGRES_DB=${POSTGRES_DB-chinook}
+    ports:
+      - '5442:5432'
+    volumes:
+      - ../../src/integrations/sql/chinook.sql:/docker-entrypoint-initdb.d/chinook.sql
+
+  text2query-sql:
+    image: opea/text2query-sql:${TAG:-latest}
+    container_name: text2query-sql-server
+    ports:
+      - ${TEXT2SQL_PORT:-9097}:9097
+    environment:
+      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT-http://localhost:8008}
+      TEXT2QUERY_COMPONENT_NAME: "OPEA_TEXT2QUERY_SQL"
+    depends_on:
+      - tgi-server
+      - postgres
+
+  text2query-sql-gaudi:
+    image: opea/text2query-sql:${TAG:-latest}
+    container_name: text2query-sql-gaudi-server
+    ports:
+      - ${TEXT2SQL_PORT:-9097}:9097
+    environment:
+      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT-http://localhost:8008}
+      TEXT2QUERY_COMPONENT_NAME: "OPEA_TEXT2QUERY_SQL"
+    depends_on:
+      - tgi-gaudi-server
+      - postgres
+
+  text2query-cypher-gaudi:
+    image: opea/text2query-cypher:${TAG:-latest}
+    container_name: text2query-cypher-gaudi-server
+    ports:
+      - ${TEXT2CYPHER_PORT:-9097}:9097
+    depends_on:
+      neo4j-apoc:
+        condition: service_healthy
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      INDEX_NAME: ${INDEX_NAME}
+      HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
+      HF_TOKEN: ${HF_TOKEN}
+      LOGFLAG: ${LOGFLAG:-False}
+      HABANA_VISIBLE_DEVICES: all
+      OMPI_MCA_btl_vader_single_copy_mechanism: none
+      TOKENIZERS_PARALLELISM: False
+      NEO4J_URI: ${NEO4J_URI}
+      NEO4J_URL: ${NEO4J_URI}
+      NEO4J_USERNAME: ${NEO4J_USERNAME}
+      NEO4J_PASSWORD: ${NEO4J_PASSWORD}
+      host_ip: ${host_ip}
+      TEXT2QUERY_COMPONENT_NAME: "OPEA_TEXT2QUERY_CYPHER"
+    runtime: habana
+    cap_add:
+      - SYS_NICE
+    restart: unless-stopped
+
+  text2query-graph:
+    image: opea/text2query-graph:${TAG:-latest}
+    container_name: text2query-graph-server
+    ports:
+      - ${TEXT2GRAPH_PORT:-9097}:9097
+    environment:
+      - no_proxy=${no_proxy}
+      - https_proxy=${https_proxy}
+      - http_proxy=${http_proxy}
+      - LLM_MODEL_ID=${LLM_MODEL_ID:-"Babelscape/rebel-large"}
+      - HF_TOKEN=${HF_TOKEN}
+      - TEXT2QUERY_COMPONENT_NAME=OPEA_TEXT2QUERY_GRAPH
+    ipc: host
+    restart: always
+
+networks:
+  default:
+    driver: bridge
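
All four text2query services in this compose file publish container port 9097, and each host-port variable also falls back to 9097; the two SQL variants even share `TEXT2SQL_PORT`. Starting more than one of them on the same host would therefore presumably require overriding the port variables. The sketch below checks a chosen set of services for such clashes. The `HOST_PORTS` table is transcribed by hand from the file above, and `find_port_clashes` is a hypothetical helper, not part of the commit.

```python
from collections import defaultdict

# Host-port variable and fallback for each service, transcribed from the compose file.
HOST_PORTS = {
    "text2query-sql": ("TEXT2SQL_PORT", 9097),
    "text2query-sql-gaudi": ("TEXT2SQL_PORT", 9097),
    "text2query-cypher-gaudi": ("TEXT2CYPHER_PORT", 9097),
    "text2query-graph": ("TEXT2GRAPH_PORT", 9097),
}


def find_port_clashes(services, env):
    """Return {host_port: [services]} for every host port claimed more than once."""
    claimed = defaultdict(list)
    for service in services:
        variable, fallback = HOST_PORTS[service]
        # Mirrors ${VAR:-9097}: an unset or empty variable falls back to the default.
        port = int(env.get(variable) or fallback)
        claimed[port].append(service)
    return {port: names for port, names in claimed.items() if len(names) > 1}


selected = ["text2query-sql", "text2query-cypher-gaudi"]
print(find_port_clashes(selected, {}))
print(find_port_clashes(selected, {"TEXT2CYPHER_PORT": "9098"}))
```

With no overrides, both selected services claim host port 9097; setting `TEXT2CYPHER_PORT=9098` removes the clash for this pair.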

comps/text2query/deployment/kubernetes/README.md

Whitespace-only changes.

comps/text2query/src/Dockerfile

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+FROM python:3.11-slim
+
+ENV LANG=C.UTF-8
+ARG ARCH=cpu
+
+RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
+    build-essential \
+    libjemalloc-dev
+
+RUN useradd -m -s /bin/bash user && \
+    mkdir -p /home/user && \
+    chown -R user /home/user/
+
+COPY comps /home/user/comps
+
+ARG uvpip='uv pip install --system --no-cache-dir'
+RUN pip install --no-cache-dir --upgrade pip setuptools uv && \
+    if [ ${ARCH} = "cpu" ]; then \
+      $uvpip torch --index-url https://download.pytorch.org/whl/cpu; \
+      $uvpip -r /home/user/comps/text2query/src/requirements-cpu.txt; \
+    else \
+      $uvpip -r /home/user/comps/text2query/src/requirements-gpu.txt; \
+    fi
+
+ENV PYTHONPATH=$PYTHONPATH:/home/user
+
+USER user
+
+WORKDIR /home/user/comps/text2query/src/
+
+ENTRYPOINT ["python", "opea_text2query_microservice.py"]
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# HABANA environment
+FROM vault.habana.ai/gaudi-docker/1.20.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0 AS hpu
+
+ENV LANG=en_US.UTF-8
+ARG REPO=https://github.com/huggingface/optimum-habana.git
+ARG REPO_VER=v1.15.0
+
+RUN apt-get update && apt-get install -y --no-install-recommends --fix-missing \
+    git-lfs \
+    libjemalloc-dev
+
+#RUN useradd -m -s /bin/bash user && \
+#    mkdir -p /home/user && \
+#    chown -R user /home/user/
+
+RUN git lfs install
+
+COPY comps /root/comps
+#RUN chown -R user /home/user/comps/text2cypher
+
+#RUN rm -rf /etc/ssh/ssh_host*
+
+ARG uvpip='uv pip install --system --no-cache-dir'
+RUN pip install --no-cache-dir --upgrade pip setuptools uv && \
+    pip install --no-cache-dir accelerate \
+    huggingface_hub \
+    json_repair \
+    langchain_experimental \
+    llama-index \
+    llama-index-embeddings-huggingface \
+    llama-index-embeddings-langchain \
+    llama-index-graph-stores-neo4j \
+    llama-index-llms-huggingface \
+    llama-index-llms-huggingface-api \
+    neo4j \
+    peft \
+    pydub \
+    pyprojroot \
+    sentence-transformers \
+    unstructured \
+    urllib3 \
+    optimum-habana==1.17.0 && \
+    $uvpip git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0
+
+RUN git clone --depth 1 --branch ${REPO_VER} ${REPO}
+
+WORKDIR /root/comps/text2query/src
+RUN $uvpip -r requirements-cpu.txt && \
+    $uvpip --upgrade --force-reinstall pydantic numpy==1.26.3 transformers==4.49.0
+
+# Set environment variables
+ENV PYTHONPATH=/root:/usr/lib/habanalabs/:/root/optimum-habana
+ENV HABANA_VISIBLE_DEVICES=all
+ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
+ENV DEBIAN_FRONTEND="noninteractive" TZ=Etc/UTC
+
+#USER user
+WORKDIR /root/comps/text2query/src
+
+ENTRYPOINT ["python", "opea_text2query_microservice.py"]
+