Commit cbd749e

ModelCache main code initialization
1 parent 3dbc530 commit cbd749e


60 files changed: +2668, -0 lines changed

.gitignore

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
*.DS_Store
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
*.db

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

.idea
**/data_map**.txt
**/faiss**.index
**/sqlite**.db
**/**.db
**/example.py
**/example.db
**/.chroma

/fuhui_dev
*.txt
*.index
*model.onnx

/data_analyse
/embedding_npy
/flask_server
*.bin

README.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
# Codefuse-ModelCache: A Semantic Cache for Large Language Models
## Contents

- [News](#news)
- [Introduction](#introduction)
- [Quick Deployment](#quick-deployment)
- [Service Access](#service-access)
- [Articles](#articles)
- [Architecture](#architecture)
- [Core Features](#core-features)
## News
[2023.10.31] codefuse-ModelCache...
## Introduction
Codefuse-ModelCache is an open-source semantic cache for large language models (LLMs). By caching generated model results, it reduces response time for similar requests and improves the user experience. Approaching the problem from the angle of service optimization, the project introduces a caching mechanism that helps enterprises and research institutions reduce inference deployment costs, improve model performance and efficiency, and deliver large-model services at scale under resource constraints and strict latency requirements. Through open source, we hope to share and exchange techniques related to semantic caching for LLMs.
## Quick Deployment
### Dependencies

- Python version: 3.8 or above
- Install the dependencies:
```shell
pip install -r requirements.txt
```

### Environment Configuration
Before starting the service, configure the environment as follows:

1. Install the relational database MySQL and import the SQL file to create the data tables. SQL file: reference_doc/create_table.sql
2. Install the vector database Milvus.
3. Add the database access information to the configuration files:
   1. modelcache/config/milvus_config.ini
   2. modelcache/config/mysql_config.ini
4. Download the offline model bin file from [https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main) and place it in the model/text2vec-base-chinese folder.
5. Start the backend service with the flask4modelcache.py script.
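The configuration files in step 3 are plain INI files. A hypothetical sketch of what modelcache/config/mysql_config.ini might contain — the section and key names here are illustrative assumptions, not the project's actual schema:

```ini
[mysql]
host = 127.0.0.1
port = 3306
username = modelcache
password = your_password
database = modelcache
```

The Milvus file would carry the analogous connection details for the vector database.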
## Service Access
The service currently exposes three core functions through a RESTful API: cache writing, cache querying, and cache clearing. Request demos:
### Cache Writing
```python
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'insert'
scope = {"model": "CODEGPT-1008"}
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant; you must provide neutral, harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}],
              "answer": "Hello, I am an intelligent assistant. How can I help you?"}]
data = {'type': request_type, 'scope': scope, 'chat_info': chat_info}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
### Cache Querying
```python
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'query'
scope = {"model": "CODEGPT-1008"}
query = [{"role": "system", "content": "You are an AI code assistant; you must provide neutral, harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}]
data = {'type': request_type, 'scope': scope, 'query': query}

headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
### Cache Clearing
```python
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'remove'
scope = {"model": "CODEGPT-1008"}
remove_type = 'truncate_by_model'
data = {'type': request_type, 'scope': scope, 'remove_type': remove_type}

headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
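The three demos above share one envelope: a `type` field selecting the operation, a `scope` carrying the model name, and one operation-specific field. A minimal helper that assembles these payloads — the field names come from the demos, but the helper itself is hypothetical and not part of ModelCache:

```python
def build_payload(op_type: str, model: str, **fields) -> dict:
    """Assemble a ModelCache request body: the common envelope
    ('type' and 'scope') plus the operation-specific fields."""
    payload = {'type': op_type, 'scope': {'model': model}}
    payload.update(fields)
    return payload

# The three operations shown in the demos:
insert_req = build_payload('insert', 'CODEGPT-1008',
                           chat_info=[{'query': [{'role': 'user', 'content': 'Who are you?'}],
                                       'answer': 'I am an assistant.'}])
query_req = build_payload('query', 'CODEGPT-1008',
                          query=[{'role': 'user', 'content': 'Who are you?'}])
remove_req = build_payload('remove', 'CODEGPT-1008', remove_type='truncate_by_model')
```

Each dict would then be posted exactly as in the demos above.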
## Articles
Coming soon.
## Architecture
![image.png](https://intranetproxy.alipay.com/skylark/lark/0/2023/png/275821/1698031968643-35914fc7-bb62-455e-9431-69bca8ba3368.png#clientId=uf441e764-1311-4&from=paste&height=408&id=h5p1L&originHeight=1152&originWidth=1796&originalType=binary&ratio=2&rotation=0&showTitle=false&size=465700&status=done&style=none&taskId=u6f53deb1-7821-47e0-af8a-87d899e3f7a&title=&width=636)
## Core Features
ModelCache follows the main ideas of GPTCache and consists of a set of core modules: adapter, embedding, similarity, and data_manager. The adapter module handles the business logic of the various tasks and chains the embedding, similarity, and data_manager modules together. The embedding module converts text into semantic vector representations, transforming user queries into vectors for subsequent recall or storage. The rank module sorts and evaluates the similarity of recalled vectors. The data_manager module manages the databases. To better support industrial deployment, we have upgraded the architecture and functionality as follows:

- [x] Architectural adjustment (lightweight integration): embeds into LLM products as a Redis-like cache, providing semantic caching without interfering with LLM invocation, safety review, or other functions; compatible with all large-model services.
- [x] Multiple model loading schemes:
  - Supports loading local embedding models, avoiding Hugging Face connectivity issues.
  - Supports loading the embedding layers of various pretrained models.
- [x] Data isolation
  - Environment isolation: pulls different database configurations per environment (dev, staging, production).
  - Multi-tenant data isolation: dynamically creates collections per model, isolating data for multiple models/services within an LLM product.
- [x] System prompt support: uses a concatenation approach to handle system instructions in the prompt paradigm.
- [x] Long/short text differentiation: long texts make similarity evaluation harder, so long and short texts are distinguished and their thresholds can be configured separately.
- [x] Milvus performance optimization: Milvus consistency_level is set to "Session" for better performance.
- [x] Data management:
  - One-click cache clearing, for data management after model upgrades.
  - Recall of hit queries, for subsequent data analysis and model iteration.
  - Asynchronous log write-back, for data analysis and statistics.
  - Added model field and data statistics fields for feature expansion.

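The "concatenation approach" to system instructions can be pictured as flattening the role-tagged messages into a single string before embedding, so that the system prompt participates in similarity matching. A simplified sketch of the idea — an assumption about the mechanism, not ModelCache's actual code:

```python
def flatten_messages(messages):
    """Concatenate role-tagged chat messages into one string so the
    system instruction takes part in embedding and similarity matching."""
    return '\n'.join(f"{m['role']}: {m['content']}" for m in messages)

text = flatten_messages([
    {'role': 'system', 'content': 'You are an AI code assistant.'},
    {'role': 'user', 'content': 'Who are you?'},
])
# 'text' is what would then be embedded and stored in the cache.
```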
Features under continued development:

- [ ] Hyperparameter-based data isolation
- [ ] Partitioned storage for system prompts, to improve the accuracy and efficiency of similarity matching
- [ ] More general embedding models and similarity evaluation algorithms
## Acknowledgements
This project draws on the following open-source project; we thank the project and its developers.<br />[GPTCache](https://github.com/zilliztech/GPTCache)

README_EN.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Codefuse-ModelCache: A Semantic Cache for LLMs
## Contents
- [News](#news)
- [Introduction](#introduction)
- [Quick Deployment](#quick-deployment)
- [Service Access](#service-access)
- [Articles](#articles)
- [Modules](#modules)
- [Core Features](#core-features)
- [Acknowledgements](#acknowledgements)
## News
[2023.08.26] codefuse-ModelCache...
## Introduction
Codefuse-ModelCache is a semantic cache for large language models (LLMs). By caching pre-generated model results, it reduces response time for similar requests and improves the user experience. <br />This project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we aim to share and exchange techniques related to semantic caching for LLMs.
## Quick Deployment
### Dependencies

- Python version: 3.8 or above
- Install the dependencies:
```shell
pip install -r requirements.txt
```
### Environment Configuration
Before starting the service, perform the following environment configuration:

1. Install the relational database MySQL and import the SQL file to create the data tables. SQL file: reference_doc/create_table.sql
2. Install the vector database Milvus.
3. Add the database access information to the configuration files:
   1. modelcache/config/milvus_config.ini
   2. modelcache/config/mysql_config.ini
4. Download the embedding model bin file from [https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main) and place it in the model/text2vec-base-chinese folder.
5. Start the backend service using the flask4modelcache.py script.
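The .ini files in step 3 can be read with Python's standard configparser. A self-contained sketch of the loading step — the `[mysql]` section and key names here are illustrative assumptions, not the project's actual schema:

```python
import configparser

# Illustrative content; the real files live at modelcache/config/*.ini
# and their actual section/key names may differ.
sample = """
[mysql]
host = 127.0.0.1
port = 3306
"""

config = configparser.ConfigParser()
config.read_string(sample)          # for a real file: config.read(path)
host = config.get('mysql', 'host')
port = config.getint('mysql', 'port')
```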
## Service Access
The service currently provides three core functions through a RESTful API: cache writing, cache querying, and cache clearing. Demos:
### Cache Writing
```python
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'insert'
scope = {"model": "CODEGPT-1008"}
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}],
              "answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
data = {'type': request_type, 'scope': scope, 'chat_info': chat_info}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
### Cache Querying
```python
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'query'
scope = {"model": "CODEGPT-1008"}
query = [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}]
data = {'type': request_type, 'scope': scope, 'query': query}

headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
### Cache Clearing
```python
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'remove'
scope = {"model": "CODEGPT-1008"}
remove_type = 'truncate_by_model'
data = {'type': request_type, 'scope': scope, 'remove_type': remove_type}

headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
## Articles
Coming soon...
## Modules
![image.png](https://intranetproxy.alipay.com/skylark/lark/0/2023/png/275821/1698031968643-35914fc7-bb62-455e-9431-69bca8ba3368.png#clientId=uf441e764-1311-4&from=paste&height=408&id=h5p1L&originHeight=1152&originWidth=1796&originalType=binary&ratio=2&rotation=0&showTitle=false&size=465700&status=done&style=none&taskId=u6f53deb1-7821-47e0-af8a-87d899e3f7a&title=&width=636)
## Core Features
In ModelCache, we adopt the main ideas of GPTCache, which includes a set of core modules: adapter, embedding, similarity, and data_manager. The adapter module handles the business logic of the various tasks and connects the embedding, similarity, and data_manager modules. The embedding module converts text into semantic vector representations, transforming user queries into vector form. The rank module sorts and evaluates the similarity of the recalled vectors. The data_manager module manages the database. To better facilitate industrial applications, we have made the following architectural and functional upgrades:
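The module chaining described above can be sketched as a tiny pipeline: the adapter embeds the query, ranks it against stored vectors, and returns either a cached answer or a miss. This is a toy mirror of the described data flow only — none of these classes or functions are ModelCache's real API, and the character-frequency "embedding" merely stands in for the text2vec model:

```python
from typing import List, Tuple

def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: a character-frequency vector over a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class ToyCache:
    """Adapter-style pipeline: embed -> rank against stored vectors -> hit or miss."""
    def __init__(self, threshold: float = 0.9):
        self.store: List[Tuple[List[float], str]] = []  # data_manager stand-in
        self.threshold = threshold

    def insert(self, query: str, answer: str) -> None:
        self.store.append((toy_embed(query), answer))

    def query(self, query: str):
        q = toy_embed(query)
        score, answer = max(((cosine(q, v), ans) for v, ans in self.store),
                            default=(0.0, None))
        return answer if score >= self.threshold else None

cache = ToyCache()
cache.insert('who are you', 'I am an assistant.')
hit = cache.query('who are you')                        # exact repeat -> hit
miss = cache.query('completely different words here')   # dissimilar -> miss
```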

- [x] Architectural adjustment (lightweight integration): embedded into LLM products as a Redis-like cache, providing semantic caching without interfering with LLM calls, safety reviews, or other functions; compatible with all large-model services.
- [x] Multiple model loading schemes:
  - Supports loading local embedding models, avoiding Hugging Face connectivity issues.
  - Supports loading the embedding layers of various pretrained models.
- [x] Data isolation
  - Environment isolation: pulls different database configurations per environment (dev, staging, production).
  - Multi-tenant data isolation: dynamically creates collections per model, addressing data isolation for multiple models/services in LLM products.
- [x] System prompt support: uses a concatenation approach to handle system instructions in the prompt paradigm.
- [x] Long/short text differentiation: long texts pose more challenges for similarity evaluation, so long and short texts are distinguished and their thresholds can be configured separately.
- [x] Milvus performance optimization: Milvus consistency_level is set to "Session" for better performance.
- [x] Data management:
  - One-click cache clearing, for data management after model upgrades.
  - Recall of hit queries, for subsequent data analysis and model iteration.
  - Asynchronous log write-back, for data analysis and statistics.
  - Added model field and data statistics fields for feature expansion.

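The long/short text differentiation above amounts to picking a similarity threshold by query length. A minimal sketch of the idea — the cut-off and threshold values here are illustrative assumptions, not ModelCache's configuration:

```python
LONG_TEXT_CHARS = 256        # illustrative cut-off between "short" and "long"
SHORT_TEXT_THRESHOLD = 0.95  # short texts: demand near-identical matches
LONG_TEXT_THRESHOLD = 0.90   # long texts: similarity scores spread out more

def similarity_threshold(text: str) -> float:
    """Return the acceptance threshold to apply when ranking recalled vectors,
    depending on whether the query counts as long or short text."""
    return LONG_TEXT_THRESHOLD if len(text) >= LONG_TEXT_CHARS else SHORT_TEXT_THRESHOLD

short_t = similarity_threshold("Who are you?")
long_t = similarity_threshold("x" * 500)
```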
Features under development:

- [ ] Data isolation based on hyperparameters
- [ ] Partitioned storage for system prompts, to improve the accuracy and efficiency of similarity matching
- [ ] More general embedding models and similarity evaluation algorithms
## Acknowledgements
This project draws on the following open-source project; we thank the project and its developers for their contributions and research.<br />[GPTCache](https://github.com/zilliztech/GPTCache)
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
{
  "_name_or_path": "hfl/chinese-macbert-base",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.12.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 21128
}
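The `hidden_size` of 768 in this config is the width of the encoder's output vectors — presumably the dimension a Milvus collection would use for the text2vec-base-chinese embeddings — and `max_position_embeddings` bounds the input length. A small sketch that pulls those fields out of such a config (parsing an abridged copy of the JSON above):

```python
import json

# Abridged copy of the fields from the config above that matter for cache sizing.
config_text = '''
{
  "model_type": "bert",
  "hidden_size": 768,
  "max_position_embeddings": 512,
  "vocab_size": 21128
}
'''

config = json.loads(config_text)
embedding_dim = config["hidden_size"]           # vector dimension produced by the encoder
max_tokens = config["max_position_embeddings"]  # longest sequence the encoder accepts
```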
