Commit ae5be17

docs: update README
1 parent 6eaad17 commit ae5be17

2 files changed: +74 -22 lines changed

README.md

Lines changed: 35 additions & 8 deletions
@@ -28,6 +28,7 @@ GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthe

 - 📝 [What is GraphGen?](#-what-is-graphgen)
 - 📌 [Latest Updates](#-latest-updates)
+- ⚙️ [Support List](#-support-list)
 - 🚀 [Quick Start](#-quick-start)
 - 🏗️ [System Architecture](#-system-architecture)
 - 🍀 [Acknowledgements](#-acknowledgements)
@@ -47,13 +48,13 @@ GraphGen is a framework for synthetic data generation guided by knowledge graphs

 Here is post-training result which **over 50% SFT data** comes from GraphGen and our data clean pipeline.

-| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
-| :-: | :-: | :-: | :-: |
-| Plant| [SeedBench](https://github.com/open-sciencelab/SeedBench) | **65.9** | 51.5 |
-| Common | CMMLU | 73.6 | **75.8** |
-| Knowledge | GPQA-Diamond | **40.0** | 33.3 |
-| Math | AIME24 | **20.6** | 16.7 |
-| | AIME25 | **22.7** | 7.2 |
+| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
+|:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
+| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **65.9** | 51.5 |
+| Common | CMMLU | 73.6 | **75.8** |
+| Knowledge | GPQA-Diamond | **40.0** | 33.3 |
+| Math | AIME24 | **20.6** | 16.7 |
+| | AIME25 | **22.7** | 7.2 |

 It begins by constructing a fine-grained knowledge graph from the source text,then identifies knowledge gaps in LLMs using the expected calibration error metric, prioritizing the generation of QA pairs that target high-value, long-tail knowledge.
 Furthermore, GraphGen incorporates multi-hop neighborhood sampling to capture complex relational information and employs style-controlled generation to diversify the resulting QA data.
@@ -77,6 +78,32 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 </details>


+## ⚙️ Support List
+
+We support various LLM inference servers, API servers, inference clients, input file formats, data modalities, output data formats, and output data types:
+
+| Inference Server | Api Server | Inference Client | Input File Format | Data Modal | Output Data Format | Output Data Type |
+|------------------------------------------------|---------------------------------------------------------------------------------------|----------------------------------------------------------------------|------------------------------------|--------------------|------------------------------|-------------------------------------------------|
+| [![hf-icon]][hf] HF<br>[![sg-icon]][sg] SGLang | [![sif-icon]][sif] SiliconFlow<br>[![oai-icon]][oai] OpenAI<br>[![az-icon]][az] Azure | Generic HTTP<br>[![ol-icon]][ol] Ollama<br>[![oai-icon]][oai] OpenAI | CSV<br>JSON<br>JSONL<br>PDF<br>TXT | TEXT<br>TEXT+IMAGE | Alpaca<br>ChatML<br>Sharegpt | Aggregated<br>Atomic<br>CoT<br>Multi-hop<br>VQA |
+
+<!-- links -->
+[hf]: https://huggingface.co/docs/transformers/index
+[sg]: https://docs.sglang.ai
+[sif]: https://siliconflow.cn
+[oai]: https://openai.com
+[az]: https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
+[ol]: https://ollama.com
+
+<!-- icons -->
+[hf-icon]: https://www.google.com/s2/favicons?domain=https://huggingface.co
+[sg-icon]: https://www.google.com/s2/favicons?domain=https://docs.sglang.ai
+[sif-icon]: https://www.google.com/s2/favicons?domain=siliconflow.com
+[oai-icon]: https://www.google.com/s2/favicons?domain=https://openai.com
+[az-icon]: https://www.google.com/s2/favicons?domain=https://azure.microsoft.com
+[ol-icon]: https://www.google.com/s2/favicons?domain=https://ollama.com
+
+
+
 ## 🚀 Quick Start

 Experience GraphGen through [Web](https://g-app-center-120612-6433-jpdvmvp.openxlab.space) or [Backup Web Entrance](https://openxlab.org.cn/apps/detail/chenzihonga/GraphGen)
@@ -177,7 +204,7 @@ For any questions, please check [FAQ](https://github.com/open-sciencelab/GraphGe
 Pick the desired format and run the matching script:

 | Format | Script to run | Notes |
-| ------------ | ---------------------------------------------- |-------------------------------------------------------------------|
+|--------------|------------------------------------------------|-------------------------------------------------------------------|
 | `cot` | `bash scripts/generate/generate_cot.sh` | Chain-of-Thought Q\&A pairs |
 | `atomic` | `bash scripts/generate/generate_atomic.sh` | Atomic Q\&A pairs covering basic knowledge |
 | `aggregated` | `bash scripts/generate/generate_aggregated.sh` | Aggregated Q\&A pairs incorporating complex, integrated knowledge |

README_zh.md

Lines changed: 39 additions & 14 deletions
@@ -26,13 +26,14 @@ GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthe
 <summary><b>📚 Table of Contents</b></summary>

 - 📝 [What is GraphGen?](#-什么是-graphgen)
-- 📌 [Latest Updates](#最新更新)
-- 🚀 [Quick Start](#快速开始)
-- 🏗️ [System Architecture](#系统架构)
-- 🍀 [Acknowledgements](#致谢)
-- 📚 [Citation](#引用)
-- 📜 [License](#许可证)
-- 📅 [Star History](#星标历史)
+- 📌 [Latest Updates](#-最新更新)
+- ⚙️ [Support List](#-支持列表)
+- 🚀 [Quick Start](#-快速开始)
+- 🏗️ [System Architecture](#-系统架构)
+- 🍀 [Acknowledgements](#-致谢)
+- 📚 [Citation](#-引用)
+- 📜 [License](#-许可证)
+- 📅 [Star History](#-星标历史)


 [//]: # (- 🌟 [Key Features](#主要特性))
@@ -48,13 +49,13 @@ GraphGen is a knowledge-graph-based data synthesis framework. See the [**paper**

 Below are the post-training results when over 50% of the SFT data comes from GraphGen and our data cleaning pipeline:

-| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
-| :-: | :-: | :-: | :-: |
-| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **65.9** | 51.5 |
-| Common | CMMLU | 73.6 | **75.8** |
-| Knowledge | GPQA-Diamond | **40.0** | 33.3 |
-| Math | AIME24 | **20.6** | 16.7 |
-| | AIME25 | **22.7** | 7.2 |
+| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
+|:--:|:---------------------------------------------------------:|:--------:|:-----------------------:|
+| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **65.9** | 51.5 |
+| Common | CMMLU | 73.6 | **75.8** |
+| Knowledge | GPQA-Diamond | **40.0** | 33.3 |
+| Math | AIME24 | **20.6** | 16.7 |
+| | AIME25 | **22.7** | 7.2 |

 GraphGen first constructs a fine-grained knowledge graph from the source text, then uses the expected calibration error metric to identify knowledge gaps in large language models, prioritizing the generation of QA pairs that target high-value, long-tail knowledge.
 In addition, GraphGen uses multi-hop neighborhood sampling to capture complex relational information and applies style-controlled generation to diversify the QA data.
@@ -76,6 +77,30 @@ GraphGen first constructs a fine-grained knowledge graph from the source text, then uses the

 </details>

+## ⚙️ Support List
+
+We support a range of LLM inference servers, API servers, inference clients, input file formats, data modalities, output data formats, and output data types:
+
+| Inference Server | API Server | Inference Client | Input File Format | Data Modality | Output Data Format | Output Data Type |
+|------------------------------------------------|---------------------------------------------------------------------------------------|----------------------------------------------------------------------|------------------------------------|--------------------|------------------------------|-------------------------------------------------|
+| [![hf-icon]][hf] HF<br>[![sg-icon]][sg] SGLang | [![sif-icon]][sif] SiliconFlow<br>[![oai-icon]][oai] OpenAI<br>[![az-icon]][az] Azure | Generic HTTP<br>[![ol-icon]][ol] Ollama<br>[![oai-icon]][oai] OpenAI | CSV<br>JSON<br>JSONL<br>PDF<br>TXT | TEXT<br>TEXT+IMAGE | Alpaca<br>ChatML<br>Sharegpt | Aggregated<br>Atomic<br>CoT<br>Multi-hop<br>VQA |
+
+<!-- links -->
+[hf]: https://huggingface.co/docs/transformers/index
+[sg]: https://docs.sglang.ai
+[sif]: https://siliconflow.cn
+[oai]: https://openai.com
+[az]: https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
+[ol]: https://ollama.com
+
+<!-- icons -->
+[hf-icon]: https://www.google.com/s2/favicons?domain=https://huggingface.co
+[sg-icon]: https://www.google.com/s2/favicons?domain=https://docs.sglang.ai
+[sif-icon]: https://www.google.com/s2/favicons?domain=siliconflow.com
+[oai-icon]: https://www.google.com/s2/favicons?domain=https://openai.com
+[az-icon]: https://www.google.com/s2/favicons?domain=https://azure.microsoft.com
+[ol-icon]: https://www.google.com/s2/favicons?domain=https://ollama.com
+

 ## 🚀 Quick Start
