
Commit e951655

docs: update README
1 parent ac22fdc commit e951655


2 files changed (+39, -9 lines)


README.md

Lines changed: 19 additions & 4 deletions
@@ -62,13 +62,14 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 
 ## 📌 Latest Updates
 
+- **2025.12.1**: Added search support for the [NCBI](https://www.ncbi.nlm.nih.gov/) and [RNAcentral](https://rnacentral.org/) databases, enabling extraction of DNA and RNA data from these bioinformatics databases.
 - **2025.10.30**: We support several new LLM clients and inference backends, including [Ollama_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/ollama_client.py), [http_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/http_client.py), [HuggingFace Transformers](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/hf_wrapper.py) and [SGLang](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/sglang_wrapper.py).
 - **2025.10.23**: VQA (Visual Question Answering) data generation is now supported. Run script: `bash scripts/generate/generate_vqa.sh`.
-- **2025.10.21**: PDF is now supported as an input format for data generation via [MinerU](https://github.com/opendatalab/MinerU).
 
 <details>
 <summary>History</summary>
 
+- **2025.10.21**: PDF is now supported as an input format for data generation via [MinerU](https://github.com/opendatalab/MinerU).
 - **2025.09.29**: The Gradio demo is auto-updated on [Hugging Face](https://huggingface.co/spaces/chenzihong/GraphGen) and [ModelScope](https://modelscope.cn/studios/chenzihong/GraphGen).
 - **2025.08.14**: We have added support for community detection in knowledge graphs using the Leiden algorithm, enabling the synthesis of Chain-of-Thought (CoT) data.
 - **2025.07.31**: We have added Google, Bing, Wikipedia, and UniProt as search back-ends.
@@ -82,9 +83,10 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 We support various LLM inference servers, API servers, inference clients, input file formats, data modalities, output data formats, and output data types.
 Users can configure these flexibly to suit their synthetic-data needs.
 
-| Inference Server | Api Server | Inference Client | Input File Format | Data Modal | Data Format | Data Type |
-|----------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------|------------------------------------|---------------|------------------------------|-------------------------------------------------|
-| [![hf-icon]HF][hf]<br>[![sg-icon]SGLang][sg] | [![sif-icon]Silicon][sif]<br>[![oai-icon]OpenAI][oai]<br>[![az-icon]Azure][az] | HTTP<br>[![ol-icon]Ollama][ol]<br>[![oai-icon]OpenAI][oai] | CSV<br>JSON<br>JSONL<br>PDF<br>TXT | TEXT<br>IMAGE | Alpaca<br>ChatML<br>Sharegpt | Aggregated<br>Atomic<br>CoT<br>Multi-hop<br>VQA |
+
+| Inference Server | API Server | Inference Client | Data Source | Data Modality | Data Type |
+|----------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|-------------------------------------------------|
+| [![hf-icon]HF][hf]<br>[![sg-icon]SGLang][sg] | [![sif-icon]Silicon][sif]<br>[![oai-icon]OpenAI][oai]<br>[![az-icon]Azure][az] | HTTP<br>[![ol-icon]Ollama][ol]<br>[![oai-icon]OpenAI][oai] | Files (CSV, JSON, PDF, TXT, etc.)<br>Databases ([![uniprot-icon]UniProt][uniprot], [![ncbi-icon]NCBI][ncbi], [![rnacentral-icon]RNAcentral][rnacentral])<br>Search Engines ([![bing-icon]Bing][bing], [![google-icon]Google][google])<br>Knowledge Graphs ([![wiki-icon]Wikipedia][wiki]) | TEXT<br>IMAGE | Aggregated<br>Atomic<br>CoT<br>Multi-hop<br>VQA |
 
 <!-- links -->
 [hf]: https://huggingface.co/docs/transformers/index
@@ -93,6 +95,13 @@ Users can flexibly configure according to the needs of synthetic data.
 [oai]: https://openai.com
 [az]: https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
 [ol]: https://ollama.com
+[uniprot]: https://www.uniprot.org/
+[ncbi]: https://www.ncbi.nlm.nih.gov/
+[rnacentral]: https://rnacentral.org/
+[wiki]: https://www.wikipedia.org/
+[bing]: https://www.bing.com/
+[google]: https://www.google.com
+
 
 <!-- icons -->
 [hf-icon]: https://www.google.com/s2/favicons?domain=https://huggingface.co
@@ -102,6 +111,12 @@ Users can flexibly configure according to the needs of synthetic data.
 [az-icon]: https://www.google.com/s2/favicons?domain=https://azure.microsoft.com
 [ol-icon]: https://www.google.com/s2/favicons?domain=https://ollama.com
 
+[uniprot-icon]: https://www.google.com/s2/favicons?domain=https://www.uniprot.org
+[ncbi-icon]: https://www.google.com/s2/favicons?domain=https://www.ncbi.nlm.nih.gov/
+[rnacentral-icon]: https://www.google.com/s2/favicons?domain=https://rnacentral.org/
+[wiki-icon]: https://www.google.com/s2/favicons?domain=https://www.wikipedia.org/
+[bing-icon]: https://www.google.com/s2/favicons?domain=https://www.bing.com/
+[google-icon]: https://www.google.com/s2/favicons?domain=https://www.google.com
 
 
 ## 🚀 Quick Start
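As originally committed, the icon block defined the `[rnacentral]` label a second time instead of defining `[rnacentral-icon]`. Under CommonMark, when a label has several definitions the first one is used, so `[rnacentral]` still points at rnacentral.org while `![rnacentral-icon]` has no target and would likely render as literal text. A small lint pass can catch this kind of slip. The sketch below is a hypothetical helper (`check_reference_links` and `SAMPLE` are illustrative names, not part of GraphGen) built on simplified regular expressions: it only recognizes line-start definitions, full references of the form `[text][label]`, and shortcut image references such as `![label]`, and it ignores code spans, escaped brackets, and collapsed `[label][]` references.

```python
import re
from collections import Counter

# Link reference definition at the start of a line, e.g. "[hf-icon]: https://..."
DEF_RE = re.compile(r"^\[([^\]]+)\]:\s*(\S+)", re.MULTILINE)

# Simplified usage patterns (an assumption, not a full Markdown parser):
# "][label]" closes a full reference link such as [![hf-icon]HF][hf],
# and "![label]" is a shortcut reference image.
USE_RE = re.compile(r"\]\[([^\]]+)\]|!\[([^\]]+)\]")


def check_reference_links(markdown: str) -> dict:
    """Report duplicated and undefined reference labels (hypothetical helper)."""
    # CommonMark matches labels case-insensitively, so compare case-folded labels.
    defined = Counter(m.group(1).casefold() for m in DEF_RE.finditer(markdown))
    duplicates = sorted(label for label, n in defined.items() if n > 1)

    used = set()
    for m in USE_RE.finditer(markdown):
        # Exactly one of the two alternation groups matched.
        used.add((m.group(1) or m.group(2)).casefold())
    undefined = sorted(used - set(defined))
    return {"duplicates": duplicates, "undefined": undefined}


# Minimal excerpt reproducing the RNAcentral definitions as first committed.
SAMPLE = """\
| [![rnacentral-icon]RNAcentral][rnacentral] |

[rnacentral]: https://rnacentral.org/
[rnacentral]: https://www.google.com/s2/favicons?domain=https://rnacentral.org/
"""

if __name__ == "__main__":
    print(check_reference_links(SAMPLE))
    # {'duplicates': ['rnacentral'], 'undefined': ['rnacentral-icon']}
```

Running a check like this over both README.md and README_zh.md before committing would flag the duplicate label and the missing icon definition; a real CommonMark parser would be more robust than these regexes.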

README_zh.md

Lines changed: 20 additions & 5 deletions
@@ -62,13 +62,14 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then uses
 After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [xtuner](https://github.com/InternLM/xtuner) to fine-tune large language models.
 
 ## 📌 Latest Updates
-- **2025.10.30** We support several new LLM clients and inference backends, including [Ollama_client]([Ollama_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/ollama_client.py), [http_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/http_client.py), [HuggingFace Transformers](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/hf_wrapper.py) and [SGLang](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/sglang_wrapper.py).
+- **2025.12.1**: Added search support for the [NCBI](https://www.ncbi.nlm.nih.gov/) and [RNAcentral](https://rnacentral.org/) databases; DNA and RNA data can now be extracted from these bioinformatics databases.
+- **2025.10.30**: We support several new LLM clients and inference backends, including [Ollama_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/ollama_client.py), [http_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/http_client.py), [HuggingFace Transformers](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/hf_wrapper.py) and [SGLang](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/sglang_wrapper.py).
 - **2025.10.23**: Visual Question Answering (VQA) data generation is now supported. Run script: `bash scripts/generate/generate_vqa.sh`.
-- **2025.10.21**: PDF is now supported as an input format for data generation via [MinerU](https://github.com/opendatalab/MinerU).
 
 <details>
 <summary>History</summary>
 
+- **2025.10.21**: PDF is now supported as an input format for data generation via [MinerU](https://github.com/opendatalab/MinerU).
 - **2025.09.29**: The Gradio app is auto-updated on [Hugging Face](https://huggingface.co/spaces/chenzihong/GraphGen) and [ModelScope](https://modelscope.cn/studios/chenzihong/GraphGen).
 - **2025.08.14**: Added support for partitioning knowledge graphs into communities with the Leiden community detection algorithm, used to synthesize CoT data.
 - **2025.07.31**: Added Google, Bing, Wikipedia, and UniProt as search backends to help fill data gaps.
@@ -81,9 +82,9 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then uses
 We support various LLM inference servers, API servers, inference clients, input file formats, data modalities, output data formats, and output data types.
 These can be configured flexibly according to synthetic-data needs.
 
-| Inference Server | API Server | Inference Client | Input File Format | Data Modality | Output Data Format | Output Data Type |
-|----------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------|------------------------------------|--------------|------------------------------|-------------------------------------------------|
-| [![hf-icon]HF][hf]<br>[![sg-icon]SGLang][sg] | [![sif-icon]Silicon][sif]<br>[![oai-icon]OpenAI][oai]<br>[![az-icon]Azure][az] | HTTP<br>[![ol-icon]Ollama][ol]<br>[![oai-icon]OpenAI][oai] | CSV<br>JSON<br>JSONL<br>PDF<br>TXT | TEXT<br>TEXT | Alpaca<br>ChatML<br>Sharegpt | Aggregated<br>Atomic<br>CoT<br>Multi-hop<br>VQA |
+| Inference Server | API Server | Inference Client | Data Source | Data Modality | Output Data Type |
+|----------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|-------------------------------------------------|
+| [![hf-icon]HF][hf]<br>[![sg-icon]SGLang][sg] | [![sif-icon]Silicon][sif]<br>[![oai-icon]OpenAI][oai]<br>[![az-icon]Azure][az] | HTTP<br>[![ol-icon]Ollama][ol]<br>[![oai-icon]OpenAI][oai] | Files (CSV, JSON, JSONL, PDF, TXT, etc.)<br>Databases ([![uniprot-icon]UniProt][uniprot], [![ncbi-icon]NCBI][ncbi], [![rnacentral-icon]RNAcentral][rnacentral])<br>Search Engines ([![bing-icon]Bing][bing], [![google-icon]Google][google])<br>Knowledge Graphs ([![wiki-icon]Wikipedia][wiki]) | TEXT<br>IMAGE | Aggregated<br>Atomic<br>CoT<br>Multi-hop<br>VQA |
 
 <!-- links -->
 [hf]: https://huggingface.co/docs/transformers/index
@@ -92,6 +93,13 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then uses
 [oai]: https://openai.com
 [az]: https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
 [ol]: https://ollama.com
+[uniprot]: https://www.uniprot.org/
+[ncbi]: https://www.ncbi.nlm.nih.gov/
+[rnacentral]: https://rnacentral.org/
+[wiki]: https://www.wikipedia.org/
+[bing]: https://www.bing.com/
+[google]: https://www.google.com
+
 
 <!-- icons -->
 [hf-icon]: https://www.google.com/s2/favicons?domain=https://huggingface.co
@@ -101,6 +109,13 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then uses
 [az-icon]: https://www.google.com/s2/favicons?domain=https://azure.microsoft.com
 [ol-icon]: https://www.google.com/s2/favicons?domain=https://ollama.com
 
+[uniprot-icon]: https://www.google.com/s2/favicons?domain=https://www.uniprot.org
+[ncbi-icon]: https://www.google.com/s2/favicons?domain=https://www.ncbi.nlm.nih.gov/
+[rnacentral-icon]: https://www.google.com/s2/favicons?domain=https://rnacentral.org/
+[wiki-icon]: https://www.google.com/s2/favicons?domain=https://www.wikipedia.org/
+[bing-icon]: https://www.google.com/s2/favicons?domain=https://www.bing.com/
+[google-icon]: https://www.google.com/s2/favicons?domain=https://www.google.com
+
 
 ## 🚀 Quick Start
 