Commit 128d2f8

Merge branch 'main' of https://github.com/open-sciencelab/GraphGen into kg_builder
2 parents 9b7ef17 + 4ea9ba9 commit 128d2f8

16 files changed: +320 -76 lines changed

.github/sync-config.yml

Lines changed: 0 additions & 2 deletions
@@ -13,7 +13,5 @@ sync:
     dest: app.py
   - source: requirements.txt
     dest: requirements.txt
-  - source: README_HF.md
-    dest: README.md
   - source: LICENSE
     dest: LICENSE

.github/workflows/push-to-hf.yml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ jobs:
           [[ -d hf-repo ]] && rm -rf hf-repo
           git clone https://huggingface.co/${HF_REPO_TYPE}/${HF_REPO_ID} hf-repo

-          rsync -a --delete --exclude='.git' --exclude='hf-repo' ./ hf-repo/
+          rsync -a --delete --exclude='.git' --exclude='hf-repo' --exclude='README.md' ./ hf-repo/

           cd hf-repo
           git add .
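The effect of adding `--exclude='README.md'` to an `rsync -a --delete` call can be sketched in Python: an excluded name is neither copied from the source nor deleted from the destination, so the README that already lives in the Space repository survives each sync. This is a minimal sketch, not the workflow's actual mechanism; `mirror` is a hypothetical helper, and it only reconciles top-level entries, whereas rsync applies the rule recursively.

```python
import shutil
import tempfile
from pathlib import Path


def mirror(src: Path, dst: Path, excludes: set[str]) -> None:
    """Mirror the top-level entries of src into dst, skipping excluded names.

    Hypothetical helper (not part of GraphGen). Excluded names are neither
    copied from src nor deleted from dst.
    """
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir()} - excludes

    # Delete destination entries that are gone from the source, unless they
    # are excluded. This is what keeps the destination's own README.md.
    for entry in dst.iterdir():
        if entry.name in excludes or entry.name in src_names:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Copy every source entry that is not excluded.
    for name in src_names:
        source, target = src / name, dst / name
        if source.is_dir():
            shutil.copytree(source, target, dirs_exist_ok=True)
        else:
            shutil.copy2(source, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        src.mkdir()
        dst.mkdir()
        (src / "app.py").write_text("print('demo')")
        (src / "README.md").write_text("GitHub README")
        (dst / "README.md").write_text("Space README with front matter")
        (dst / "stale.txt").write_text("removed upstream")
        mirror(src, dst, {".git", "hf-repo", "README.md"})
        print(sorted(p.name for p in dst.iterdir()))
```

Without the exclusion, the destination README would be overwritten by the GitHub one on every sync, which is presumably why this commit pairs the new flag with the removal of `README_HF.md` from the sync configuration.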

.github/workflows/push-to-ms.yml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+name: Push demo branch to ModelScope
+
+on:
+  workflow_call:
+    inputs:
+      ref:
+        required: false
+        default: demo
+        type: string
+    secrets:
+      MS_TOKEN:
+        required: true
+
+jobs:
+  push-ms:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ inputs.ref }}
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Configure Git identity
+        run: |
+          git config --global user.email "[email protected]"
+          git config --global user.name "github-actions[bot]"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          # ModelScope official SDK (optional, install only if you need to call the platform API)
+          pip install modelscope
+
+      - name: Push to ModelScope
+        env:
+          MS_TOKEN: ${{ secrets.MS_TOKEN }}
+          MS_REPO_TYPE: studios
+          MS_REPO_ID: chenzihong/GraphGen
+        run: |
+          [[ -d ms-repo ]] && rm -rf ms-repo
+          git clone https://oauth2:${MS_TOKEN}@www.modelscope.cn/${MS_REPO_TYPE}/${MS_REPO_ID}.git ms-repo
+
+          rsync -a --delete --exclude='.git' --exclude='ms-repo' --exclude='README.md' ./ ms-repo/
+
+          cd ms-repo
+          git add .
+          git diff-index --quiet HEAD || \
+            (git commit -m "Auto-sync from ${{ inputs.ref }} at $(date -u)" && \
+            git push "https://oauth2:${MS_TOKEN}@www.modelscope.cn/${MS_REPO_TYPE}/${MS_REPO_ID}.git")
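The clone and push steps above both use the same token-authenticated remote, `https://oauth2:<token>@www.modelscope.cn/<type>/<id>.git`. The following sketch shows how that URL is assembled and how it could be redacted before being printed. Both function names are hypothetical, and the percent-encoding of the token is an addition of this sketch; the workflow interpolates `${MS_TOKEN}` directly.

```python
from urllib.parse import quote


def modelscope_git_url(token: str, repo_type: str, repo_id: str) -> str:
    """Build the HTTPS remote with the token embedded as the oauth2 password.

    Hypothetical helper. The token is percent-encoded (an assumption of this
    sketch) so characters such as '@' or '/' cannot change the URL structure.
    """
    safe_token = quote(token, safe="")
    return f"https://oauth2:{safe_token}@www.modelscope.cn/{repo_type}/{repo_id}.git"


def redact(url: str, token: str) -> str:
    """Return the URL with the encoded token masked, suitable for logging."""
    if not token:
        return url
    return url.replace(quote(token, safe=""), "***")


if __name__ == "__main__":
    # "example-token" is a placeholder, not a real credential.
    url = modelscope_git_url("example-token", "studios", "chenzihong/GraphGen")
    print(redact(url, "example-token"))
```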

.github/workflows/pylint.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.10", "3.11"]
+        python-version: ["3.10", "3.11", "3.12"]

     steps:
     - uses: actions/checkout@v4

.github/workflows/sync-demo.yml

Lines changed: 7 additions & 0 deletions
@@ -87,3 +87,10 @@ jobs:
     uses: ./.github/workflows/push-to-hf.yml
     secrets:
       HF_TOKEN: ${{ secrets.HF_TOKEN }}
+  push-ms:
+    needs: sync-demo
+    uses: ./.github/workflows/push-to-ms.yml
+    secrets:
+      MS_TOKEN: ${{ secrets.MS_TOKEN }}
+    with:
+      ref: demo

README.md

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 [![Hugging Face](https://img.shields.io/badge/Paper-on%20HF-white?logo=huggingface&logoColor=yellow)](https://huggingface.co/papers/2505.20416)

 [![Hugging Face](https://img.shields.io/badge/Demo-on%20HF-blue?logo=huggingface&logoColor=yellow)](https://huggingface.co/spaces/chenzihong/GraphGen)
+[![Model Scope](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-on%20MS-green)](https://modelscope.cn/studios/chenzihong/GraphGen)
 [![OpenXLab](https://img.shields.io/badge/Demo-on%20OpenXLab-blue?logo=openxlab&logoColor=yellow)](https://g-app-center-120612-6433-jpdvmvp.openxlab.space)


@@ -60,6 +61,7 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL

 ## 📌 Latest Updates

+- **2025.09.29**: We auto-update the Gradio demo on [Hugging Face](https://huggingface.co/spaces/chenzihong/GraphGen) and [ModelScope](https://modelscope.cn/studios/chenzihong/GraphGen).
 - **2025.08.14**: We have added support for community detection in knowledge graphs using the Leiden algorithm, enabling the synthesis of Chain-of-Thought (CoT) data.
 - **2025.07.31**: We have added Google, Bing, Wikipedia, and UniProt as search back-ends.
 - **2025.04.21**: We have released the initial version of GraphGen.

README_HF.md

Lines changed: 0 additions & 43 deletions
This file was deleted.

README_ZH.md

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 [![Hugging Face](https://img.shields.io/badge/Paper-on%20HF-white?logo=huggingface&logoColor=yellow)](https://huggingface.co/papers/2505.20416)

 [![Hugging Face](https://img.shields.io/badge/Demo-on%20HF-blue?logo=huggingface&logoColor=yellow)](https://huggingface.co/spaces/chenzihong/GraphGen)
+[![Model Scope](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-on%20MS-green)](https://modelscope.cn/studios/chenzihong/GraphGen)
 [![OpenXLab](https://img.shields.io/badge/Demo-on%20OpenXLab-blue?logo=openxlab&logoColor=yellow)](https://g-app-center-120612-6433-jpdvmvp.openxlab.space)

 GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
@@ -61,6 +62,7 @@ GraphGen 首先根据源文本构建细粒度的知识图谱,然后利用期

 ## 📌 Latest Updates

+- **2025.09.29**: We automatically update the Gradio app on [Hugging Face](https://huggingface.co/spaces/chenzihong/GraphGen) and [ModelScope](https://modelscope.cn/studios/chenzihong/GraphGen).
 - **2025.08.14**: Added support for partitioning the knowledge graph into communities with the Leiden community detection algorithm to synthesize CoT data.
 - **2025.07.31**: Added Google, Bing, Wikipedia, and UniProt as search back-ends to help fill data gaps.
 - **2025.04.21**: Released the initial version of GraphGen.

graphgen/generate.py

Lines changed: 9 additions & 9 deletions
@@ -16,8 +16,6 @@

 def set_working_dir(folder):
     os.makedirs(folder, exist_ok=True)
-    os.makedirs(os.path.join(folder, "data", "graphgen"), exist_ok=True)
-    os.makedirs(os.path.join(folder, "logs"), exist_ok=True)


 def save_config(config_path, global_config):
@@ -48,24 +46,27 @@ def main():
     args = parser.parse_args()

     working_dir = args.output_dir
-    set_working_dir(working_dir)

     with open(args.config_file, "r", encoding="utf-8") as f:
         config = yaml.load(f, Loader=yaml.FullLoader)

     output_data_type = config["output_data_type"]
     unique_id = int(time.time())
+
+    output_path = os.path.join(
+        working_dir, "data", "graphgen", f"{unique_id}_{output_data_type}"
+    )
+    set_working_dir(output_path)
+
     set_logger(
-        os.path.join(
-            working_dir, "logs", f"graphgen_{output_data_type}_{unique_id}.log"
-        ),
+        os.path.join(output_path, f"{unique_id}.log"),
         if_stream=True,
     )
     logger.info(
         "GraphGen with unique ID %s logging to %s",
         unique_id,
         os.path.join(
-            working_dir, "logs", f"graphgen_{output_data_type}_{unique_id}.log"
+            working_dir, "logs", f"{unique_id}_graphgen_{output_data_type}.log"
         ),
     )

@@ -94,8 +95,7 @@ def main():
     else:
         raise ValueError(f"Unsupported output data type: {output_data_type}")

-    output_path = os.path.join(working_dir, "data", "graphgen", str(unique_id))
-    save_config(os.path.join(output_path, f"config-{unique_id}.yaml"), config)
+    save_config(os.path.join(output_path, "config.yaml"), config)
     logger.info("GraphGen completed successfully. Data saved to %s", output_path)

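The per-run layout introduced in `graphgen/generate.py` can be summarized with a small sketch: every artifact of a run now lives under `data/graphgen/<unique_id>_<output_data_type>/`. `RunPaths` and `run_paths` are hypothetical names used only for illustration; the path components are taken from the hunks above. Note that the `logger.info` call in the diff still formats a path under `working_dir/logs`, so the location it reports differs from the `log_file` computed here, which mirrors the path actually passed to `set_logger`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunPaths:
    """Locations of one run's artifacts (hypothetical container type)."""

    output_dir: str
    log_file: str
    config_file: str


def run_paths(working_dir: str, unique_id: int, output_data_type: str) -> RunPaths:
    """Derive the run directory, log file, and saved config path for one run."""
    output_dir = os.path.join(
        working_dir, "data", "graphgen", f"{unique_id}_{output_data_type}"
    )
    return RunPaths(
        output_dir=output_dir,
        # Same path that the diff passes to set_logger.
        log_file=os.path.join(output_dir, f"{unique_id}.log"),
        # Same path that the diff passes to save_config.
        config_file=os.path.join(output_dir, "config.yaml"),
    )


if __name__ == "__main__":
    # "cache" and "aggregated" are sample values chosen for this sketch.
    print(run_paths("cache", 1759100000, "aggregated"))
```

Because the directory name already carries the run ID and data type, the file names inside it (`config.yaml`, `<unique_id>.log`) can stay short without colliding across runs.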

graphgen/graphgen.py

Lines changed: 7 additions & 2 deletions
@@ -99,8 +99,13 @@ def __post_init__(self):
             self.working_dir, namespace="rephrase"
         )
         self.qa_storage: JsonListStorage = JsonListStorage(
-            os.path.join(self.working_dir, "data", "graphgen", str(self.unique_id)),
-            namespace=f"qa-{self.unique_id}",
+            os.path.join(
+                self.working_dir,
+                "data",
+                "graphgen",
+                f"{self.unique_id}_{self.config['output_data_type']}",
+            ),
+            namespace="qa",
         )

     @async_to_sync_method
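Since the QA storage directory is now named `<unique_id>_<output_data_type>`, a consumer that scans `data/graphgen/` can recover both fields from the directory name. The sketch below shows one way to do that; `parse_run_dir` is a hypothetical helper, and it assumes that `unique_id` is the integer timestamp produced by `int(time.time())`, so the first underscore always separates the two parts even if the data type itself contains underscores.

```python
def parse_run_dir(name: str) -> tuple[int, str]:
    """Split a run directory name into (unique_id, output_data_type).

    Hypothetical helper. Splits on the first underscore only, because the
    ID is assumed to be all digits while the type may contain underscores.
    """
    unique_id, sep, output_data_type = name.partition("_")
    if not sep or not unique_id.isdigit() or not output_data_type:
        raise ValueError(f"Not a '<unique_id>_<output_data_type>' name: {name!r}")
    return int(unique_id), output_data_type


if __name__ == "__main__":
    # Sample directory name made up for this sketch.
    print(parse_run_dir("1759100000_multi_hop"))
```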
