Commit 162198a

Merge branch 'develop' into add_nlp_samples_3
2 parents 94205f2 + 24567e3

File tree: 238 files changed (+3360, −316 lines)

Some content is hidden: large commits have some content hidden by default.

README.md

Lines changed: 68 additions & 73 deletions
@@ -1,145 +1,143 @@
 # GraphNet ![](https://img.shields.io/badge/version-v0.1-brightgreen) ![](https://img.shields.io/github/issues/PaddlePaddle/GraphNet?label=open%20issues) [![](https://img.shields.io/badge/Contribute%20to%20GraphNet-blue)](https://github.com/PaddlePaddle/GraphNet/issues/98)
 
+**GraphNet** is a large-scale dataset of deep learning **computation graphs**, built as a standard benchmark for **tensor compiler** optimization. It provides 2.7K computation graphs extracted from state-of-the-art deep learning models spanning diverse tasks and ML frameworks. With standardized formats and rich metadata, GraphNet enables fair comparison and reproducible evaluation of the general optimization capabilities of tensor compilers, thereby supporting advanced research such as AI for System on compilers (AI for Compiler).
 
-**GraphNet** is a large-scale dataset of deep learning **computation graphs**, built as a standard benchmark for **tensor compiler** optimization. It provides 2.7K computation graphs extracted from state-of-the-art deep learning models spanning diverse tasks and ML frameworks. With standardized formats and rich metadata, GraphNet enables fair comparison, reproducible evaluation, and deeper research into the general optimization capabilities of tensor compilers.
 <br>
 <div align="center">
-<img src="/pics/graphnet_overview.jpg" alt="GraphNet Architecture Overview" width="65%">
+<img src="/pics/Eval_result.png" alt="Violin plots of speedup distributions" width="65%">
 </div>

-With GraphNet, users can:
-1. **Contribute new computation graphs** through the built-in automated extraction and validation pipeline.
-2. **Evaluate tensor compilers** on existing graphs with the integrated compiler evaluation tool, supporting multiple compiler backends.
-3. **Advance research** in tensor compiler optimization using the test data and statistics provided by GraphNet.
-
-
-
-
-**Vision**: We aim to achieve cross-hardware portability of compiler optimizations by allowing models to learn and transfer optimization strategies. It will significantly reduce the manual effort required to develop efficient operator implementations.
-
+Compiler developers can use GraphNet samples to evaluate tensor compilers (e.g., CINN, TorchInductor, TVM) on target tasks. The figure above shows the speedup of two compilers (CINN and TorchInductor) across two tasks (CV and NLP).

-## Dataset Construction
+## 🧱 Dataset Construction
 
 To guarantee the dataset’s overall quality, reproducibility, and cross-compiler compatibility, we define the following construction **constraints**:
 
-1. Dynamic graphs must execute correctly.
-2. Graphs and their corresponding Python code must support serialization and deserialization.
+1. Computation graphs must be executable in imperative (eager) mode.
+2. Computation graphs and their corresponding Python code must support serialization and deserialization.
 3. The full graph can be decomposed into two disjoint subgraphs.
 4. Operator names within each computation graph must be statically parseable.
 5. If custom operators are used, their implementation code must be fully accessible.
 
-
 ### Graph Extraction & Validation
-For full implementation details, please refer to the [Co-Creation Tutorial](https://github.com/PaddlePaddle/GraphNet/blob/develop/CONTRIBUTE_TUTORIAL.md#co-creation-tutorial).
+
+We provide automated extraction and validation tools for constructing this dataset.
+
+<div align="center">
+<img src="/pics/graphnet_overview.jpg" alt="GraphNet Architecture Overview" width="65%">
+</div>

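Constraint 4 above — operator names must be statically parseable — can be illustrated with a small stdlib-only sketch. This is an editorial example, not GraphNet code: it parses a module's source with `ast` and collects the dotted names of every call without executing anything.

```python
import ast

# Hypothetical model source; in GraphNet this would come from a sample's model.py.
SOURCE = '''
class GraphModule:
    def forward(self, x):
        y = torch.relu(x)
        return torch.nn.functional.softmax(y, dim=-1)
'''

def collect_called_names(source):
    """Statically collect dotted call names (candidate operator names)."""
    def dotted(node):
        # Rebuild "torch.nn.functional.softmax" from nested Attribute nodes.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Attribute):
            base = dotted(node.value)
            return f"{base}.{node.attr}" if base else None
        return None

    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted(node.func)
            if name:
                names.append(name)
    return names

print(collect_called_names(SOURCE))
```

Because the analysis is purely syntactic, it works even when `torch` is not importable — which is exactly what makes the constraint checkable at validation time.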
 **Demo: Extract & Validate ResNet‑18**
-```
+```bash
 git clone https://github.com/PaddlePaddle/GraphNet.git
 cd GraphNet
 
 # Set your workspace directory
-export GRAPH_NET_EXTRACT_WORKSPACE=/home/yourname/graphnet_workspace
+export GRAPH_NET_EXTRACT_WORKSPACE=/home/yourname/graphnet_workspace/
 
 # Extract the ResNet‑18 computation graph
 python graph_net/test/vision_model_test.py
 
-# Validate the extracted graph (e.g. /home/yourname/graphnet_workspace/resnet18)
+# Validate the extracted graph (e.g. /home/yourname/graphnet_workspace/resnet18/)
 python -m graph_net.torch.validate \
-    --model-path $GRAPH_NET_EXTRACT_WORKSPACE/resnet18
+    --model-path $GRAPH_NET_EXTRACT_WORKSPACE/resnet18/
 ```

-**graph_net.torch.extract**
+**Illustration: How does GraphNet extract and construct a computation graph sample on PyTorch?**
+
+<div align="center">
+<img src="/pics/graphnet_sample.png" alt="GraphNet Extract Sample" width="65%">
+</div>
+
+* Source code of custom_op is required **only when** the corresponding operator is used in the module, and **no specific format** is required.
+
+**Step 1: graph_net.torch.extract**
 
-```python
+Importing and wrapping the model with `graph_net.torch.extract(name=model_name, dynamic=dynamic_mode)()` is all you need:
+
+```python
 import graph_net
 
 # Instantiate the model (e.g. a torchvision model)
 model = ...
 
 # Extract your own model
-model = graph_net.torch.extract(name="model_name")(model)
-
-# After running, the extracted graph will be saved to:
-# $GRAPH_NET_EXTRACT_WORKSPACE/model_name
+model = graph_net.torch.extract(name="model_name", dynamic=True)(model)
 ```

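The `extract(name=...)(model)` shape is a wrapper-factory pattern. A stdlib-only toy version, as an editorial sketch — the real extractor captures the traced computation graph, which this stand-in does not attempt:

```python
def extract(name, dynamic=False):
    """Toy stand-in for graph_net.torch.extract: wraps a callable model
    and records each invocation under the given sample name."""
    def wrap(model):
        def wrapped(*args, **kwargs):
            out = model(*args, **kwargs)
            wrapped.calls.append((name, args, kwargs))
            return out
        wrapped.calls = []
        return wrapped
    return wrap

# Same call shape as in the README: model = extract(name=...)(model)
model = extract(name="toy_model")(lambda x: x * 2)
print(model(21))         # 42
print(len(model.calls))  # 1
```

The two-step call is what lets the same decorator-style API work both inline (as above) and as `@extract(name=...)` on a model-building function.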
-For details, see docstring of `graph_net.torch.extract` defined in `graph_net/torch/extractor.py`
-
-**graph_net.torch.validate**
-```
-# Verify that the extracted model meets requirements
-python -m graph_net.torch.validate \
-    --model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name
-```
+After running, the extracted graph will be saved to `$GRAPH_NET_EXTRACT_WORKSPACE/model_name/`.
 
+For more details, see the docstring of `graph_net.torch.extract` defined in `graph_net/torch/extractor.py`.
 
-## Compiler Evaluation
+**Step 2: graph_net.torch.validate**
 
-The compiler evaluation process takes a GraphNet sample as input and involves:
-1. Running the original model in eager mode to record a baseline.
-2. Compiling the model with the specified backend (e.g., CINN, TorchInductor, TVM).
-3. Executing the compiled model and collecting its runtime and outputs.
-4. Analyzing performance by comparing the compiled results against the baseline.
+To verify that the extracted model meets the requirements, we use `graph_net.torch.validate` in the CI pipeline, and also ask contributors to self-check in advance:
 
-### Evaluation Metrics
+```bash
+python -m graph_net.torch.validate \
+    --model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name
+```

-We define two key metrics here: **rectified speedup** and **GraphNet Score**. Rectified speedup measures runtime performance while incorporating compilation success, time cost, and correctness. GraphNet Score aggregates the rectified speedup of a compiler on specified tasks, providing a measure of its general optimization capability.
+All the **construction constraints** are examined automatically. After passing validation, a unique `graph_hash.txt` is generated and later checked in the CI procedure to avoid redundancy.
 
-**Demo: How to benchmark your compiler on the model:**
+## ⚖️ Compiler Evaluation
 
-1. Benchmark
+**Step 1: Benchmark**
 
-We use ```graph_net/benchmark_demo.sh``` to benchmark GraphNet computation graph samples:
+We use `graph_net/benchmark_demo.sh` to benchmark GraphNet computation graph samples:
 
-```
+```bash
 bash graph_net/benchmark_demo.sh &
 ```

-The script will run ```graph_net.torch.test_compiler``` with specific batch and log configurations.
+The script runs `graph_net.torch.test_compiler` with specific batch and log configurations.
 
-Or you can customize and use ```graph_net.torch.test_compiler``` yourself:
+Or you can customize and use `graph_net.torch.test_compiler` yourself:
 
-```
-python3 -m graph_net.torch.test_compiler \
+```bash
+python -m graph_net.torch.test_compiler \
     --model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name/ \
-    --compiler /path/to/custom/compiler/ \
+    --compiler /custom/or/builtin/compiler/ \
+    --warmup /times/to/warmup/ \
+    --trials /times/to/test/ \
+    --device /device/to/execute/ \
     --output-dir /path/to/save/JSON/result/file/
+
 # Note: if --compiler is omitted, PyTorch’s built-in compiler is used by default
 ```

-2. Analysis
+After executing, `graph_net.torch.test_compiler` will:
+1. Run the original model in eager mode to record a baseline.
+2. Compile the model with the specified backend (e.g., CINN, TVM, Inductor, TensorRT, XLA, BladeDISC).
+3. Execute the compiled model and collect its runtime and outputs.
+4. Compute the speedup by comparing the compiled results against the baseline.
 
-After processing, we provide ```graph_net/analysis.py``` to generate [violin plot](https://en.m.wikipedia.org/wiki/Violin_plot) based on the JSON results.
+**Step 2: Analysis**
 
-```
-python3 graph_net/analysis.py \
+After processing, we provide `graph_net/analysis.py` to generate [violin plots](https://en.m.wikipedia.org/wiki/Violin_plot) based on the JSON results.
+
+```bash
+python -m graph_net.analysis \
     --benchmark-path /path/to/read/JSON/result/file/ \
     --output-dir /path/to/save/output/figures/
 ```

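The baseline-vs-compiled comparison reduces to timing the same workload twice and taking a ratio. A stdlib-only editorial sketch (hypothetical helpers, not the `test_compiler` implementation) mirroring the `--warmup`/`--trials` flags:

```python
import time

def bench(fn, arg, warmup=2, trials=5):
    """Median wall-clock time of fn(arg), after warmup runs."""
    for _ in range(warmup):
        fn(arg)
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        fn(arg)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

# Stand-ins: an "eager" implementation and a much faster "compiled" one.
def eager(n):
    return sum(i * i for i in range(n))

def compiled(n):
    return n * (n - 1) * (2 * n - 1) // 6  # closed form of the same sum

assert eager(1000) == compiled(1000)  # correctness check, as in step 4
speedup = bench(eager, 100_000) / bench(compiled, 100_000)
print(f"speedup: {speedup:.1f}x")
```

Using the median rather than the mean makes the ratio robust to one-off scheduler hiccups during the trials.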
-After executing, one summary plot of results on all compilers (as shown below in "Evaluation Results Example"), as well as multiple sub-plots of results in categories (model tasks, Library...) on a single compiler.
-
-The script is designed to process a file structure as ```/benchmark_path/compiler_name/category_name/``` (for example ```/benchmark_logs/paddle/nlp/```), and items on x-axis are identified by name of the folders. So you can modify ```read_all_speedups``` function to fit the benchmark settings on your demand.
+After executing, one summary plot of results on all compilers, as well as multiple sub-plots of results in categories (model tasks, library, ...) on a single compiler, will be exported.
 
-### Evaluation Results Example
-
-<div align="center">
-<img src="/pics/Eval_result.png" alt="Violin plots of rectified speedup distributions" width="65%">
-</div>
+The script expects a file structure of the form `/benchmark_path/compiler_name/category_name/` (for example `/benchmark_logs/paddle/nlp/`); items on the x-axis are identified by folder names, so you can modify the `read_all_speedups` function to fit your own benchmark settings.
 
-
-## Roadmap
+## 📌 Roadmap

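The expected `/benchmark_path/compiler_name/category_name/` layout can be sketched with a hypothetical `read_all_speedups` stand-in — the real function lives in `graph_net/analysis.py` and may differ:

```python
import json
import tempfile
from pathlib import Path

def read_all_speedups(benchmark_path):
    """Group per-model speedups by (compiler, category) folder names."""
    grouped = {}
    for result in Path(benchmark_path).glob("*/*/*.json"):
        compiler = result.parent.parent.name
        category = result.parent.name
        grouped.setdefault((compiler, category), []).append(
            json.loads(result.read_text())["speedup"]
        )
    return grouped

# Build a throwaway /benchmark_logs/paddle/nlp/ style tree and read it back.
with tempfile.TemporaryDirectory() as root:
    sample_dir = Path(root, "paddle", "nlp")
    sample_dir.mkdir(parents=True)
    (sample_dir / "model_a.json").write_text(json.dumps({"speedup": 1.3}))
    print(read_all_speedups(root))  # {('paddle', 'nlp'): [1.3]}
```

Deriving the x-axis labels from folder names (rather than from file contents) is what makes the plot layout configurable purely by how you arrange the result directories.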
 1. Scale GraphNet to 10K+ graphs.
 2. Further annotate GraphNet samples into more granular sub-categories.
 3. Extract samples from multi-GPU scenarios to support benchmarking and optimization for large-scale, distributed computing.
 4. Enable splitting full graphs into independently optimized subgraphs and operator sequences.
 
-## GraphNet Community:
-
+**Vision**: GraphNet aims to lay the foundation for AI for Compiler by enabling **large-scale, systematic evaluation** of tensor compiler optimizations, and by providing a **dataset for models to learn** and transfer optimization strategies.
 
-You can join GraphNet community via the following group chats.
+## 💬 GraphNet Community
 
+You can join our community via the following group chats. Feel free to ask any questions about using and building GraphNet.
 
 <div align="center">
 <table>
@@ -155,8 +153,5 @@ You can join GraphNet community via the following group chats.
 </table>
 </div>

-
-
-## License
+## 🪪 License
 This project is released under the [MIT License](LICENSE).
-

graph_net/paddle/validate.py

Lines changed: 39 additions & 20 deletions
@@ -11,7 +11,7 @@
 import numpy as np
 import graph_net
 import os
-import re
+import ast
 import paddle
 

@@ -29,29 +29,49 @@ def _get_sha_hash(content):
     return m.hexdigest()
 
 
-def _save_to_model_path(dump_dir, hash_text):
-    file_path = f"{dump_dir}/graph_hash.txt"
-    with open(file_path, "w") as f:
-        f.write(hash_text)
+def _extract_forward_source(model_path):
+    source = None
+    with open(f"{model_path}/model.py", "r") as f:
+        source = f.read()
 
+    tree = ast.parse(source)
+    forward_code = None
 
-def extract_from_forward_regex(text, case_sensitive=True):
-    pattern = r"forward.*"
-    flags = 0 if case_sensitive else re.IGNORECASE
+    for node in tree.body:
+        if isinstance(node, ast.ClassDef) and node.name == "GraphModule":
+            for fn in node.body:
+                if isinstance(fn, ast.FunctionDef) and fn.name == "forward":
+                    return ast.unparse(fn)
+    return None
 
-    match = re.search(pattern, text, flags)
-    if match:
-        return match.group(0)
+
+def check_graph_hash(args):
+    model_path = args.model_path
+    file_path = f"{model_path}/graph_hash.txt"
+    if args.dump_graph_hash_key:
+        model_str = _extract_forward_source(model_path)
+        assert model_str is not None, f"model_str of {args.model_path} is None."
+        new_hash_text = _get_sha_hash(model_str)
+
+        old_hash_text = None
+        if os.path.exists(file_path):
+            with open(file_path, "r") as f:
+                old_hash_text = f.read()
+
+        if old_hash_text is None or new_hash_text != old_hash_text:
+            print(f"Writing to {file_path}.")
+            with open(file_path, "w") as f:
+                f.write(new_hash_text)
+        if old_hash_text is not None:
+            assert (
+                new_hash_text == old_hash_text
+            ), f"Hash value for {model_path} is not consistent."
     else:
-        raise ValueError("Erroneous case occurs.")
+        assert os.path.exists(file_path), f"{file_path} does not exist."
 
 
 def main(args):
-    model_path = args.model_path
-    with open(f"{model_path}/model.py", "r") as fp:
-        model_str = fp.read()
-    model_str = extract_from_forward_regex(model_str)
-    _save_to_model_path(model_path, _get_sha_hash(model_str))
+    check_graph_hash(args)
 
     model_path = args.model_path
     model_class = load_class_from_file(
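The substantive change in this hunk is replacing the fragile `forward.*` regex — which matched only to the end of the first line containing "forward" — with an AST walk that recovers the whole method. A self-contained stdlib sketch of the same technique, with the model source inlined rather than read from `model.py`:

```python
import ast

# Hypothetical sample source, in place of reading model.py from disk.
MODEL_SOURCE = '''
class GraphModule:
    def helper(self):
        pass

    def forward(self, x):
        return x + 1
'''

def extract_forward_source(source):
    """Return the normalized source of GraphModule.forward, or None."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef) and node.name == "GraphModule":
            for fn in node.body:
                if isinstance(fn, ast.FunctionDef) and fn.name == "forward":
                    return ast.unparse(fn)  # Python 3.9+
    return None

print(extract_forward_source(MODEL_SOURCE))
# def forward(self, x):
#     return x + 1
```

A side benefit of `ast.unparse` is that it normalizes formatting, so a purely cosmetic reflow of `model.py` does not change the hash written to `graph_hash.txt`.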
@@ -100,17 +120,16 @@ def main(args):
         required=True,
         help="Path to folder e.g '../test_dataset'",
     )
-
     parser.add_argument(
         "--no-check-redundancy",
         action="store_true",
+        default=False,
         help="whether check model graph redundancy",
     )
-
     parser.add_argument(
         "--dump-graph-hash-key",
         action="store_true",
-        default=False,
+        default=True,
         help="Dump graph hash key",
     )
     parser.add_argument(
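One detail worth flagging as an editorial observation: after this change, `--dump-graph-hash-key` combines `action="store_true"` with `default=True`, so the option can never be switched off from the command line — passing the flag and omitting it both yield `True`. A stdlib demonstration, including `argparse.BooleanOptionalAction` (Python 3.9+) as one possible alternative:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dump-graph-hash-key", action="store_true", default=True)

# Whether or not the flag is passed, the value is True.
print(parser.parse_args([]).dump_graph_hash_key)                         # True
print(parser.parse_args(["--dump-graph-hash-key"]).dump_graph_hash_key)  # True

# BooleanOptionalAction generates a --no-... variant that can turn it off.
parser2 = argparse.ArgumentParser()
parser2.add_argument(
    "--dump-graph-hash-key",
    action=argparse.BooleanOptionalAction,
    default=True,
)
print(parser2.parse_args(["--no-dump-graph-hash-key"]).dump_graph_hash_key)  # False
```

If the always-on behavior is intentional (the `else` branch in `check_graph_hash` is still reachable when the attribute is set programmatically), a comment in the source would make that explicit.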
Eight single-line hash files (file names hidden by the viewer), 1 addition & 1 deletion each.

Six files change:
@@ -1 +1 @@
-c1e7e52eab55414cee7c44a9e8c4f81bbd59e3837b185e179e6317efa04f69ec
+f2b5a332b1b19703e7ccfb450de96c9c12244144c7b9d305d20587f772fb6672

Two files change:
@@ -1 +1 @@
-1bad8e4fab570ff456bad864ef45a755f07b2e466cced7983a8383abccc8fc7a
+02fa10efca360c8ba7818c367cdeb9979e2af8c72cf489913396a1f241bbad07

0 commit comments
