Commit fd011a0

[Fea] Support code trace (#1179)
* support code snapshot tracing
* update doc
1 parent 6c79832 commit fd011a0

4 files changed: +112 -5 lines changed

README.md

Lines changed: 1 addition & 1 deletion
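@@ -150,7 +150,7 @@ PaddleScience 是一个基于深度学习框架 PaddlePaddle 开发的科学计
 <!-- --8<-- [start:feature] -->
 ## ✨Features

-- **Supports automated parallel experiment scheduling, launching experiment tasks serially or in parallel with one click ([tutorial](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/user_guide/#113))**, improving research efficiency.
+- Supports **[experiment source-code tracing](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/user_guide/#112) and [one-click launch of parallel experiments](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/user_guide/#114)**, improving research efficiency.
 - Supports sampling and Boolean operations on simple geometries and complex STL geometries.
 - Supports Dirichlet, Neumann, Robin, and user-defined boundary conditions.
 - Supports three problem-solving approaches: physics-driven, data-driven, and hybrid physics-data. Covers 20+ cases in fluids, structures, meteorology, and other fields.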

docs/zh/user_guide.md

Lines changed: 49 additions & 4 deletions
@@ -45,7 +45,52 @@ EVAL:
 sup_validator: 128
 ```

-#### 1.1.2 Configuring parameters from the command line
+#### 1.1.2 Saving experiment code snapshots⭐
+
+Although we provide a run configuration system built on hydra and OmegaConf, experiments may involve modifying source code in addition to configuration files, which likewise leaves experiment code versions confused and hard to trace.
+
+To address this, PaddleScience provides code-diff tracing: appending `trace=True` to the run command automatically saves a snapshot of the current code as `staged.diff` and `unstaged.diff` under `output_dir/code_snapshot/`, making later tracing and reproduction easier.
+
+Taking `allen_cahn_piratenet.py` as an example, first make sure the GitPython package is installed in the Python environment
+
+``` sh
+python -m pip install GitPython
+```
+
+Then append the `trace=True` argument to the run command
+
+``` sh
+python allen_cahn_piratenet.py trace=True
+```
+
+The printed log is then as follows
+
+``` log hl_lines="1-8"
+ppsci MESSAGE: [Code Trace] Git Information:
+ppsci MESSAGE: Branch : support_code_trace
+ppsci MESSAGE: Commit : 5ea90ae584b7fff17ff5aa385ba5abb6c04c268c
+ppsci MESSAGE: Date : 2025-06-24T20:48:07+08:00
+ppsci MESSAGE: Dirty : True
+ppsci INFO: [Code Trace] Staged changes saved to: outputs_allen_cahn_piratenet/2025-07-02/20-13-46/code_snapshot/staged.diff
+ppsci INFO: [Code Trace] To restore your code to this staged version, run: git apply outputs_allen_cahn_piratenet/2025-07-02/20-13-46/code_snapshot/staged.diff
+ppsci INFO: [Code Trace] Unstaged changes saved to: outputs_allen_cahn_piratenet/2025-07-02/20-13-46/code_snapshot/unstaged.diff
+ppsci INFO: [Code Trace] To restore your code to this unstaged version, run: git apply outputs_allen_cahn_piratenet/2025-07-02/20-13-46/code_snapshot/unstaged.diff
+W0702 20:13:46.390472 37150 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.6
+ppsci MESSAGE: 'shuffle' and 'drop_last' are both set to False in default as sampler config is not specified.
+ppsci INFO: Auto collation is disabled and set num_workers to 0 to speed up batch sampling.
+ppsci INFO: Using paddlepaddle develop(f701bb1) on device Place(gpu:0)
+ppsci MESSAGE: Set to_static=False for computational optimization.
+...
+```
+
+As shown, with tracing enabled the log prints more detailed code version information, and the current code snapshot is automatically saved to `output_dir/code_snapshot/*.diff`; the `git apply` command can then restore the experiment code to the snapshot version.
+
+!!! note "Notes"
+
+    - To track newly added files, first stage them with `git add`; only then can they be tracked.
+    - This feature requires that the codebase under development is a git repository with a `.git` folder present; otherwise tracing is not possible.
+
+#### 1.1.3 Configuring parameters from the command line

 Still taking the configuration file `bracket.yaml` as an example, the learning-rate parameters are configured as shown below.

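@@ -94,7 +139,7 @@ TRAIN:
 # python example.py PATH="/workspace/lr=0.1,s=[3]/best_model.pdparams"
 ```

-#### 1.1.3 Running experiments automatically
+#### 1.1.4 Running experiments automatically

 As described in [1.1.2 Configuring parameters from the command line](#112), appending suitable arguments to the program's run command controls the run configuration of multiple experiments. The following uses the automated execution of four experiments as an example to show how to achieve this with hydra's [multirun](https://hydra.cc/docs/1.0/tutorials/basic/running_your_app/multi-run/#internaldocs-banner) feature.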
@@ -675,7 +720,7 @@ PaddleScience 提供了多种推理配置组合,可通过命令行进行组合
 solver.eval()
 ```

-### 1.7 Visualizing the experiment process
+### 1.7 Visualizing the experiment process

 === "TensorBoardX"
@@ -945,7 +990,7 @@ best_value: 0.02460772916674614

 ### 2.2 Distributed training

-#### 2.2.1 Data parallelism
+#### 2.2.1 Data parallelism

 The following uses `examples/pipe/poiseuille_flow.py` as an example to show how to correctly use PaddleScience's data parallelism for training. For details on distributed training, see [Paddle - User Guide - Distributed Training - Quick Start - Data Parallelism](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/06_distributed_training/cluster_quick_start_collective_cn.html).
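The code-trace log in section 1.1.2 prints one `git apply` hint per diff file. Since `git diff --cached` compares the staging area with the recorded commit, while plain `git diff` compares the working tree with the staging area, restoring a complete snapshot plausibly means checking out the logged commit, then applying `staged.diff`, then `unstaged.diff`. The sketch below only builds that command sequence and runs nothing; `restore_commands` and its arguments are hypothetical names, not part of PaddleScience.

```python
import os
import tempfile


def restore_commands(commit, snapshot_dir):
    """Build (not run) git commands that rebuild a traced working tree.

    Hypothetical helper: assumes snapshot_dir holds the staged.diff and/or
    unstaged.diff files written by a run with trace=True.
    """
    commands = [["git", "checkout", commit]]
    # staged.diff is relative to the commit and unstaged.diff is relative to
    # the staged state, so the order of application matters.
    for name in ("staged.diff", "unstaged.diff"):
        diff_path = os.path.join(snapshot_dir, name)
        if os.path.isfile(diff_path):
            commands.append(["git", "apply", diff_path])
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as snap:
        # Simulate a run that only had unstaged changes.
        open(os.path.join(snap, "unstaged.diff"), "w").close()
        for cmd in restore_commands("5ea90ae584b7fff17ff5aa385ba5abb6c04c268c", snap):
            print(" ".join(cmd))
```

Files that were never staged with `git add` are absent from both diffs, so this sequence cannot bring them back.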

ppsci/utils/callbacks.py

Lines changed: 60 additions & 0 deletions
@@ -14,6 +14,7 @@

 import importlib.util
 import inspect
+import os
 import sys
 import traceback
 from os import path as osp
@@ -134,3 +135,62 @@ def on_job_start(self, config: DictConfig, **kwargs: Any) -> None:
             core.set_prim_eager_enabled(True)
             core._set_prim_all_enabled(True)
             logger.message("Prim mode is enabled.")
+
+        # === Optionally log git info & dump uncommitted diff ===
+        if bool(full_cfg.get("trace", False)):
+            if not importlib.util.find_spec("git"):
+                logger.error(
+                    "[Code Trace] GitPython is required for trace=True.\n"
+                    "Please install it with: pip install GitPython"
+                )
+                sys.exit(RUNTIME_EXIT_CODE)
+
+            from git import InvalidGitRepositoryError
+            from git import Repo
+
+            try:
+                repo = Repo(".", search_parent_directories=True)
+                branch = repo.active_branch.name
+                commit = repo.head.commit
+                commit_hash = commit.hexsha
+                commit_time = commit.committed_datetime.isoformat()
+                is_dirty = repo.is_dirty()
+
+                logger.message("[Code Trace] Git Information:")
+                logger.message(f" Branch : {branch}")
+                logger.message(f" Commit : {commit_hash}")
+                logger.message(f" Date : {commit_time}")
+                logger.message(f" Dirty : {is_dirty}")
+
+                if is_dirty:
+                    trace_dir = osp.join(full_cfg.output_dir, "code_snapshot")
+                    os.makedirs(trace_dir, exist_ok=True)
+
+                    staged_diff = repo.git.diff("--cached")
+                    if len(staged_diff) > 0:
+                        staged_diff_path = osp.join(trace_dir, "staged.diff")
+                        with open(staged_diff_path, "w", encoding="utf-8") as f:
+                            f.write(staged_diff)
+                        logger.info(
+                            f"[Code Trace] Staged changes saved to: {staged_diff_path}"
+                        )
+                        logger.info(
+                            f"[Code Trace] To restore your code to this staged version, run: git apply {staged_diff_path}"
+                        )
+
+                    unstaged_diff = repo.git.diff()
+                    if len(unstaged_diff) > 0:
+                        unstaged_diff_path = osp.join(trace_dir, "unstaged.diff")
+                        with open(unstaged_diff_path, "w", encoding="utf-8") as f:
+                            f.write(unstaged_diff)
+                        logger.info(
+                            f"[Code Trace] Unstaged changes saved to: {unstaged_diff_path}"
+                        )
+                        logger.info(
+                            f"[Code Trace] To restore your code to this unstaged version, run: git apply {unstaged_diff_path}"
+                        )
+
+            except InvalidGitRepositoryError:
+                logger.warning("[Code Trace] Not a Git repository. Skipping.")
+            except Exception as e:
+                logger.warning(f"[Code Trace] Unexpected error: {e}")
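The file-writing branch of this hook can be tried without a repository or GitPython. A minimal sketch, assuming the two diff texts are already available as strings (in the commit they come from `repo.git.diff("--cached")` and `repo.git.diff()`); `save_snapshot` is a hypothetical name used only for illustration and does not exist in PaddleScience.

```python
import os
import tempfile
from os import path as osp


def save_snapshot(output_dir, staged_diff, unstaged_diff):
    """Write each non-empty diff under <output_dir>/code_snapshot.

    Hypothetical stand-in for the file-writing branch of the trace hook;
    returns the list of paths that were written.
    """
    trace_dir = osp.join(output_dir, "code_snapshot")
    os.makedirs(trace_dir, exist_ok=True)
    written = []
    for name, text in (("staged.diff", staged_diff), ("unstaged.diff", unstaged_diff)):
        if len(text) > 0:  # an empty diff produces no file, as in the hook
            diff_path = osp.join(trace_dir, name)
            with open(diff_path, "w", encoding="utf-8") as f:
                f.write(text)
            written.append(diff_path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        # Only unstaged changes are present in this simulated run.
        paths = save_snapshot(out, "", "diff --git a/x.py b/x.py\n")
        print([osp.basename(p) for p in paths])
```

Keeping the two diffs in separate files mirrors the commit's choice and preserves the distinction between staged and unstaged changes at the time of the run.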

ppsci/utils/config.py

Lines changed: 2 additions & 0 deletions
@@ -311,6 +311,7 @@ class SolverConfig(BaseModel):
     to_static: bool = False
     prim: bool = False
     log_level: Literal["debug", "info", "warning", "error"] = "info"
+    trace: bool = False

     # Training related config
     TRAIN: Optional[TrainConfig] = None
@@ -408,6 +409,7 @@ def use_wandb_check(cls, v, info: ValidationInfo):
     "to_static",
     "prim",
     "log_level",
+    "trace",
     "TRAIN.save_freq",
     "TRAIN.eval_during_train",
     "TRAIN.start_eval_epoch",

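The `ppsci/utils/config.py` change adds a boolean `trace` field that defaults to `False` and is switched on by a `trace=True` command-line override. The sketch below illustrates that default-plus-override behaviour with the standard library only. It is not how PaddleScience does it: the project relies on hydra and a pydantic `SolverConfig`, and the role of the key list in the second hunk is not visible in the diff. `SolverConfigSketch` and `apply_overrides` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SolverConfigSketch:
    """Stand-in holding only the boolean fields visible in the diff."""

    to_static: bool = False
    prim: bool = False
    trace: bool = False  # the field added by this commit


def apply_overrides(cfg, overrides):
    """Apply `key=value` strings to boolean fields (simplified, assumed syntax)."""
    for item in overrides:
        key, _, raw = item.partition("=")
        if key not in cfg.__dataclass_fields__:
            raise KeyError(f"unknown config key: {key}")
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"{key} expects True or False, got {raw!r}")
        cfg = replace(cfg, **{key: raw.lower() == "true"})
    return cfg


if __name__ == "__main__":
    print(apply_overrides(SolverConfigSketch(), ["trace=True"]))
```

Defaulting the field to `False` means existing run commands behave as before unless the flag is passed explicitly.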