Commit 67eab96

MOLYHECI and SunnyHaze authored

[update] add en version dev docs (#13)

* [update] add en version dev docs
* [update] fix one bug in storage_info.md
* [debug] rendering issue and sidebar issue solved
* [update] fix en doc issues

Co-authored-by: Ma, Xiaochen <mxch1122@126.com>

1 parent 981d70c commit 67eab96

File tree

6 files changed: +190 -180 lines changed

docs/en/notes/dev_guide/logging.md

Lines changed: 52 additions & 41 deletions
---
title: logging
createTime: 2025/06/09 11:39:11
permalink: /en/dev_guide/logging/
---

## Logger

Currently, the logger is initialized in `pipeline_step.py`:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(filename)-20s- %(module)-20s- %(funcName)-20s- %(lineno)5d - %(name)-10s | %(levelname)8s | Processno %(process)5d - Threadno %(thread)-15d : %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
```

Usage is as follows. `debug`, `info`, `warning`, and `error` represent different log levels; by default, DEBUG-level messages are not shown.

```python
def main():
    logging.debug("This is DEBUG message")
    logging.info("This is INFO message")
    logging.warning("This is WARNING message")
    logging.error("This is ERROR message")
    return

main()
```

Principles for assigning log levels:

1. **DEBUG**: low-value output that should normally be suppressed, or technical details you don't want to expose, such as:

    ```python
    for x in ['Text', 'image', 'video']:
        module_path = "dataflow.Eval." + x
        try:
            module_lib = importlib.import_module(module_path)
            clss = getattr(module_lib, name)
            self._obj_map[name] = clss
            return clss
        except AttributeError as e:
            logging.debug(f"{str(e)}")
            continue
        except Exception as e:
            raise e
    ```

2. **INFO**: lets users know the current execution status, such as:

    ```python
    def pipeline_step(yaml_path, step_name, step_type):
        import logging
        import yaml
        logging.info(f"Loading yaml {yaml_path} ......")
        with open(yaml_path, "r") as f:
            config = yaml.safe_load(f)
        config = merge_yaml(config)
        logging.info(f"Load yaml success, config: {config}")
        if step_type == "process":
            algorithm = get_processor(step_name, config)
        elif step_type == "generator":
            algorithm = get_generator(step_name, config)
        logging.info("Start running ...")
        algorithm.run()
    ```

3. **WARNING**: messages about potential problems (no example for now).

4. **ERROR**: errors that occur during execution; print the error message.

For logging inside operators, refer to `DataFlow/dataflow/generator/algorithms/TreeSitterParser.py`.
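Since the guidelines above note there is no WARNING example yet, here is a minimal hedged sketch of the intended use: the run can continue, but the user should be told something is off. The `load_config`/`batch_size` fallback is invented for illustration and is not actual DataFlow code.

```python
import logging

logging.basicConfig(level=logging.INFO)

def load_config(config):
    # Hypothetical example: a missing optional key is not fatal,
    # but the user should know a default is being substituted.
    if "batch_size" not in config:
        logging.warning("batch_size not set, falling back to default 32")
        config["batch_size"] = 32
    return config

cfg = load_config({"dataset": "demo"})
```

Running this prints a `WARNING` line and continues, which is exactly the boundary between WARNING (recoverable) and ERROR (execution failed).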
Lines changed: 10 additions & 9 deletions

---
title: New Operator
createTime: 2025/06/12 12:00:00
permalink: /en/dev_guide/new_algo/
---

## New Algorithm

DataFlow operators are implemented in two forms:

1. Operators with a unified base class, located under `dataflow/Eval` or `dataflow/process`. These operators must implement the fixed methods required by the base class, such as `__init__()`, `evaluate_batch()`, and `filter_func()`.

2. Operators without a unified base class, located in the `dataflow/generator/algorithm` directory. These operators must implement the `__init__()` and `run()` methods.

To add a new operator to DataFlow, after implementing the operator and its required methods:

1. Add a file containing the operator class under the appropriate operator directory.

2. In that file, import the `Registry` instance and decorate the operator class with its `register()` method.

3. In the `__init__.py` file of the operator's directory, add the operator's relative path to the `_import_structure` variable.

If you need to add a new operator directory, you must also modify `dataflow/utils/registry.py` accordingly.
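The registration steps above can be sketched as follows. This is a minimal self-contained approximation of the decorator-based registry pattern; `Registry`, `OPERATOR_REGISTRY`, and `MyOperator` are illustrative names, not the actual DataFlow API in `dataflow/utils/registry.py`.

```python
# Minimal sketch of a decorator-based registry (names are illustrative).
class Registry:
    def __init__(self, name):
        self.name = name
        self._obj_map = {}

    def register(self):
        # Returns a decorator that records the class under its own name.
        def deco(cls):
            self._obj_map[cls.__name__] = cls
            return cls
        return deco

    def get(self, name):
        return self._obj_map[name]

OPERATOR_REGISTRY = Registry("operator")

@OPERATOR_REGISTRY.register()
class MyOperator:
    # An operator without a unified base class provides __init__ and run().
    def __init__(self, config):
        self.config = config

    def run(self):
        return f"ran with {self.config}"

# Pipelines can now look the operator up by name and instantiate it.
op = OPERATOR_REGISTRY.get("MyOperator")({"key": "value"})
```

The decorator makes registration a side effect of importing the operator's file, which is why step 3 (listing the file in `_import_structure`) matters: an operator that is never imported is never registered.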
Lines changed: 15 additions & 12 deletions

---
title: Pull Request Guidelines
createTime: 2025/06/13 10:42:46
permalink: /en/dev_guide/pull_request/
---

# Pull Request Guidelines

Before starting development, fork this repository to your own GitHub account and clone it to your local machine.

After cloning, set the upstream remote to the main DataFlow repository:

```
git remote add upstream <DataFlow URL>
```

Before submitting your changes, sync your local repository with the upstream repository:

```
git pull upstream main
```

After syncing, push your changes:

```
git push origin main
```

Once your changes are pushed to your forked repository, open a pull request.

If you are not familiar with Git, please follow these guidelines:

1. `git add`: add modified files/folders one by one with `git add <file>`. **Do not run** `git add .`. If the command takes unusually long, it is **very likely you have staged large files**; stop immediately and check.

2. `git commit`: make commit messages as clear and accurate as possible.

3. `git push`: specify the remote repository and branch explicitly. If you are unsure, run `git remote -v` before pushing to confirm which URL each remote alias points to.

4. Creating a PR: before submitting a PR, click `Files changed` and carefully check whether any file changes that should not be included have slipped in.
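The remote setup described above can be tried out safely in a throwaway repository. The URLs below are placeholders (`example.com`), not the real DataFlow remotes, and the directory is temporary.

```shell
# Create a disposable repo and configure remotes as the guide describes.
# URLs are illustrative placeholders, not the actual DataFlow remotes.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git remote add origin https://example.com/yourname/DataFlow.git
git remote add upstream https://example.com/OpenDCAI/DataFlow.git
git remote -v   # prints the URL behind each remote alias
```

Running `git remote -v` here shows why step 3 recommends it: each alias (`origin`, `upstream`) is listed next to its URL, so you can confirm where a push would go before running it.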
