Commit 67eab96

MOLYHECI and SunnyHaze authored

[update] add en version dev docs (#13)

* [update] add en version dev docs
* [update] fix one bug in storage_info.md
* [debug] rendering issue and sidebar issue solved
* [update] fix en doc issues

Co-authored-by: Ma, Xiaochen <mxch1122@126.com>

1 parent 981d70c commit 67eab96

File tree

6 files changed: +190 -180 lines changed

docs/en/notes/dev_guide/logging.md

Lines changed: 52 additions & 41 deletions
---
title: logging
createTime: 2025/06/09 11:39:11
permalink: /en/dev_guide/logging/
---

## Logger

Currently, the logger is initialized in `pipeline_step.py`:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(filename)-20s- %(module)-20s- %(funcName)-20s- %(lineno)5d - %(name)-10s | %(levelname)8s | Processno %(process)5d - Threadno %(thread)-15d : %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
```

Usage is as follows. `debug`, `info`, `warning`, and `error` represent different log levels; by default, DEBUG-level messages are not shown.

```python
def main():
    logging.debug("This is DEBUG message")
    logging.info("This is INFO message")
    logging.warning("This is WARNING message")
    logging.error("This is ERROR message")
    return

main()
```

Principles for assigning log levels:

1. **DEBUG**: low-value output that should normally be suppressed, or technical details you don't want to expose, such as:

    ```python
    for x in ['Text', 'image', 'video']:
        module_path = "dataflow.Eval." + x
        try:
            module_lib = importlib.import_module(module_path)
            clss = getattr(module_lib, name)
            self._obj_map[name] = clss
            return clss
        except AttributeError as e:
            logging.debug(f"{str(e)}")
            continue
        except Exception as e:
            raise e
    ```

2. **INFO**: lets users know the current execution status, such as:

    ```python
    def pipeline_step(yaml_path, step_name, step_type):
        import logging
        import yaml
        logging.info(f"Loading yaml {yaml_path} ......")
        with open(yaml_path, "r") as f:
            config = yaml.safe_load(f)
        config = merge_yaml(config)
        logging.info(f"Load yaml success, config: {config}")
        if step_type == "process":
            algorithm = get_processor(step_name, config)
        elif step_type == "generator":
            algorithm = get_generator(step_name, config)
        logging.info("Start running ...")
        algorithm.run()
    ```

3. **WARNING**: messages about potential problems (no example for now).

4. **ERROR**: errors that occur during execution; print the error message.

For logging inside operators, refer to `DataFlow/dataflow/generator/algorithms/TreeSitterParser.py`.
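Since the guidelines above note there is no WARNING example yet, here is a minimal hedged sketch of the intended use: the run can continue, but the user should be told something is off. The `load_config`/`batch_size` fallback is invented for illustration and is not actual DataFlow code.

```python
import logging

logging.basicConfig(level=logging.INFO)

def load_config(config):
    # Hypothetical example: a missing optional key is not fatal,
    # but the user should know a default is being substituted.
    if "batch_size" not in config:
        logging.warning("batch_size not set, falling back to default 32")
        config["batch_size"] = 32
    return config

cfg = load_config({"dataset": "demo"})
```

Running this prints a `WARNING` line and continues, which is exactly the boundary between WARNING (recoverable) and ERROR (execution failed).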
Lines changed: 10 additions & 9 deletions

---
title: New Operator
createTime: 2025/06/12 12:00:00
permalink: /en/dev_guide/new_algo/
---

## New Algorithm

DataFlow operators are implemented in two forms:

1. Operators with a unified base class, located under `dataflow/Eval` or `dataflow/process`. These operators must implement the fixed methods required by the base class, such as `__init__()`, `evaluate_batch()`, and `filter_func()`.

2. Operators without a unified base class, located in the `dataflow/generator/algorithm` directory. These operators must implement the `__init__()` and `run()` methods.

To add a new operator to DataFlow, after implementing the operator and its required methods:

1. Add a file containing the operator class under the appropriate operator directory.

2. In that file, import the `Registry` instance and decorate the operator class with its `register()` method.

3. In the `__init__.py` file of the operator's directory, add the operator's relative path to the `_import_structure` variable.

If you need to add a new operator directory, you must also modify `dataflow/utils/registry.py` accordingly.
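The registration steps above can be sketched as follows. This is a minimal self-contained approximation of the decorator-based registry pattern; `Registry`, `OPERATOR_REGISTRY`, and `MyOperator` are illustrative names, not the actual DataFlow API in `dataflow/utils/registry.py`.

```python
# Minimal sketch of a decorator-based registry (names are illustrative).
class Registry:
    def __init__(self, name):
        self.name = name
        self._obj_map = {}

    def register(self):
        # Returns a decorator that records the class under its own name.
        def deco(cls):
            self._obj_map[cls.__name__] = cls
            return cls
        return deco

    def get(self, name):
        return self._obj_map[name]

OPERATOR_REGISTRY = Registry("operator")

@OPERATOR_REGISTRY.register()
class MyOperator:
    # An operator without a unified base class provides __init__ and run().
    def __init__(self, config):
        self.config = config

    def run(self):
        return f"ran with {self.config}"

# Pipelines can now look the operator up by name and instantiate it.
op = OPERATOR_REGISTRY.get("MyOperator")({"key": "value"})
```

The decorator makes registration a side effect of importing the operator's file, which is why step 3 (listing the file in `_import_structure`) matters: an operator that is never imported is never registered.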
Lines changed: 15 additions & 12 deletions

---
title: Pull Request Guidelines
createTime: 2025/06/13 10:42:46
permalink: /en/dev_guide/pull_request/
---

# Pull Request Guidelines

Before starting development, fork this repository to your own GitHub account and clone it to your local machine.

After cloning, set the upstream remote to the main DataFlow repository:

```
git remote add upstream <DataFlow URL>
```

Before submitting your changes, sync your local repository with the upstream repository:

```
git pull upstream main
```

After syncing, push your changes:

```
git push origin main
```

Once your changes are pushed to your forked repository, open a pull request.

If you are not familiar with Git, please follow these guidelines:

1. `git add`: add modified files/folders one by one with `git add <file>`. **Do not run** `git add .`. If the command takes unusually long, it is **very likely you have staged large files**; stop immediately and check.

2. `git commit`: make commit messages as clear and accurate as possible.

3. `git push`: specify the remote repository and branch explicitly. If you are unsure, run `git remote -v` before pushing to confirm which URL each remote alias points to.

4. Creating a PR: before submitting a PR, click `Files changed` and carefully check whether any file changes that should not be included have slipped in.
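The remote setup described above can be tried out safely in a throwaway repository. The URLs below are placeholders (`example.com`), not the real DataFlow remotes, and the directory is temporary.

```shell
# Create a disposable repo and configure remotes as the guide describes.
# URLs are illustrative placeholders, not the actual DataFlow remotes.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git remote add origin https://example.com/yourname/DataFlow.git
git remote add upstream https://example.com/OpenDCAI/DataFlow.git
git remote -v   # prints the URL behind each remote alias
```

Running `git remote -v` here shows why step 3 recommends it: each alias (`origin`, `upstream`) is listed next to its URL, so you can confirm where a push would go before running it.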
