feat(auto-annotation): integrate YOLO auto-labeling and enhance data management #223

o0Shark0o · 2026-01-04T11:12:59Z

✨ What’s included

This PR introduces a complete YOLO-based auto-annotation workflow and improves data management capabilities.

Auto Annotation (YOLO)

Integrates YOLO auto-labeling into the annotation task workflow
Adds a fully designed front-end interface for configuring and triggering auto-annotation
Seamlessly embeds the auto-annotation flow into the existing labeling task page

Data Management Enhancements

Adds support for creating folders in the data management module
Enables uploading files directly into specific folders
Improves dataset organization and usability for large-scale annotation tasks

🎯 Why this change

Reduces manual labeling effort by enabling automatic YOLO-based annotations
Improves annotation efficiency and task setup experience
Enhances dataset organization to better support complex annotation workflows

🧪 How to test

Create or open an annotation task
Configure and trigger YOLO auto-annotation from the task page
Verify auto-labeled results are correctly applied
Navigate to Data Management
Create a new folder and upload files into it
Confirm files are correctly organized and accessible

📝 Notes

This PR focuses on feature integration and UI flow; no breaking changes are introduced
Existing annotation and data management functionality remains backward compatible

runtime/python-executor/datamate/operator_runtime.py

+        executor_type = get_from_cfg(task_id, "executor_type")
+        if not WRAPPERS.get(executor_type).cancel(task_id):
+            raise APIException(ErrorCode.CANCEL_TASK_ERROR)
+    except Exception as e:


In general, to fix uncontrolled path usage, you should normalize the path and verify that it stays within a designated safe root directory, and/or constrain the untrusted component (here, task_id) to a safe format. Simply calling os.path.exists on an untrusted path does not provide any security.
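To make the risk concrete, here is a minimal sketch of how a crafted `task_id` escapes the intended directory when the path is built by plain string interpolation. It assumes a POSIX filesystem; `naive_config_path` is a hypothetical name that only mirrors the f-string shown in the diff.

```python
import os.path


def naive_config_path(task_id: str) -> str:
    # Mirrors the original construction: no validation of task_id at all.
    return f"/flow/{task_id}/process.yaml"


# A well-formed id stays under /flow after normalization.
safe = os.path.normpath(naive_config_path("task-01"))

# A traversal segment collapses to a path outside /flow.
escaped = os.path.normpath(naive_config_path("../etc"))

print(safe)
print(escaped)
```

An `os.path.exists` check on the second path only answers whether the file is there, not whether the caller should be allowed to reach it.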

For this code, the least intrusive fix is:

Introduce a constant base directory for flows, e.g. FLOW_BASE_DIR = "/flow".

Implement a safe path resolution helper that:

Joins the base directory with a relative path (such as task_id/process.yaml).

Normalizes the result with os.path.abspath or os.path.normpath.

Verifies that the normalized path is still inside the base directory (e.g. via os.path.commonpath).

Optionally rejects absolute task_id segments outright.

Update check_valid_path so it can optionally enforce that paths are under a given base directory. For uses that should be under /flow, call it with that base.

Use this helper for both:

The config_path computed in get_from_cfg.

The config_path in submit_task.
This removes redundant ad‑hoc string interpolation and ensures consistent validation.

We must keep existing behavior (still read /flow/{task_id}/process.yaml when task_id is safe), but add the containment check and raise the same FILE_NOT_FOUND_ERROR / log errors when the resolved path is invalid or outside the allowed base. No new imports are strictly needed; os.path.commonpath and os.path.join are already available via import os.
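The containment check described above can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: it assumes a POSIX filesystem, `is_within_base` and `resolve_flow_config` are hypothetical names, and a plain `ValueError` stands in for the project's `APIException(ErrorCode.FILE_NOT_FOUND_ERROR)`. The existence check is left out so the example runs without a real `/flow` directory.

```python
import os

FLOW_BASE_DIR = "/flow"  # base directory proposed in the suggestion above


def is_within_base(path: str, base_dir: str = FLOW_BASE_DIR) -> bool:
    """Return True if the normalized path lies inside base_dir."""
    base_abs = os.path.abspath(base_dir)
    full_path = os.path.abspath(path)
    try:
        return os.path.commonpath([base_abs, full_path]) == base_abs
    except ValueError:
        # Raised when the paths cannot be compared (e.g. different drives).
        return False


def resolve_flow_config(task_id: str) -> str:
    """Build the config path for task_id, rejecting anything outside the base."""
    if os.path.isabs(task_id):
        # os.path.join would discard the base when given an absolute segment.
        raise ValueError("task_id must not be an absolute path")
    candidate = os.path.abspath(
        os.path.join(FLOW_BASE_DIR, task_id, "process.yaml")
    )
    if not is_within_base(candidate):
        raise ValueError("resolved path escapes the flow base directory")
    return candidate
```

Note that `os.path.abspath` normalizes `..` segments but does not resolve symbolic links; if links under the base directory are a concern, `os.path.realpath` could be used instead.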

runtime/python-executor/datamate/operator_runtime.py

+    return success_json_info
+
+
+def check_valid_path(file_path):


To fix this, the path derived from the untrusted task_id must be validated before being used with open() (and indirectly before being used by wrappers that likely read the same config). The goal is to ensure that regardless of the task_id value, the resulting config_path always points to a file within a known safe root directory, and that there is no way to traverse out of that directory using .., absolute paths, or similar tricks.

The best way to do this without changing existing functionality is:

Define a constant safe root for flow configurations, e.g. FLOW_ROOT = "/flow".

Replace direct f-string construction of config_path with os.path.join(FLOW_ROOT, task_id, "process.yaml") followed by os.path.normpath to collapse .. segments.

Verify that the normalized config_path stays within FLOW_ROOT by checking that it starts with FLOW_ROOT + os.sep or exactly equals FLOW_ROOT, and reject otherwise (raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR) or a more specific error).

Reuse this safe construction in both submit_task (where config_path is passed to wrappers) and get_from_cfg (where config_path is opened and read).

Optionally, further restrict task_id to a safe filename pattern (e.g., alphanumerics, -, _) using a regex to reduce attack surface.
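The optional format restriction in the last step could look like the sketch below. The pattern and the name `is_safe_task_id` are assumptions for illustration; the real set of allowed characters depends on how task IDs are generated in the project.

```python
import re

# Assumed allow-list: letters, digits, hyphen, underscore; at least one character.
TASK_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_task_id(task_id: str) -> bool:
    """Return True only if the whole task_id matches the allow-list."""
    # fullmatch avoids partial matches and trailing-newline surprises.
    return TASK_ID_PATTERN.fullmatch(task_id) is not None
```

Because path separators and dots are not in the allow-list, traversal strings such as `../etc` are rejected before any path is built.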

Concretely, in runtime/python-executor/datamate/operator_runtime.py:

Add a module-level constant FLOW_ROOT = "/flow" under the existing LOG_DIR definition.

Introduce a helper function, e.g. build_config_path(task_id: str) -> str, that:

Joins FLOW_ROOT, task_id, and "process.yaml".

Normalizes with os.path.normpath.

Converts to an absolute path.

Ensures the resulting path is under FLOW_ROOT using a prefix check.

Raises APIException(ErrorCode.FILE_NOT_FOUND_ERROR) if validation fails.

Update submit_task to call this helper instead of constructing config_path with an f-string.

Update get_from_cfg similarly to use the helper instead of its own f-string.

This centralizes the logic for safe path construction and ensures both call sites are protected against path traversal and similar issues.
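A minimal sketch of such a helper with the prefix check is shown below. It assumes a POSIX filesystem, and `ConfigPathError` is a hypothetical stand-in for `APIException(ErrorCode.FILE_NOT_FOUND_ERROR)` so the example is self-contained. The comparison uses the root plus a trailing separator; a bare `startswith(FLOW_ROOT)` would also accept sibling directories such as `/flowers`.

```python
import os

FLOW_ROOT = "/flow"  # safe root proposed in the suggestion above


class ConfigPathError(Exception):
    """Hypothetical stand-in for APIException(ErrorCode.FILE_NOT_FOUND_ERROR)."""


def build_config_path(task_id: str) -> str:
    """Return the normalized config path for task_id, or raise if it leaves FLOW_ROOT."""
    raw_path = os.path.join(FLOW_ROOT, task_id, "process.yaml")
    normalized = os.path.abspath(os.path.normpath(raw_path))
    root_abs = os.path.abspath(FLOW_ROOT)
    # Require the separator after the root so "/flowers/..." is not accepted.
    if not normalized.startswith(root_abs + os.sep):
        raise ConfigPathError(f"invalid config path for task_id {task_id!r}")
    return normalized
```

An absolute `task_id` is also caught here: `os.path.join` drops the root when a later segment is absolute, and the resulting path then fails the prefix check.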

feat(auto-annotation): initial setup

96cff2e

github-advanced-security bot found potential problems Jan 4, 2026

o0Shark0o added 4 commits January 5, 2026 10:25

Merge branch 'main' into feat/auto-annotation-new

935045a

chore: remove package-lock.json

6395ad8

chore: clean up local test scripts and Maven settings

180c3a7

chore: change package-lock.json

cbed69b

Dallas98 merged commit 3f1ad6a into main Jan 5, 2026
19 of 20 checks passed

@@ -17,6 +17,7 @@
             # Logging configuration
             LOG_DIR = "/var/log/datamate/runtime"
+            FLOW_BASE_DIR = "/flow"
             os.makedirs(LOG_DIR, exist_ok=True)
             logger.add(
                 f"{LOG_DIR}/runtime.log",
@@ -82,7 +83,7 @@
             @app.post("/api/task/{task_id}/submit")
             async def submit_task(task_id):
-                config_path = f"/flow/{task_id}/process.yaml"
+                config_path = get_flow_config_path(task_id)
                 logger.info("Start submitting job...")
                 dataset_path = get_from_cfg(task_id, "dataset_path")
@@ -127,17 +128,39 @@
                 return success_json_info
-            def check_valid_path(file_path):
+            def check_valid_path(file_path, base_dir: str = None):
                 full_path = os.path.abspath(file_path)
+                if base_dir is not None:
+                    base_dir_abs = os.path.abspath(base_dir)
+                    try:
+                        common = os.path.commonpath([base_dir_abs, full_path])
+                    except ValueError:
+                        # Occurs if paths are on different drives (on Windows) or invalid
+                        return False
+                    if common != base_dir_abs:
+                        return False
                 return os.path.exists(full_path)
-            def get_from_cfg(task_id, key):
-                config_path = f"/flow/{task_id}/process.yaml"
-                if not check_valid_path(config_path):
-                    logger.error(f"config_path is not existed! please check this path.")
+            def get_flow_config_path(task_id: str) -> str:
+                """
+                Build a safe absolute path to the flow configuration file for a task.
+                Ensures the resulting path stays within FLOW_BASE_DIR.
+                """
+                # Disallow absolute task_id to avoid bypassing the base directory via join
+                if os.path.isabs(task_id):
                     raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
+                relative_path = os.path.join(task_id, "process.yaml")
+                full_path = os.path.abspath(os.path.join(FLOW_BASE_DIR, relative_path))
+                if not check_valid_path(full_path, base_dir=FLOW_BASE_DIR):
+                    logger.error(f"config_path is not existed or invalid! please check this path.")
+                    raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
+                return full_path
+            def get_from_cfg(task_id, key):
+                config_path = get_flow_config_path(task_id)
                 with open(config_path, "r", encoding='utf-8') as f:
                     content = f.read()
                     cfg = yaml.safe_load(content)

@@ -17,6 +17,7 @@
             # Logging configuration
             LOG_DIR = "/var/log/datamate/runtime"
+            FLOW_ROOT = "/flow"
             os.makedirs(LOG_DIR, exist_ok=True)
             logger.add(
                 f"{LOG_DIR}/runtime.log",
@@ -82,7 +83,7 @@
             @app.post("/api/task/{task_id}/submit")
             async def submit_task(task_id):
-                config_path = f"/flow/{task_id}/process.yaml"
+                config_path = build_config_path(task_id)
                 logger.info("Start submitting job...")
                 dataset_path = get_from_cfg(task_id, "dataset_path")
@@ -132,8 +133,22 @@
                 return os.path.exists(full_path)
+            def build_config_path(task_id: str) -> str:
+                # Build a normalized, absolute config path under FLOW_ROOT and validate it.
+                raw_path = os.path.join(FLOW_ROOT, task_id, "process.yaml")
+                normalized_path = os.path.abspath(os.path.normpath(raw_path))
+                # Ensure the resulting path stays within the FLOW_ROOT directory.
+                flow_root_abs = os.path.abspath(FLOW_ROOT)
+                if not (normalized_path == flow_root_abs or normalized_path.startswith(flow_root_abs + os.sep)):
+                    logger.error(f"Invalid config path derived from task_id '{task_id}'.")
+                    raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
+                return normalized_path
             def get_from_cfg(task_id, key):
-                config_path = f"/flow/{task_id}/process.yaml"
+                config_path = build_config_path(task_id)
                 if not check_valid_path(config_path):
                     logger.error(f"config_path is not existed! please check this path.")
                     raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
