@o0Shark0o (Collaborator)

✨ What’s included

This PR introduces a complete YOLO-based auto-annotation workflow and improves data management capabilities.

Auto Annotation (YOLO)

  • Integrates YOLO auto-labeling into the annotation task workflow
  • Adds a fully designed front-end interface for configuring and triggering auto-annotation
  • Seamlessly embeds the auto-annotation flow into the existing labeling task page

Data Management Enhancements

  • Adds support for creating folders in the data management module
  • Enables uploading files directly into specific folders
  • Improves dataset organization and usability for large-scale annotation tasks

🎯 Why this change

  • Reduces manual labeling effort by enabling automatic YOLO-based annotations
  • Improves annotation efficiency and task setup experience
  • Enhances dataset organization to better support complex annotation workflows

🧪 How to test

  1. Create or open an annotation task
  2. Configure and trigger YOLO auto-annotation from the task page
  3. Verify auto-labeled results are correctly applied
  4. Navigate to Data Management
  5. Create a new folder and upload files into it
  6. Confirm files are correctly organized and accessible

📝 Notes

  • This PR focuses on feature integration and UI flow; no breaking changes are introduced
  • Existing annotation and data management functionality remains backward compatible

        executor_type = get_from_cfg(task_id, "executor_type")
        if not WRAPPERS.get(executor_type).cancel(task_id):
            raise APIException(ErrorCode.CANCEL_TASK_ERROR)
    except Exception as e:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
This path depends on a user-provided value.

Copilot Autofix

AI 5 days ago

In general, to fix uncontrolled path usage, you should normalize the path and verify that it stays within a designated safe root directory, and/or constrain the untrusted component (here, task_id) to a safe format. Simply calling os.path.exists on an untrusted path does not provide any security.

For this code, the least intrusive fix is:

  1. Introduce a constant base directory for flows, e.g. FLOW_BASE_DIR = "/flow".
  2. Implement a safe path resolution helper that:
    • Joins the base directory with a relative path (such as task_id/process.yaml).
    • Normalizes the result with os.path.abspath or os.path.normpath.
    • Verifies that the normalized path is still inside the base directory (e.g. via os.path.commonpath).
    • Optionally rejects absolute task_id segments outright.
  3. Update check_valid_path so it can optionally enforce that paths are under a given base directory. For uses that should be under /flow, call it with that base.
  4. Use this helper for both:
    • The config_path computed in get_from_cfg.
    • The config_path in submit_task.
      This removes redundant ad‑hoc string interpolation and ensures consistent validation.

We must keep existing behavior (still read /flow/{task_id}/process.yaml when task_id is safe), but add the containment check and raise the same FILE_NOT_FOUND_ERROR / log errors when the resolved path is invalid or outside the allowed base. No new imports are strictly needed; os.path.commonpath and os.path.join are already available via import os.
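The containment check described above can be tried in isolation. The following is a minimal sketch, not the project's code: `is_within_base` and `resolve_flow_config` are hypothetical names, `ValueError` stands in for `APIException(ErrorCode.FILE_NOT_FOUND_ERROR)`, and the existence check is left out so the example runs without a real `/flow` directory. It assumes POSIX-style paths.

```python
import os

FLOW_BASE_DIR = "/flow"  # assumed base directory, as in the suggestion above


def is_within_base(candidate: str, base_dir: str = FLOW_BASE_DIR) -> bool:
    """Return True if candidate, once normalized, lies inside base_dir."""
    base_abs = os.path.abspath(base_dir)
    full = os.path.abspath(candidate)  # also collapses ".." segments
    try:
        return os.path.commonpath([base_abs, full]) == base_abs
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False


def resolve_flow_config(task_id: str) -> str:
    """Hypothetical helper: build the config path or raise ValueError."""
    # os.path.join discards the base when a later component is absolute.
    if os.path.isabs(task_id):
        raise ValueError("absolute task_id is not allowed")
    full = os.path.abspath(os.path.join(FLOW_BASE_DIR, task_id, "process.yaml"))
    if not is_within_base(full):
        raise ValueError("resolved path escapes the base directory")
    return full
```

Note that `os.path.abspath` does not resolve symbolic links; if links under the base directory are a concern, `os.path.realpath` could be used on both paths instead.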

Suggested changeset 1
runtime/python-executor/datamate/operator_runtime.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/runtime/python-executor/datamate/operator_runtime.py b/runtime/python-executor/datamate/operator_runtime.py
--- a/runtime/python-executor/datamate/operator_runtime.py
+++ b/runtime/python-executor/datamate/operator_runtime.py
@@ -17,6 +17,7 @@
 
 # 日志配置
 LOG_DIR = "/var/log/datamate/runtime"
+FLOW_BASE_DIR = "/flow"
 os.makedirs(LOG_DIR, exist_ok=True)
 logger.add(
     f"{LOG_DIR}/runtime.log",
@@ -82,7 +83,7 @@
 
 @app.post("/api/task/{task_id}/submit")
 async def submit_task(task_id):
-    config_path = f"/flow/{task_id}/process.yaml"
+    config_path = get_flow_config_path(task_id)
     logger.info("Start submitting job...")
 
     dataset_path = get_from_cfg(task_id, "dataset_path")
@@ -127,17 +128,39 @@
     return success_json_info
 
 
-def check_valid_path(file_path):
+def check_valid_path(file_path, base_dir: str = None):
     full_path = os.path.abspath(file_path)
+    if base_dir is not None:
+        base_dir_abs = os.path.abspath(base_dir)
+        try:
+            common = os.path.commonpath([base_dir_abs, full_path])
+        except ValueError:
+            # Occurs if paths are on different drives (on Windows) or invalid
+            return False
+        if common != base_dir_abs:
+            return False
     return os.path.exists(full_path)
 
 
-def get_from_cfg(task_id, key):
-    config_path = f"/flow/{task_id}/process.yaml"
-    if not check_valid_path(config_path):
-        logger.error(f"config_path is not existed! please check this path.")
+def get_flow_config_path(task_id: str) -> str:
+    """
+    Build a safe absolute path to the flow configuration file for a task.
+    Ensures the resulting path stays within FLOW_BASE_DIR.
+    """
+    # Disallow absolute task_id to avoid bypassing the base directory via join
+    if os.path.isabs(task_id):
         raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
+    relative_path = os.path.join(task_id, "process.yaml")
+    full_path = os.path.abspath(os.path.join(FLOW_BASE_DIR, relative_path))
+    if not check_valid_path(full_path, base_dir=FLOW_BASE_DIR):
+        logger.error(f"config_path is not existed or invalid! please check this path.")
+        raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
+    return full_path
 
+
+def get_from_cfg(task_id, key):
+    config_path = get_flow_config_path(task_id)
+
     with open(config_path, "r", encoding='utf-8') as f:
         content = f.read()
         cfg = yaml.safe_load(content)
EOF
    return success_json_info


def check_valid_path(file_path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
This path depends on a user-provided value.

Copilot Autofix

AI 5 days ago

To fix this, the path derived from the untrusted task_id must be validated before being used with open() (and indirectly before being used by wrappers that likely read the same config). The goal is to ensure that regardless of the task_id value, the resulting config_path always points to a file within a known safe root directory, and that there is no way to traverse out of that directory using .., absolute paths, or similar tricks.

The best way to do this without changing existing functionality is:

  1. Define a constant safe root for flow configurations, e.g. FLOW_ROOT = "/flow".
  2. Replace direct f-string construction of config_path with os.path.join(FLOW_ROOT, task_id, "process.yaml") followed by os.path.normpath to collapse .. segments.
  3. Verify that the normalized config_path stays within FLOW_ROOT by checking that it starts with FLOW_ROOT + os.sep or exactly equals FLOW_ROOT, and reject otherwise (raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR) or a more specific error).
  4. Reuse this safe construction in both submit_task (where config_path is passed to wrappers) and get_from_cfg (where config_path is opened and read).
  5. Optionally, further restrict task_id to a safe filename pattern (e.g., alphanumerics, -, _) using a regex to reduce attack surface.
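Step 3 asks for a prefix of `FLOW_ROOT + os.sep` rather than `FLOW_ROOT` alone, and a small sketch shows why. The function names and the sibling directory `/flow-secrets` are made up for illustration, and POSIX-style paths are assumed.

```python
import os

FLOW_ROOT = "/flow"  # assumed root, as in the steps above


def naive_is_under_root(path: str) -> bool:
    """Flawed check: a bare prefix also matches sibling directories."""
    return os.path.normpath(path).startswith(FLOW_ROOT)


def is_under_root(path: str) -> bool:
    """Prefix check that requires a path separator right after the root."""
    normalized = os.path.normpath(path)
    return normalized == FLOW_ROOT or normalized.startswith(FLOW_ROOT + os.sep)


# "/flow/../flow-secrets/x" normalizes to "/flow-secrets/x", which shares the
# string prefix "/flow" but is not inside the /flow directory.
escaped = "/flow/../flow-secrets/x"
```

With these definitions, `naive_is_under_root(escaped)` accepts the path while `is_under_root(escaped)` rejects it.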

Concretely, in runtime/python-executor/datamate/operator_runtime.py:

  • Add a module-level constant FLOW_ROOT = "/flow" under the existing LOG_DIR definition.
  • Introduce a helper function, e.g. build_config_path(task_id: str) -> str, that:
    • Joins FLOW_ROOT, task_id, and "process.yaml".
    • Normalizes with os.path.normpath.
    • Converts to an absolute path.
    • Ensures the resulting path is under FLOW_ROOT using a prefix check.
    • Raises APIException(ErrorCode.FILE_NOT_FOUND_ERROR) if validation fails.
  • Update submit_task to call this helper instead of constructing config_path with an f-string.
  • Update get_from_cfg similarly to use the helper instead of its own f-string.

This centralizes the logic for safe path construction and ensures both call sites are protected against path traversal and similar issues.
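The optional step 5 (restricting task_id to a safe pattern) is not part of the patch below. A minimal sketch of what it might look like follows; the pattern, the 64-character limit, and the name `is_safe_task_id` are assumptions, and the real task_id format used by the project may differ.

```python
import re

# Hypothetical allow-list: ASCII letters, digits, hyphen, underscore; 1 to 64 chars.
TASK_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_safe_task_id(task_id: str) -> bool:
    """Return True only if the whole string matches the allow-list."""
    # fullmatch avoids the trailing-newline pitfall of match() with "$".
    return TASK_ID_PATTERN.fullmatch(task_id) is not None
```

A check like this would run before any path is built, so values containing `/`, `..`, or whitespace never reach `os.path.join`.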

Suggested changeset 1
runtime/python-executor/datamate/operator_runtime.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/runtime/python-executor/datamate/operator_runtime.py b/runtime/python-executor/datamate/operator_runtime.py
--- a/runtime/python-executor/datamate/operator_runtime.py
+++ b/runtime/python-executor/datamate/operator_runtime.py
@@ -17,6 +17,7 @@
 
 # 日志配置
 LOG_DIR = "/var/log/datamate/runtime"
+FLOW_ROOT = "/flow"
 os.makedirs(LOG_DIR, exist_ok=True)
 logger.add(
     f"{LOG_DIR}/runtime.log",
@@ -82,7 +83,7 @@
 
 @app.post("/api/task/{task_id}/submit")
 async def submit_task(task_id):
-    config_path = f"/flow/{task_id}/process.yaml"
+    config_path = build_config_path(task_id)
     logger.info("Start submitting job...")
 
     dataset_path = get_from_cfg(task_id, "dataset_path")
@@ -132,8 +133,22 @@
     return os.path.exists(full_path)
 
 
+def build_config_path(task_id: str) -> str:
+    # Build a normalized, absolute config path under FLOW_ROOT and validate it.
+    raw_path = os.path.join(FLOW_ROOT, task_id, "process.yaml")
+    normalized_path = os.path.abspath(os.path.normpath(raw_path))
+
+    # Ensure the resulting path stays within the FLOW_ROOT directory.
+    flow_root_abs = os.path.abspath(FLOW_ROOT)
+    if not (normalized_path == flow_root_abs or normalized_path.startswith(flow_root_abs + os.sep)):
+        logger.error(f"Invalid config path derived from task_id '{task_id}'.")
+        raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
+
+    return normalized_path
+
+
 def get_from_cfg(task_id, key):
-    config_path = f"/flow/{task_id}/process.yaml"
+    config_path = build_config_path(task_id)
     if not check_valid_path(config_path):
         logger.error(f"config_path is not existed! please check this path.")
         raise APIException(ErrorCode.FILE_NOT_FOUND_ERROR)
EOF
@Dallas98 merged commit 3f1ad6a into main on Jan 5, 2026
19 of 20 checks passed