
fix: Fix file_path always being null #217

Open
Windelly wants to merge 6 commits into lintsinghua:v3.0.0 from Windelly:fix/filepath-null

Conversation

@Windelly

@Windelly Windelly commented May 4, 2026

Root cause analysis

This PR fixes the issue where the file_path field on findings from agent audit tasks was always null, which prevented the frontend from locating the vulnerable file.

Bug 1: Operator-precedence error in a ternary expression

When extracting the file path from the location field in agent_tasks.py, the ternary expression lacked parentheses:

# Before (wrong): Python operator precedence made the whole expression the body of the conditional
finding.get('location', '').split(':')[0] if ':' in finding.get('location', '') else finding.get('location')

# After (correct): parentheses make the precedence explicit
(finding.get('location', '').split(':')[0] if ':' in finding.get('location', '') else finding.get('location', ''))
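The pitfall can be reproduced in isolation. A minimal sketch (the surrounding assignment with an `or` chain is assumed for illustration; the exact context is not shown above):

```python
# Illustrative reproduction of the precedence bug; `fp`/`loc` are stand-in values.
loc = "src/app.py"   # a location without a ":<line>" suffix
fp = "src/real.py"   # an already-known file path

# Unparenthesized: the conditional expression has the LOWEST precedence, so this
# parses as `(fp or loc.split(":")[0]) if (":" in loc) else loc` -- the whole
# `or` chain becomes the body of the conditional.
buggy = fp or loc.split(":")[0] if ":" in loc else loc
assert buggy == "src/app.py"   # fp is silently discarded

# Parenthesized: the fallback applies only when fp is falsy.
fixed = fp or (loc.split(":")[0] if ":" in loc else loc)
assert fixed == "src/real.py"
```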

Bug 2: Findings without file_path were written to the database

The LLM sometimes returns findings without file_path; these invalid records went straight into the database.

Bug 3: The orchestrator's file-extension whitelist was too strict

Only traditional code extensions were allowed, leaving out config files such as .yaml and .json.

Bug 4: Merge logic ignored garbage paths

When new_file was an invalid value (such as "?"), it could still overwrite a valid existing path.

Fixes

File             Change
agent_tasks.py   Parenthesize the ternary expression + skip empty file_path + save the verdict field
orchestrator.py  Relax extension validation + keep garbage paths out of merges
agent_task.py    Add a verdict column to the AgentFinding model
analysis.py      Add a required-file_path check in _standardize_findings

Verification

  • Local tests pass
  • A database migration must be run (adds the verdict column)

Root causes:
1. A ternary expression in agent_tasks.py lacked parentheses; operator precedence broke location parsing
2. Findings returned by the LLM may lack file_path; saving them as-is left the database full of nulls
3. The orchestrator's file-extension whitelist was too strict, missing non-traditional code files such as .yaml/.json
4. The merge logic did not account for garbage paths and could overwrite a valid path with an invalid one

Fixes:
- Parenthesize the ternary expression to fix the precedence bug
- Require file_path in _save_findings and analysis; skip empty values and log a warning
- Save the new verdict field to the database (confirmed/likely/uncertain/false_positive)
- Relax the orchestrator's extension check, using endsWith('/') only to exclude directories
- Harden the merge logic: garbage paths never participate in merges, so valid paths are not overwritten
@vercel

vercel Bot commented May 4, 2026

@Windelly is attempting to deploy a commit to the tsinghuaiiilove-2257's projects Team on Vercel.

A member of the Team first needs to authorize it.

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 4, 2026

Review Summary by Qodo

(Agentic_describe updated until commit e56dcfd)

Fix file_path null issue with validation and enhanced merge logic

🐞 Bug fix ✨ Enhancement


Walkthroughs

Description
• Fix file_path null issue by adding type-safe extraction and validation
  - Correct ternary operator precedence with parentheses
  - Skip findings without valid file_path with warning logs
  - Add fallback to location/file fields when file_path empty
• Enhance finding deduplication and merge logic
  - Prevent garbage paths from overwriting valid file paths
  - Support cross-file matching only for garbage paths or prefix matches
  - Relax orchestrator file extension validation to support config files
• Add verdict field to track finding confidence levels
  - New database column for confirmed/likely/uncertain/false_positive
  - Expose verdict in API response and use for verification status
  - Include verdict in statistics calculation
• Improve findings filtering consistency across analysis pipeline
  - Filter invalid findings before statistics calculation
  - Use filtered findings for severity and verification counts
  - Add Alembic migration for verdict column
Diagram
flowchart LR
  A["Raw Findings from LLM"] -->|Type-safe extraction| B["Extract file_path"]
  B -->|Validate & filter| C["Skip invalid findings"]
  C -->|Normalize| D["Normalized Findings"]
  D -->|Dedup logic| E["Merge with existing"]
  E -->|Prevent garbage overwrite| F["Valid Findings"]
  F -->|Save to DB| G["AgentFinding with verdict"]
  G -->|Expose in API| H["AgentFindingResponse"]


File Changes

1. backend/alembic/versions/009_add_verdict_to_agent_findings.py Database migration +26/-0

Add Alembic migration for verdict column

• Create new Alembic migration to add verdict column to agent_findings table
• Add String(30) nullable verdict column with index
• Include downgrade function to drop column and index



2. backend/app/models/agent_task.py Database schema +1/-0

Add verdict column to AgentFinding model

• Add verdict column to AgentFinding model as String(30) nullable with index
• Column stores verdict values: confirmed/likely/uncertain/false_positive
• Positioned after status column in verification information section



3. backend/app/api/v1/endpoints/agent_tasks.py Bug fix, enhancement +49/-27

Fix file_path extraction and enhance findings filtering

• Add verdict field to AgentFindingResponse model as Optional[str]
• Fix file_path extraction with type-safe handling and proper precedence
• Skip findings without valid file_path and log warnings
• Filter findings before statistics calculation to ensure consistency
• Use filtered_findings for severity counts and verification status
• Calculate security/quality scores using filtered findings only
• Add verdict field to AgentFinding ORM model creation



4. backend/app/services/agent/agents/analysis.py Bug fix, error handling +30/-1

Add file_path validation and filtering in analysis agent

• Add file_path validation with fallback to file/location fields
• Skip findings without valid file_path and track skipped count
• Add type checking for location field to handle non-string values
• Log warnings for invalid findings with details
• Use extracted file_path in standardized findings output
• Report skipped findings count in completion message



5. backend/app/services/agent/agents/orchestrator.py Bug fix, enhancement +25/-1

Enhance file validation and merge deduplication logic

• Relax file extension validation by replacing whitelist with endsWith("/") check
• Support config files like .yaml, .json, .xml in addition to code files
• Add garbage path detection to prevent invalid paths from overwriting valid ones
• Enhance cross-file matching logic with prefix matching support
• Add type-safe normalization check before processing findings
• Implement smart merge logic that preserves valid file paths





@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 4, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)



Action required

1. Stats block NameError 🐞 Bug ≡ Correctness
Description
The post-completion statistics logic uses files_with_findings_set without ever defining it, raising a NameError just as the task is about to finish and marking the task failed. In addition, filtered_findings is never populated, so the severity/verified/security_score statistics are all computed over an empty list.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R570-604]

+                # 🔥 FIX: 先过滤 findings,再用过滤后的列表做统计
+                # 与 _save_findings 的过滤逻辑保持一致(排除无 file_path 的 finding)
+                filtered_findings = []
              for f in findings:
                  if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                        raw_file_path = f.get("file_path") or f.get("file")
+                        file_path = raw_file_path if isinstance(raw_file_path, str) else ""
+                        if not file_path:
+                            raw_location = f.get("location", "")
+                            location = raw_location if isinstance(raw_location, str) else ""
+                            file_path = location.split(":")[0]
                      if file_path:
                          files_with_findings_set.add(file_path)
              task.files_with_findings = len(files_with_findings_set)
-                # 统计严重程度和验证状态
+                # 统计严重程度和验证状态(使用过滤后的列表)
              verified_count = 0
-                for f in findings:
-                    if isinstance(f, dict):
-                        sev = str(f.get("severity", "low")).lower()
-                        if sev == "critical":
-                            task.critical_count += 1
-                        elif sev == "high":
-                            task.high_count += 1
-                        elif sev == "medium":
-                            task.medium_count += 1
-                        elif sev == "low":
-                            task.low_count += 1
-                        # 🔥 统计已验证的发现
-                        if f.get("is_verified") or f.get("verdict") == "confirmed":
-                            verified_count += 1
+                for f in filtered_findings:
+                    sev = str(f.get("severity", "low")).lower()
+                    if sev == "critical":
+                        task.critical_count += 1
+                    elif sev == "high":
+                        task.high_count += 1
+                    elif sev == "medium":
+                        task.medium_count += 1
+                    elif sev == "low":
+                        task.low_count += 1
+                    # 🔥 统计已验证的发现
+                    if f.get("is_verified") or f.get("verdict") == "confirmed":
+                        verified_count += 1
              task.verified_count = verified_count
              
-                # 计算安全评分
-                task.security_score = _calculate_security_score(findings)
-                task.quality_score = _calculate_security_score(findings)
+                # 计算安全评分(使用过滤后的列表)
+                task.security_score = _calculate_security_score(filtered_findings)
+                task.quality_score = _calculate_security_score(filtered_findings)
Evidence
In the task-completion branch, the code declares filtered_findings = [], but the filter loop that follows only writes via files_with_findings_set.add(file_path), and that set is never initialised in the current scope. The later statistics loop for f in filtered_findings: and _calculate_security_score(filtered_findings) both consume the never-appended empty list, so verified_count, the severity distribution, and the scores are all wrong (the logic is broken even before the NameError fires).

backend/app/api/v1/endpoints/agent_tasks.py[570-605]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In the task-completion statistics logic of `agent_tasks.py`:
- `files_with_findings_set` is used but never initialised, so the code raises `NameError` at runtime.
- `filtered_findings` is declared but never appended to, so the subsequent severity/verified/security_score statistics all run over an empty list.
### Issue Context
The code sits in the post-completion statistics phase; the exception marks the task FAILED even after its findings were saved and rolls back the statistics fields, affecting the frontend display and task status.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-605]
### Suggested fix
1) Initialise before the loop: `files_with_findings_set = set()`.
2) In the filter loop, when `file_path` is valid, do both:
- `files_with_findings_set.add(file_path)`
- `filtered_findings.append(f)`
3) (Optional, but more consistent) align the filter condition with `_save_findings` (e.g. a `file_path.strip()` check) so statistics and saved-row counts cannot diverge.
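A sketch of the suggested fix as one small helper (the name and field shapes are illustrative, mirroring the fields named in this review):

```python
def filter_valid_findings(findings):
    """Filter once, then reuse the filtered list for every statistic.

    Illustrative sketch: the set is initialised up front, and each finding
    with a usable path is appended alongside the set.add call.
    """
    filtered_findings = []
    files_with_findings_set = set()   # initialised before the loop
    for f in findings:
        if not isinstance(f, dict):
            continue
        raw = f.get("file_path") or f.get("file")
        fp = raw if isinstance(raw, str) else ""
        if not fp:
            loc = f.get("location", "")
            fp = loc.split(":")[0] if isinstance(loc, str) else ""
        if fp.strip():
            files_with_findings_set.add(fp)
            filtered_findings.append(f)
    return filtered_findings, files_with_findings_set

# Findings without a usable path are excluded from both outputs.
kept, files = filter_valid_findings(
    [{"file_path": "a.py"}, {"location": "b.py:3"}, {"title": "no path"}, "junk"]
)
assert files == {"a.py", "b.py"} and len(kept) == 2
```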



2. Stats loop type crash 🐞 Bug ☼ Reliability
Description
When computing finding statistics at task completion, agent_tasks calls fp.strip() / loc.split(); if file_path/file/location is not a string this raises, and a task that should be COMPLETED gets caught by the outer except during the statistics phase and marked FAILED. The orchestrator's _normalize_finding keeps non-string location values (it only parses when isinstance(location, str)), so this input is reachable through the real pipeline.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R573-587]

            for f in findings:
-                    if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
-                        if file_path:
-                            files_with_findings_set.add(file_path)
+                    if not isinstance(f, dict):
+                        continue
+                    fp = f.get("file_path") or f.get("file") or ""
+                    if not fp.strip() and f.get("location"):
+                        loc = f.get("location", "")
+                        fp = loc.split(":")[0] if ":" in loc else loc
+                    if fp and fp.strip():
+                        filtered_findings.append(f)
+
+                files_with_findings_set = set()
+                for f in filtered_findings:
+                    file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                    if file_path:
+                        files_with_findings_set.add(file_path)
Evidence
The statistics code calls .strip() on fp and .split() on loc without guaranteeing either is a string; meanwhile the orchestrator's normalization parses location into file_path only when it is a string and otherwise keeps it as-is, so a non-string location can reach this statistics logic and raise AttributeError. The block sits inside the big task-completion try, so the exception lands in the outer except Exception and the task status is updated to FAILED.

backend/app/api/v1/endpoints/agent_tasks.py[565-662]
backend/app/api/v1/endpoints/agent_tasks.py[570-588]
backend/app/services/agent/agents/orchestrator.py[1137-1159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In the task-completion statistics phase, `backend/app/api/v1/endpoints/agent_tasks.py` runs string operations (`.strip()` / `.split(':')`) on `file_path`/`file`/`location`, but these fields come from LLM/sub-agent output and are not guaranteed to be `str`. A `dict`/`list` value raises `AttributeError`, so the task fails during the statistics step and is marked FAILED.
### Issue Context
The orchestrator's `_normalize_finding` only parses `location` when it is a string and otherwise keeps the original value, so a non-string `location` can reach the statistics logic in agent_tasks.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-588]
### What to change
1. Extract a small helper (or inline the logic) for safe file_path parsing:
- Use `f.get('file_path')` / `f.get('file')` only when `isinstance(x, str)`; otherwise treat the value as an empty string.
- Only evaluate `':' in loc` / `loc.split(':', 1)` when `isinstance(loc, str)`.
2. In the second loop, do not repeat the unguarded `f.get('location', '').split(...)`; reuse the `fp` parsed earlier (write it back to `f['file_path']`, or store tuples in `filtered_findings` if needed).
3. At the unit/integration level, construct `finding={'location': {'file':'a.py'}}` / `{'file_path': {'x':1}}` and verify the task does not fail.



3. Normalized None causes crash 🐞 Bug ☼ Reliability
Description
When merging findings, the orchestrator calls _normalize_finding(new_f) and then calls .get() on normalized_new without checking the return value. Because this PR relaxes the "looks like a file path" heuristic for high_risk_areas, a non-existent path is more likely to be written into file_path, making _normalize_finding return None and crashing the task outright.
Code

backend/app/services/agent/agents/orchestrator.py[R881-886]

+                                # 🔥 FIX: 放宽文件路径校验 - 不再限制扩展名,只要像文件路径就提取
                          if ("." in potential_file and
                              " " not in potential_file and
                              len(potential_file) < 100 and
-                                    any(potential_file.endswith(ext) for ext in ['.py', '.js', '.ts', '.java', '.go', '.php', '.rb', '.c', '.cpp', '.h'])):
+                                    not potential_file.endswith("/")):
                              file_path = potential_file
Evidence
This PR relaxes the file-path extraction condition for high_risk_areas (extensions are no longer restricted), so more strings get written into file_path; _normalize_finding returns None when the file_path does not exist. The merge logic has no guard for None, and the subsequent normalized_new.get(...) raises AttributeError, terminating the whole orchestrator.

backend/app/services/agent/agents/orchestrator.py[876-886]
backend/app/services/agent/agents/orchestrator.py[932-941]
backend/app/services/agent/agents/orchestrator.py[1223-1231]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Orchestrator merge loop assumes `_normalize_finding()` always returns a dict, but `_normalize_finding()` can return `None` when file_path validation fails. The merge loop then calls `.get()` on `None`, crashing the entire task.
## Issue Context
This PR relaxes the file-path heuristic for `high_risk_areas`, increasing the chance of extracting a non-existent path and hitting the `return None` branch.
## Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[932-941]
- backend/app/services/agent/agents/orchestrator.py[1223-1231]
- backend/app/services/agent/agents/orchestrator.py[876-886]
## Suggested change
- After `normalized_new = self._normalize_finding(new_f)`, add:
- `if not normalized_new: continue`
- Optionally: validate existence before setting `file_path` in the `high_risk_areas` conversion path, or set `file_path` only after `_validate_file_path()` passes.



4. Stats use unsaved findings 🐞 Bug ≡ Correctness
Description
_save_findings now skips findings without a file_path, but the severity counts, verified_count, and security_score computed at task completion are still based on the original findings list. findings_count (the number actually saved) therefore disagrees with the other statistics, producing wrong numbers in the frontend and reports.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1271-1277]

+            # 🔥 v2.2: file_path 为空直接跳过
+            if not file_path or not file_path.strip():
+                logger.warning(
+                    f"[SaveFindings] 🚫 跳过无 file_path 的 finding: "
+                    f"title={finding.get('title', 'N/A')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                )
+                continue
Evidence
After _save_findings gained the "no file_path -> continue" filter, saved_count shrinks and is written to task.findings_count; but the same flow still computes files_with_findings, the severity distribution, verified_count, and security_score from the unfiltered findings, so the statistics include data that was never stored.

backend/app/api/v1/endpoints/agent_tasks.py[1271-1293]
backend/app/api/v1/endpoints/agent_tasks.py[522-547]
backend/app/api/v1/endpoints/agent_tasks.py[565-599]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Task statistics are computed from the original `findings` list even though `_save_findings()` filters/skips some findings (now including empty `file_path`). This makes task counters and scores inconsistent with what is actually stored in DB.
## Issue Context
`task.findings_count` is set from `saved_count`, but other counters (severity buckets, verified_count, security_score, files_with_findings) are still derived from the unfiltered `findings`.
## Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[522-547]
- backend/app/api/v1/endpoints/agent_tasks.py[565-599]
- backend/app/api/v1/endpoints/agent_tasks.py[1157-1426]
## Suggested change
Choose one:
1) Filter once, then use the filtered list for both saving and stats.
- Extract a helper like `_filter_valid_findings(findings, project_root)` that mirrors `_save_findings` filtering rules.
2) After commit, query DB for `AgentFinding` rows for the task and compute stats from DB rows (authoritative).
Ensure `files_with_findings`, severity counts, verified_count, and security_score are based on the same set as `saved_count`.



5. Verdict migration missing 🐞 Bug ≡ Correctness
Description
AgentFinding gained a verdict field that is written on save, but the Alembic migration that defines the agent_findings table has no verdict column, so the runtime INSERT fails and rolls back; the task can appear "completed" while its findings were never actually stored.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1390-1394]

       code_snippet=code_snippet[:10000] if code_snippet else None,
       suggestion=suggestion[:5000] if suggestion else None,
       is_verified=is_verified,
+                verdict=verdict,  # 🔥 新增:保存 verdict 到数据库
       ai_confidence=confidence,  # 🔥 FIX: Use ai_confidence, not confidence
Evidence
The code passes verdict when creating the AgentFinding ORM instance, but the existing Alembic migration that creates agent_findings contains no verdict column (and it is the only migration that creates the table), so on a database that has not been upgraded the INSERT/COMMIT fails with "column verdict does not exist" and rolls back.

backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
backend/app/models/agent_task.py[355-363]
backend/alembic/versions/006_add_agent_tables.py[146-183]
backend/alembic/versions/006_add_agent_tables.py[225-232]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AgentFinding.verdict` has been added to the ORM and is assigned when saving findings, but no database migration adds the column, so the write fails and rolls back.
### Issue Context
- The existing `agent_findings` table is created by Alembic revision `006_add_agent_tables`, which contains no `verdict` column.
- `_save_findings` builds `AgentFinding(..., verdict=verdict)`, which fails to insert when the DB schema has not been upgraded.
### Fix Focus Areas
- backend/alembic/versions/006_add_agent_tables.py[146-232]
- backend/app/models/agent_task.py[355-363]
- backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
### Suggested fix
1. Add a new Alembic revision:
- `op.add_column('agent_findings', sa.Column('verdict', sa.String(length=30), nullable=True))`
- `op.create_index('ix_agent_findings_verdict', 'agent_findings', ['verdict'])` (if needed)
2. Symmetrically `drop_index`/`drop_column` in `downgrade`.
3. (Optional) If a failed write should not succeed silently, consider re-raising when the `_save_findings` commit fails, or setting `saved_count` to 0, so the task statistics are not misleading.



6. Location-only findings dropped 🐞 Bug ≡ Correctness
Description
The AnalysisAgent's standardization phase requires every finding to carry a non-empty file_path, so findings that only provide location/file are dropped outright, even though other modules in the system already derive file_path from location/file.
Code

backend/app/services/agent/agents/analysis.py[R769-777]

+                # 🔥 v2.2: file_path 必填校验 - 没有 file_path 的 finding 直接拒绝
+                file_path = finding.get("file_path", "") or ""
+                if not file_path.strip():
+                    skipped_no_filepath += 1
+                    logger.warning(
+                        f"[Analysis] 🚫 跳过无 file_path 的 finding: "
+                        f"title={finding.get('title', '?')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                    )
+                    continue
Evidence
AnalysisAgent checks only finding['file_path'] and continues when it is missing; but the Kunlun tool's output uses a location field rather than file_path, the orchestrator implements location/file -> file_path normalization, and _save_findings likewise falls back to location/file to derive file_path. The hard check in AnalysisAgent drops such results before they ever reach the rest of the pipeline.

backend/app/services/agent/agents/analysis.py[759-800]
backend/app/services/agent/tools/kunlun_tool.py[411-433]
backend/app/services/agent/agents/orchestrator.py[1135-1161]
backend/app/api/v1/endpoints/agent_tasks.py[1263-1278]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When standardizing findings, AnalysisAgent accepts only a non-empty `file_path` and drops findings that provide just `location`/`file`; yet several components/tools in this repo do emit a `location` field.
### Issue Context
- The Kunlun tool emits `location` when parsing output.
- The orchestrator already implements `location/file -> file_path` normalization.
- `_save_findings` also supports deriving `file_path` from `file`/`location` as a fallback.
### Fix Focus Areas
- backend/app/services/agent/agents/analysis.py[759-792]
- backend/app/services/agent/agents/orchestrator.py[1135-1161]
- backend/app/services/agent/tools/kunlun_tool.py[411-433]
### Suggested fix
During AnalysisAgent standardization, accept input findings the same way the orchestrator does:
1. `file_path = finding.get('file_path') or finding.get('file') or parse_location(finding.get('location'))`
2. Skip a finding only if the path is still empty after this derivation.
3. Write the derived `file_path` into the standardized finding so that Verification/Save can use it downstream.
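The suggested fallback chain could be sketched like this (`parse_location` is a hypothetical helper named in the suggestion above, not an existing function in the repo):

```python
def parse_location(loc):
    """Hypothetical helper: derive a file path from a 'path:line' location string."""
    if not isinstance(loc, str):
        return ""
    return loc.split(":", 1)[0].strip()

def resolve_file_path(finding: dict) -> str:
    """Sketch of the file_path derivation with file/location fallback."""
    fp = finding.get("file_path") or finding.get("file")
    if not isinstance(fp, str) or not fp.strip():
        fp = parse_location(finding.get("location"))
    return fp.strip()

assert resolve_file_path({"location": "utils/io.py:88"}) == "utils/io.py"
assert resolve_file_path({"file": "main.py"}) == "main.py"
assert resolve_file_path({"location": {"file": "a.py"}}) == ""  # non-string survives
```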




Remediation recommended

7. Garbage path can still overwrite 🐞 Bug ≡ Correctness ⭐ New
Description
The orchestrator's merge logic still treats a non-empty garbage file_path (such as "?") as a "meaningful value" and writes it into merged, overwriting the existing real path. When project_root is missing and the path cannot be validated, such a path passes through the normalize/merge chain. The result is, once again, the frontend pointing at the wrong file.

backend/app/services/agent/agents/orchestrator.py[R971-992]

+                            elif same_type and same_line and not same_file:
+                                # 🔥 FIX: Only allow cross-file matching when:
+                                # 1. new_file is garbage ("?"/empty) and descriptions still match
+                                # 2. One path is a prefix of the other ("src/foo.py" vs "foo.py")
+                                # Do NOT merge when existing_file is garbage - that would lose the real path.
+                                new_is_garbage = not new_file or new_file == "?"
+                                prefix_match = (
+                                    new_file.endswith("/" + existing_file) or
+                                    existing_file.endswith("/" + new_file)
+                                )
+                                if (new_is_garbage and similar_desc) or prefix_match:
+                                    match_found = True
+                                    logger.info(f"[Orchestrator] Matched by type+line despite file mismatch: {new_file} vs {existing_file}")
+                                else:
+                                    match_found = False
+                            else:
+                                match_found = False
+
+                            if match_found:
                                # Update existing with new info (e.g. verification results)
                                # 🔥 FIX: Smart merge - don't overwrite good data with empty values
                                merged = dict(existing_f)  # Start with existing data
Evidence
The matching branch explicitly treats new_file == "?" as garbage and allows match_found=True under certain conditions; but the subsequent merge overwrites with any non-empty string and never excludes "?" as a garbage value. Meanwhile, when project_root is empty in runtime_context, _validate_file_path returns True outright (it passes when it cannot validate), so paths like "?" are not filtered out during normalization and reach the merge-overwrite logic.

backend/app/services/agent/agents/orchestrator.py[971-996]
backend/app/services/agent/agents/orchestrator.py[1118-1121]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Even with the new matching criteria, the merge step still overwrites `file_path` with any non-empty value. Garbage values like `"?"` are non-empty and can replace a previously correct path.

### Issue Context
- `new_is_garbage = ... or new_file == "?"` exists, but is not used to protect the merge assignment.
- If `project_root` is missing, `_validate_file_path()` returns True, so garbage paths can pass normalization and reach the merge.

### Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[971-1013]
- backend/app/services/agent/agents/orchestrator.py[1104-1121]

### Suggested change
- In the merge loop, treat `file_path` values in `{ "?", "", None }` as non-meaningful (do not overwrite an existing non-garbage `file_path`).
- Optionally also enforce the comment: *do not merge when `existing_file` is garbage* by adding an explicit guard.
- Consider normalizing `file_path == "?"` to empty string earlier (normalize stage) to reduce downstream risk.
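One way to make the guard explicit, as a sketch under the assumptions above (the `"?"`/empty garbage set comes from this review, not from the PR code itself):

```python
def is_garbage(path) -> bool:
    """Treat None, non-strings, blanks and '?' as non-meaningful paths."""
    return not isinstance(path, str) or not path.strip() or path.strip() == "?"

def merge_file_path(existing, new):
    """Never let a garbage new path overwrite a valid existing one (sketch)."""
    if is_garbage(new):
        return existing
    if not is_garbage(existing):
        # Both meaningful: accept new only when it is a more specific prefix match.
        return new if new.endswith("/" + existing) else existing
    return new

assert merge_file_path("src/app.py", "?") == "src/app.py"      # garbage rejected
assert merge_file_path("app.py", "src/app.py") == "src/app.py" # more specific wins
assert merge_file_path(None, "src/app.py") == "src/app.py"
```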



8. Over-broad cross-file merge 🐞 Bug ≡ Correctness
Description
When new_file is empty or "?", the orchestrator's dedup-merge allows a cross-file match on same_type && same_line alone, which can wrongly merge findings from different files that happen to share a type and line number, cross-contaminating verification results and descriptions. This changes the previous same-file-first matching strategy.
Code

backend/app/services/agent/agents/orchestrator.py[R971-987]

+                            elif same_type and same_line and not same_file:
+                                # 🔥 FIX: Only allow cross-file matching when:
+                                # 1. new_file is garbage ("?"/empty) - verification returned bad path
+                                # 2. One path is a prefix of the other ("src/foo.py" vs "foo.py")
+                                # Do NOT merge when existing_file is garbage - that would lose the real path.
+                                new_is_garbage = not new_file or new_file == "?"
+                                prefix_match = (
+                                    new_file.endswith("/" + existing_file) or
+                                    existing_file.endswith("/" + new_file)
+                                )
+                                if new_is_garbage or prefix_match:
+                                    match_found = True
+                                    logger.info(f"[Orchestrator] Matched by type+line despite file mismatch: {new_file} vs {existing_file}")
+                                else:
+                                    match_found = False
+                            else:
+                                match_found = False
Evidence
In the same_type and same_line and not same_file branch, the new logic sets match_found = True whenever new_file is empty or "?", without requiring similar_desc or any path relationship; so if some agent/verification step produces a garbage path (empty/"?"), it can be merged with a finding of the same type+line in any file.

backend/app/services/agent/agents/orchestrator.py[968-1017]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Cross-file merging is too permissive when `new_file` is empty/"?" and can wrongly merge unrelated findings.
### Issue Context
The logic exists to stop verification from overwriting valid paths with garbage ones, but the current implementation merges across files on type+line alone.
### Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[971-987]
### What to change
1. When `new_is_garbage`, require extra constraints before merging, for example:
- require `similar_desc` to be true; or
- require a non-empty existing_file plus other fields that prove it is the same spot (same fingerprint, or a highly similar title).
2. Keep the `prefix_match` path-prefix merging (it is the more explainable rule).
3. Ensure a merge never overwrites a valid existing `file_path` with empty/"?" (the current smart merge mostly does, but an explicit guard is recommended).



9. Non-string location breaks strip 🐞 Bug ☼ Reliability
Description
During standardization the Analysis Agent calls .strip()/.split() directly on file_path/location without checking their types. As soon as the LLM or an upstream tool returns a non-string location/file_path (e.g. a dict or list), an AttributeError is raised and the whole Analysis Agent run fails.
Code

backend/app/services/agent/agents/analysis.py[R769-776]

+                # 🔥 v2.2: file_path 必填校验 - 没有 file_path 的 finding 直接拒绝
+                # 优先从 file_path 获取,fallback 到 file / location
+                file_path = finding.get("file_path") or finding.get("file") or ""
+                if not file_path.strip() and finding.get("location"):
+                    loc = finding.get("location", "")
+                    file_path = loc.split(":")[0] if ":" in loc else loc
+                if not file_path.strip():
+                    skipped_no_filepath += 1
Evidence
The code verifies only that the finding is a dict, not the types of its fields; it then calls .strip() on file_path and may assign location to file_path (which, when non-string, makes the subsequent .strip() crash as well). The exception is not caught at the per-finding level, so it aborts the entire run.

backend/app/services/agent/agents/analysis.py[763-781]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AnalysisAgent` assumes `file_path`/`location` are strings and calls `.strip()` / `.split()` on them. Non-string values can crash the entire agent run.
## Issue Context
Findings are only validated as dicts; individual fields are not type-validated.
## Fix Focus Areas
- backend/app/services/agent/agents/analysis.py[769-781]
## Suggested change
- Normalize types defensively:
- `raw_fp = finding.get("file_path") or finding.get("file")`
- `file_path = raw_fp if isinstance(raw_fp, str) else ""`
- `loc = finding.get("location")`
- `if not file_path and isinstance(loc, str): file_path = loc.split(":", 1)[0]`
- Keep the skip logic, but ensure it cannot throw.



10. Verdict index not migrated 🐞 Bug ➹ Performance
Description
The model declares AgentFinding.verdict with index=True, but this Alembic migration only calls add_column and never creates the index. As agent_findings grows, filtering or aggregating by verdict degenerates into unnecessary full-table scans.
Code

backend/alembic/versions/009_add_verdict_to_agent_findings.py[R19-21]

+def upgrade() -> None:
+    op.add_column('agent_findings', sa.Column('verdict', sa.String(length=30), nullable=True))
+
Evidence
The SQLAlchemy model explicitly requests an index on verdict, but the migration contains no op.create_index, so the index never appears in already-deployed environments, leaving the schema inconsistent with the ORM's intent.

backend/app/models/agent_task.py[355-360]
backend/alembic/versions/009_add_verdict_to_agent_findings.py[19-24]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The ORM declares an index on `agent_findings.verdict`, but the Alembic migration only adds the column.
## Issue Context
Index=True in SQLAlchemy does not automatically create indexes in existing DBs; migrations must create them.
## Fix Focus Areas
- backend/alembic/versions/009_add_verdict_to_agent_findings.py[19-24]
- backend/app/models/agent_task.py[355-360]
## Suggested change
- In `upgrade()`, add `op.create_index('ix_agent_findings_verdict', 'agent_findings', ['verdict'])`.
- In `downgrade()`, drop the index before dropping the column.
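A minimal follow-up revision implementing the two bullets above might look like this (a sketch of an Alembic migration fragment; the revision identifiers are placeholders, not real ids from this repo):

```python
"""Sketch: add the index the ORM declares on agent_findings.verdict."""
from alembic import op

# Revision identifiers used by Alembic (placeholders).
revision = "010_add_verdict_index"
down_revision = "009_add_verdict_to_agent_findings"

def upgrade() -> None:
    # Create the index that the model requests via index=True.
    op.create_index("ix_agent_findings_verdict", "agent_findings", ["verdict"])

def downgrade() -> None:
    # Symmetric teardown.
    op.drop_index("ix_agent_findings_verdict", table_name="agent_findings")
```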



11. Verdict not in response 🐞 Bug ⚙ Maintainability
Description
verdict is now written to AgentFinding, but the response model for GET /{task_id}/findings has no verdict field, so clients cannot read it.
Code

backend/app/models/agent_task.py[358]

+    verdict = Column(String(30), nullable=True, index=True)  # confirmed / likely / uncertain / false_positive
Evidence
The save logic persists verdict to AgentFinding, but AgentFindingResponse's field list does not include verdict, so Pydantic's from_attributes serialization never outputs it.

backend/app/api/v1/endpoints/agent_tasks.py[178-205]
backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
backend/app/models/agent_task.py[355-360]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
verdict is persisted to `AgentFinding`, but the response schema for querying findings does not return the field, making the new field hard for the frontend or other callers to use.
### Issue Context
The API uses `AgentFindingResponse` (Pydantic, `from_attributes=True`) as the response model for GET findings; it does not currently declare verdict.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[178-205]
- backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
### Suggested fix
1. Add `verdict: Optional[str] = None` to `AgentFindingResponse`.
2. If the frontend needs to display or filter by verdict, update the relevant API docs and frontend field mappings as well.
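A minimal sketch of the schema change, assuming Pydantic v2 (`ConfigDict`); only the fields relevant to this issue are shown:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class AgentFindingResponse(BaseModel):
    """Trimmed-down response model; the real one carries many more fields."""
    model_config = ConfigDict(from_attributes=True)

    title: str
    severity: str
    file_path: Optional[str] = None
    # New field: confirmed / likely / uncertain / false_positive
    verdict: Optional[str] = None
```

With `from_attributes=True`, the field is picked up automatically when serializing an `AgentFinding` ORM row once it is declared here.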




Previous review results

Review updated until commit e56dcfd



Action required
1. Stats block NameError🐞 Bug ≡ Correctness
Description
The statistics logic that runs after task completion uses files_with_findings_set without defining it, which raises a NameError just as the task is about to complete and causes the task to be marked failed. Meanwhile filtered_findings is never populated, so the severity/verified/security_score statistics are all computed from an empty list.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R570-604]

+                # 🔥 FIX: filter findings first, then run the statistics on the filtered list
+                # keep the filtering consistent with _save_findings (drop findings without file_path)
+                filtered_findings = []
               for f in findings:
                   if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                        raw_file_path = f.get("file_path") or f.get("file")
+                        file_path = raw_file_path if isinstance(raw_file_path, str) else ""
+                        if not file_path:
+                            raw_location = f.get("location", "")
+                            location = raw_location if isinstance(raw_location, str) else ""
+                            file_path = location.split(":")[0]
                       if file_path:
                           files_with_findings_set.add(file_path)
               task.files_with_findings = len(files_with_findings_set)

-                # tally severity and verification status
+                # tally severity and verification status (using the filtered list)
               verified_count = 0
-                for f in findings:
-                    if isinstance(f, dict):
-                        sev = str(f.get("severity", "low")).lower()
-                        if sev == "critical":
-                            task.critical_count += 1
-                        elif sev == "high":
-                            task.high_count += 1
-                        elif sev == "medium":
-                            task.medium_count += 1
-                        elif sev == "low":
-                            task.low_count += 1
-                        # 🔥 count verified findings
-                        if f.get("is_verified") or f.get("verdict") == "confirmed":
-                            verified_count += 1
+                for f in filtered_findings:
+                    sev = str(f.get("severity", "low")).lower()
+                    if sev == "critical":
+                        task.critical_count += 1
+                    elif sev == "high":
+                        task.high_count += 1
+                    elif sev == "medium":
+                        task.medium_count += 1
+                    elif sev == "low":
+                        task.low_count += 1
+                    # 🔥 count verified findings
+                    if f.get("is_verified") or f.get("verdict") == "confirmed":
+                        verified_count += 1
               task.verified_count = verified_count
               
-                # compute the security score
-                task.security_score = _calculate_security_score(findings)
-                task.quality_score = _calculate_security_score(findings)
+                # compute the security score (using the filtered list)
+                task.security_score = _calculate_security_score(filtered_findings)
+                task.quality_score = _calculate_security_score(filtered_findings)
Evidence
In the task-completion branch, the code declares filtered_findings = [], but the subsequent filtering loop only writes via files_with_findings_set.add(file_path), and that set is never initialized in the current scope. Meanwhile the later statistics loop for f in filtered_findings: and _calculate_security_score(filtered_findings) both use a list that is never appended to, so verified_count, the severity distribution, and the score come out wrong (logically wrong even before the NameError fires).

backend/app/api/v1/endpoints/agent_tasks.py[570-605]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In the task-completion statistics in `agent_tasks.py`:
- `files_with_findings_set` is used but never initialized, so a `NameError` is raised at runtime.
- `filtered_findings` is declared but never `append`ed to, so the subsequent severity/verified/security_score statistics all run against an empty list.
### Issue Context
The code sits in the post-completion statistics phase; the exception lets a task that has already saved its findings still be marked FAILED and rolls back the statistics fields, breaking the frontend display and task state.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-605]
### Suggested fix
1) Initialize `files_with_findings_set = set()` before entering the loop.
2) In the filtering loop, when `file_path` is valid, do both:
 - `files_with_findings_set.add(file_path)`
 - `filtered_findings.append(f)`
3) (Optional, but more consistent) keep the filter condition aligned with `_save_findings` (e.g. a `file_path.strip()` check), so the statistics and the stored count agree.
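The suggested fix can be sketched as one self-contained function (the name and return shape are illustrative, not the PR's actual code):

```python
def compute_stats(findings: list) -> dict:
    """Filter findings the way _save_findings does, then count only the filtered list."""
    files_with_findings = set()  # initialized before use (the reported NameError)
    filtered = []
    for f in findings:
        if not isinstance(f, dict):
            continue
        fp = f.get("file_path") or f.get("file") or ""
        if not isinstance(fp, str):
            fp = ""
        if not fp.strip():
            loc = f.get("location")
            if isinstance(loc, str):
                fp = loc.split(":", 1)[0]
        if fp and fp.strip():
            files_with_findings.add(fp)
            filtered.append(f)  # previously never appended

    counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    verified = 0
    for f in filtered:
        sev = str(f.get("severity", "low")).lower()
        if sev in counts:
            counts[sev] += 1
        if f.get("is_verified") or f.get("verdict") == "confirmed":
            verified += 1
    return {"files": len(files_with_findings), "severity": counts, "verified": verified}
```

Because the severity and verified tallies iterate `filtered`, they automatically agree with what would be saved.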



2. Stats loop type crash🐞 Bug ☼ Reliability
Description
When a task completes, agent_tasks computes statistics with fp.strip() / loc.split(); if file_path/file/location is not a string this raises, so a task that should be COMPLETED is caught by the outer except during the statistics phase and marked FAILED. The Orchestrator's _normalize_finding keeps non-string location values (it only parses when isinstance(location, str)), so this input is reachable in the real pipeline.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R573-587]

             for f in findings:
-                    if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
-                        if file_path:
-                            files_with_findings_set.add(file_path)
+                    if not isinstance(f, dict):
+                        continue
+                    fp = f.get("file_path") or f.get("file") or ""
+                    if not fp.strip() and f.get("location"):
+                        loc = f.get("location", "")
+                        fp = loc.split(":")[0] if ":" in loc else loc
+                    if fp and fp.strip():
+                        filtered_findings.append(f)
+
+                files_with_findings_set = set()
+                for f in filtered_findings:
+                    file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                    if file_path:
+                        files_with_findings_set.add(file_path)
Evidence
The statistics code calls .strip() on fp and .split() on loc directly, but neither fp nor loc is guaranteed to be a string; meanwhile Orchestrator standardization only parses location into file_path when it is a string and otherwise keeps the value as-is, so a non-string location can be passed into this statistics logic and raise AttributeError. The block sits inside the large task-completion try, so the exception falls into the outer except Exception, which then marks the task FAILED.

backend/app/api/v1/endpoints/agent_tasks.py[565-662]
backend/app/api/v1/endpoints/agent_tasks.py[570-588]
backend/app/services/agent/agents/orchestrator.py[1137-1159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`backend/app/api/v1/endpoints/agent_tasks.py` performs string operations (`.strip()` / `.split(':')`) on `file_path`/`file`/`location` during the task-completion statistics, but these fields come from LLM/sub-agent output and are not guaranteed to be `str`. A `dict`/`list` value triggers an `AttributeError`, so the task fails in the statistics step and is marked FAILED.
### Issue Context
The Orchestrator's `_normalize_finding` only parses `location` when it is a string and otherwise keeps the original value, so a non-string `location` can reach the statistics logic in agent_tasks.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-588]
### What to change
1. Extract a small helper (or inline it) for safe file_path parsing:
- Use `f.get('file_path')` / `f.get('file')` only when `isinstance(x, str)`; otherwise treat it as an empty string.
- Only do `':' in loc` / `loc.split(':', 1)` when `isinstance(loc, str)`.
2. In the second loop, do not repeat the type-unguarded `f.get('location', '').split(...)`; reuse the already-parsed `fp` (write it back to `f['file_path']`, or store tuples in `filtered_findings` if needed).
3. At the unit/integration level, construct `finding={'location': {'file':'a.py'}}` / `{'file_path': {'x':1}}` and make sure the task no longer fails.
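A minimal before/after sketch of the unguarded vs. type-safe parsing (function names are hypothetical, for illustration only):

```python
def old_parse(f: dict) -> str:
    # The pre-fix expression: crashes when 'location' is not a string.
    return f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]


def new_parse(f: dict) -> str:
    # Type-guarded version: non-string values degrade to "" instead of raising.
    fp = f.get("file_path") or f.get("file")
    if not isinstance(fp, str):
        fp = ""
    if not fp:
        loc = f.get("location", "")
        fp = loc.split(":", 1)[0] if isinstance(loc, str) else ""
    return fp
```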



3. Normalized None causes crash🐞 Bug ☼ Reliability
Description
Orchestrator 在合并 findings 时先调用 _normalize_finding(new_f),但不检查其返回值就对 normalized_new 调用 .get()。由于本 PR
放宽 high_risk_areas 的“像文件路径”判断,更容易把不存在的路径写入 file_path,触发 _normalize_finding 返回 None 并导致任务直接崩溃。
Code

backend/app/services/agent/agents/orchestrator.py[R881-886]

+                                # 🔥 FIX: relax file-path validation - no extension whitelist; extract anything path-like
                           if ("." in potential_file and
                               " " not in potential_file and
                               len(potential_file) < 100 and
-                                    any(potential_file.endswith(ext) for ext in ['.py', '.js', '.ts', '.java', '.go', '.php', '.rb', '.c', '.cpp', '.h'])):
+                                    not potential_file.endswith("/")):
                               file_path = potential_file
Evidence
This PR relaxes the file-path extraction condition for high_risk_areas (no longer restricting extensions), so more strings get written into file_path; _normalize_finding returns None when the file_path does not exist. The merge logic has no guard for None, so the subsequent normalized_new.get(...) raises AttributeError and terminates the whole orchestrator.

backend/app/services/agent/agents/orchestrator.py[876-886]
backend/app/services/agent/agents/orchestrator.py[932-941]
backend/app/services/agent/agents/orchestrator.py[1223-1231]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Orchestrator merge loop assumes `_normalize_finding()` always returns a dict, but `_normalize_finding()` can return `None` when file_path validation fails. The merge loop then calls `.get()` on `None`, crashing the entire task.
## Issue Context
This PR relaxes the file-path heuristic for `high_risk_areas`, increasing the chance of extracting a non-existent path and hitting the `return None` branch.
## Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[932-941]
- backend/app/services/agent/agents/orchestrator.py[1223-1231]
- backend/app/services/agent/agents/orchestrator.py[876-886]
## Suggested change
- After `normalized_new = self._normalize_finding(new_f)`, add:
- `if not normalized_new: continue`
- Optionally: validate existence before setting `file_path` in the `high_risk_areas` conversion path, or set `file_path` only after `_validate_file_path()` passes.
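A sketch of the guarded merge loop; the normalizer is passed in as a callable purely for illustration:

```python
def merge_findings(existing: list, new: list, normalize) -> list:
    """Merge loop sketch: skip findings the normalizer rejects instead of
    calling .get() on None."""
    for new_f in new:
        normalized = normalize(new_f)
        if not normalized:  # _normalize_finding may return None
            continue
        existing.append(normalized)
    return existing
```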



4. Stats use unsaved findings🐞 Bug ≡ Correctness
Description
_save_findings 现在会直接跳过无 file_path 的 finding,但任务完成时的严重程度计数、verified_count 以及 security_score 仍基于原始
findings 列表计算。这样会出现 findings_count(已保存数量)与各统计项口径不一致,导致前端/报告展示错误。
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1271-1277]

+            # 🔥 v2.2: skip findings whose file_path is empty
+            if not file_path or not file_path.strip():
+                logger.warning(
+                    f"[SaveFindings] 🚫 skipping finding without file_path: "
+                    f"title={finding.get('title', 'N/A')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                )
+                continue
Evidence
After _save_findings gained the "skip when file_path is empty" filter, saved_count shrinks and is written to task.findings_count; yet the same flow still uses the unfiltered findings to compute files_with_findings, the severity distribution, verified_count, and security_score, so the statistics include data that was never stored.

backend/app/api/v1/endpoints/agent_tasks.py[1271-1293]
backend/app/api/v1/endpoints/agent_tasks.py[522-547]
backend/app/api/v1/endpoints/agent_tasks.py[565-599]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Task statistics are computed from the original `findings` list even though `_save_findings()` filters/skips some findings (now including empty `file_path`). This makes task counters and scores inconsistent with what is actually stored in DB.
## Issue Context
`task.findings_count` is set from `saved_count`, but other counters (severity buckets, verified_count, security_score, files_with_findings) are still derived from the unfiltered `findings`.
## Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[522-547]
- backend/app/api/v1/endpoints/agent_tasks.py[565-599]
- backend/app/api/v1/endpoints/agent_tasks.py[1157-1426]
## Suggested change
Choose one:
1) Filter once, then use the filtered list for both saving and stats.
- Extract a helper like `_filter_valid_findings(findings, project_root)` that mirrors `_save_findings` filtering rules.
2) After commit, query DB for `AgentFinding` rows for the task and compute stats from DB rows (authoritative).
Ensure `files_with_findings`, severity counts, verified_count, and security_score are based on the same set as `saved_count`.
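Option 1 could look roughly like the following shared helper (the name `filter_valid_findings` is hypothetical); both the save path and the statistics path would consume its output:

```python
def filter_valid_findings(findings: list) -> list:
    """Mirror of the _save_findings skip rule, so storage and statistics
    operate on the same set of findings."""
    valid = []
    for f in findings:
        if not isinstance(f, dict):
            continue
        fp = f.get("file_path")
        if isinstance(fp, str) and fp.strip():
            valid.append(f)
    return valid
```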



5. Verdict migration missing 🐞 Bug ≡ Correctness
Description
AgentFinding gained a verdict field and the save path writes that column, but the agent_findings table definition in the Alembic migrations does not include verdict, so the runtime insert fails and triggers a rollback: a task can appear "completed" while its findings were never actually stored.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1390-1394]

        code_snippet=code_snippet[:10000] if code_snippet else None,
        suggestion=suggestion[:5000] if suggestion else None,
        is_verified=is_verified,
+                verdict=verdict,  # 🔥 new: persist verdict to the database
        ai_confidence=confidence,  # 🔥 FIX: Use ai_confidence, not confidence
Evidence
The current code passes verdict when constructing the AgentFinding ORM instance, but the existing Alembic migration that creates agent_findings has no verdict column (and it is the only migration that creates that table), so on a non-upgraded database the INSERT/COMMIT fails with "column verdict does not exist" and rolls back.

backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
backend/app/models/agent_task.py[355-363]
backend/alembic/versions/006_add_agent_tables.py[146-183]
backend/alembic/versions/006_add_agent_tables.py[225-232]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AgentFinding.verdict` is declared on the ORM and assigned when saving findings, but no database migration adds the column, so the write fails and rolls back.
### Issue Context
- The existing `agent_findings` table is created by Alembic revision `006_add_agent_tables`, which contains no `verdict`.
- `_save_findings` builds `AgentFinding(..., verdict=verdict)`, which fails to insert when the DB schema has not been upgraded.
### Fix Focus Areas
- backend/alembic/versions/006_add_agent_tables.py[146-232]
- backend/app/models/agent_task.py[355-363]
- backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
### Suggested fix
1. Add a new Alembic revision:
- `op.add_column('agent_findings', sa.Column('verdict', sa.String(length=30), nullable=True))`
- `op.create_index('ix_agent_findings_verdict', 'agent_findings', ['verdict'])` (if needed)
2. Symmetric `drop_index/drop_column` in downgrade.
3. (Optional) If a failed write should not succeed silently, consider re-raising on commit failure in `_save_findings`, or setting `saved_count` to 0, so the task statistics are not misleading.
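The optional point 3 can be sketched with a small commit wrapper; the fake session below is only a test double standing in for a SQLAlchemy session, to show the rollback-and-raise behavior:

```python
class FakeSession:
    """Test double standing in for a SQLAlchemy session (illustrative only)."""
    def __init__(self, fail: bool = False):
        self.fail = fail
        self.rolled_back = False
        self.rows = []

    def add_all(self, rows):
        self.rows.extend(rows)

    def commit(self):
        if self.fail:
            # Mimics the schema error the review describes
            raise RuntimeError("column verdict does not exist")

    def rollback(self):
        self.rolled_back = True


def save_findings(session, rows) -> int:
    """Persist rows; on commit failure, roll back and re-raise instead of
    returning a count that was never actually stored."""
    session.add_all(rows)
    try:
        session.commit()
    except Exception:
        session.rollback()
        raise
    return len(rows)
```

Re-raising surfaces the schema mismatch immediately rather than letting the task report a stale `saved_count`.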



6. Location-only findings dropped🐞 Bug ≡ Correctness
Description
The AnalysisAgent standardization phase requires every finding to carry a non-empty file_path, so findings that only provide location/file are dropped outright, even though other modules in the system already support deriving file_path from location/file.
Code

backend/app/services/agent/agents/analysis.py[R769-777]

+                # 🔥 v2.2: file_path is required - reject findings without one
+                file_path = finding.get("file_path", "") or ""
+                if not file_path.strip():
+                    skipped_no_filepath += 1
+                    logger.warning(
+                        f"[Analysis] 🚫 skipping finding without file_path: "
+                        f"title={finding.get('title', '?')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                    )
+                    continue
Evidence
AnalysisAgent only checks finding['file_path'] and continues when it is missing; but the Kunlun tool's output uses a location field rather than file_path, the Orchestrator implements location/file -> file_path standardization, and _save_findings also supports falling back to location/file to derive file_path. The hard check in AnalysisAgent drops such results before they reach the rest of the pipeline.

backend/app/services/agent/agents/analysis.py[759-800]
backend/app/services/agent/tools/kunlun_tool.py[411-433]
backend/app/services/agent/agents/orchestrator.py[1135-1161]
backend/app/api/v1/endpoints/agent_tasks.py[1263-1278]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When standardizing findings, AnalysisAgent only accepts a non-empty `file_path` and drops findings that only provide `location`/`file`; yet several components and tools in the repo do emit a `location` field.
### Issue Context
- The Kunlun tool produces `location` when parsing its output.
- The Orchestrator already implements `location/file -> file_path` normalization.
- `_save_findings` likewise falls back to `file`/`location` when extracting `file_path`.
### Fix Focus Areas
- backend/app/services/agent/agents/analysis.py[759-792]
- backend/app/services/agent/agents/orchestrator.py[1135-1161]
- backend/app/services/agent/tools/kunlun_tool.py[411-433]
### Suggested fix
During AnalysisAgent standardization, apply the same compatibility handling as the Orchestrator:
1. `file_path = finding.get('file_path') or finding.get('file') or parse_location(finding.get('location'))`
2. Skip a finding only when the path is still empty after this derivation.
3. Write the derived `file_path` back into the standardized finding so that Verification/Save can use it.
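A sketch of that fallback chain (`parse_location` and `standardize` are illustrative names, not the repo's actual functions):

```python
from typing import Optional


def parse_location(loc) -> str:
    """'path.py:12' -> 'path.py'; non-strings yield ''."""
    if not isinstance(loc, str):
        return ""
    return loc.split(":", 1)[0].strip()


def standardize(finding: dict) -> Optional[dict]:
    file_path = (
        finding.get("file_path")
        or finding.get("file")
        or parse_location(finding.get("location"))
    )
    if not isinstance(file_path, str) or not file_path.strip():
        return None  # skip only after every fallback has failed
    out = dict(finding)
    out["file_path"] = file_path.strip()  # write the derivation back for later stages
    return out
```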




Remediation recommended
7. Over-broad cross-file merge🐞 Bug ≡ Correctness
Description
When new_file is empty or "?", the Orchestrator's dedup merge allows a cross-file match on same_type && same_line alone, which can wrongly merge findings from different files that happen to share a type and line number, cross-contaminating verification results or descriptions. This also changes the original same-file-first matching strategy.
Code

backend/app/services/agent/agents/orchestrator.py[R971-987]

+                            elif same_type and same_line and not same_file:
+                                # 🔥 FIX: Only allow cross-file matching when:
+                                # 1. new_file is garbage ("?"/empty) - verification returned bad path
+                                # 2. One path is a prefix of the other ("src/foo.py" vs "foo.py")
+                                # Do NOT merge when existing_file is garbage - that would lose the real path.
+                                new_is_garbage = not new_file or new_file == "?"
+                                prefix_match = (
+                                    new_file.endswith("/" + existing_file) or
+                                    existing_file.endswith("/" + new_file)
+                                )
+                                if new_is_garbage or prefix_match:
+                                    match_found = True
+                                    logger.info(f"[Orchestrator] Matched by type+line despite file mismatch: {new_file} vs {existing_file}")
+                                else:
+                                    match_found = False
+                            else:
+                                match_found = False
Evidence
In the `same_type and same_line and not same_file` branch, the new logic sets `match_found = True` directly when new_file is empty or "?", without requiring `similar_desc` or any path relationship; so if some agent or verification step produces a garbage path (empty/"?"), it can be merged with a same type+line finding from any file.

backend/app/services/agent/agents/orchestrator.py[968-1017]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below...
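One way to tighten the predicate along the lines the review suggests, requiring `similar_desc` in the garbage-path case (names are illustrative, not the repo's actual code):

```python
def allow_cross_file_match(new_file: str, existing_file: str, similar_desc: bool) -> bool:
    """Cross-file match only when the new path is garbage AND the descriptions
    agree, or when one path is a suffix of the other ('src/foo.py' vs 'foo.py')."""
    new_is_garbage = not new_file or new_file == "?"
    if new_is_garbage:
        # Requiring similar_desc here is the extra condition the review asks for.
        return similar_desc
    return (
        new_file.endswith("/" + existing_file)
        or existing_file.endswith("/" + new_file)
    )
```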

Comment on lines 1390 to 1394
code_snippet=code_snippet[:10000] if code_snippet else None,
suggestion=suggestion[:5000] if suggestion else None,
is_verified=is_verified,
verdict=verdict, # 🔥 新增:保存 verdict 到数据库
ai_confidence=confidence, # 🔥 FIX: Use ai_confidence, not confidence

Action required

1. Verdict migration missing 🐞 Bug ≡ Correctness

AgentFinding gained a verdict field and the save path writes that column, but the agent_findings table definition in the Alembic migrations does not include verdict, so the runtime insert fails and triggers a rollback: a task can appear "completed" while its findings were never actually stored.

Comment thread backend/app/services/agent/agents/analysis.py
1. Add Alembic migration for verdict column in agent_findings table
2. Analysis agent: fallback to location/file fields when file_path is empty
3. API response: expose verdict field in AgentFindingResponse
@Windelly
Author

Windelly commented May 5, 2026

@CodiumAI-Agent review

@Windelly
Author

Windelly commented May 5, 2026

Closing and reopening to trigger Qodo re-review after addressing feedback.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit 5b9d3e0

Comment thread backend/app/services/agent/agents/orchestrator.py
Comment thread backend/app/api/v1/endpoints/agent_tasks.py
@Windelly
Author

Windelly commented May 5, 2026

Round 2 fixes pushed. Reopening for Qodo re-review.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit f3d0b74

Comment thread backend/app/api/v1/endpoints/agent_tasks.py Outdated
@Windelly
Author

Windelly commented May 5, 2026

Round 3 fixes pushed. Reopening for Qodo.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit 9c77394

Comment thread backend/app/api/v1/endpoints/agent_tasks.py
@Windelly
Author

Windelly commented May 5, 2026

Fixed NameError. Reopening for Qodo.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit e56dcfd
