
fix: Fix file_path always being null #217

Open
Windelly wants to merge 6 commits into lintsinghua:v3.0.0 from Windelly:fix/filepath-null

Conversation

@Windelly

@Windelly Windelly commented May 4, 2026

Root cause analysis

This PR fixes the issue where the file_path field on findings from agent audit tasks was always null, which prevented the frontend from locating the vulnerable file.

Bug 1: Operator-precedence error in a ternary expression

When extracting the file path from the location field in agent_tasks.py, the ternary expression lacked parentheses:

# Before (wrong): Python operator precedence made the whole expression the body of the conditional
finding.get('location', '').split(':')[0] if ':' in finding.get('location', '') else finding.get('location')

# After (correct): parentheses make the precedence explicit
(finding.get('location', '').split(':')[0] if ':' in finding.get('location', '') else finding.get('location', ''))
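The pitfall can be reproduced in isolation. A minimal sketch (the surrounding assignment with an `or` chain is assumed for illustration; the exact context is not shown above):

```python
# Illustrative reproduction of the precedence bug; `fp`/`loc` are stand-in values.
loc = "src/app.py"   # a location without a ":<line>" suffix
fp = "src/real.py"   # an already-known file path

# Unparenthesized: the conditional expression has the LOWEST precedence, so this
# parses as `(fp or loc.split(":")[0]) if (":" in loc) else loc` -- the whole
# `or` chain becomes the body of the conditional.
buggy = fp or loc.split(":")[0] if ":" in loc else loc
assert buggy == "src/app.py"   # fp is silently discarded

# Parenthesized: the fallback applies only when fp is falsy.
fixed = fp or (loc.split(":")[0] if ":" in loc else loc)
assert fixed == "src/real.py"
```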

Bug 2: Findings without file_path were written to the database

The LLM sometimes returns findings without file_path; these invalid records went straight into the database.

Bug 3: The orchestrator's file-extension whitelist was too strict

Only traditional code extensions were allowed, leaving out config files such as .yaml and .json.

Bug 4: Merge logic ignored garbage paths

When new_file was an invalid value (such as "?"), it could still overwrite a valid existing path.

Fixes

File             Change
agent_tasks.py   Parenthesize the ternary expression + skip empty file_path + save the verdict field
orchestrator.py  Relax extension validation + keep garbage paths out of merges
agent_task.py    Add a verdict column to the AgentFinding model
analysis.py      Add a required-file_path check in _standardize_findings

Verification

  • Local tests pass
  • A database migration must be run (adds the verdict column)

Root causes:
1. A ternary expression in agent_tasks.py lacked parentheses; operator precedence broke location parsing
2. Findings returned by the LLM may lack file_path; saving them as-is left the database full of nulls
3. The orchestrator's file-extension whitelist was too strict, missing non-traditional code files such as .yaml/.json
4. The merge logic did not account for garbage paths and could overwrite a valid path with an invalid one

Fixes:
- Parenthesize the ternary expression to fix the precedence bug
- Require file_path in _save_findings and analysis; skip empty values and log a warning
- Save the new verdict field to the database (confirmed/likely/uncertain/false_positive)
- Relax the orchestrator's extension check, using endsWith('/') only to exclude directories
- Harden the merge logic: garbage paths never participate in merges, so valid paths are not overwritten
@vercel

vercel Bot commented May 4, 2026

@Windelly is attempting to deploy a commit to the tsinghuaiiilove-2257's projects Team on Vercel.

A member of the Team first needs to authorize it.

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 4, 2026

Review Summary by Qodo

(Agentic_describe updated until commit e56dcfd)

Fix file_path null issue with validation and enhanced merge logic

🐞 Bug fix ✨ Enhancement


Walkthroughs

Description
• Fix file_path null issue by adding type-safe extraction and validation
  - Correct ternary operator precedence with parentheses
  - Skip findings without valid file_path with warning logs
  - Add fallback to location/file fields when file_path empty
• Enhance finding deduplication and merge logic
  - Prevent garbage paths from overwriting valid file paths
  - Support cross-file matching only for garbage paths or prefix matches
  - Relax orchestrator file extension validation to support config files
• Add verdict field to track finding confidence levels
  - New database column for confirmed/likely/uncertain/false_positive
  - Expose verdict in API response and use for verification status
  - Include verdict in statistics calculation
• Improve findings filtering consistency across analysis pipeline
  - Filter invalid findings before statistics calculation
  - Use filtered findings for severity and verification counts
  - Add Alembic migration for verdict column
Diagram
flowchart LR
  A["Raw Findings from LLM"] -->|Type-safe extraction| B["Extract file_path"]
  B -->|Validate & filter| C["Skip invalid findings"]
  C -->|Normalize| D["Normalized Findings"]
  D -->|Dedup logic| E["Merge with existing"]
  E -->|Prevent garbage overwrite| F["Valid Findings"]
  F -->|Save to DB| G["AgentFinding with verdict"]
  G -->|Expose in API| H["AgentFindingResponse"]


File Changes

1. backend/alembic/versions/009_add_verdict_to_agent_findings.py Database migration +26/-0

Add Alembic migration for verdict column

• Create new Alembic migration to add verdict column to agent_findings table
• Add String(30) nullable verdict column with index
• Include downgrade function to drop column and index



2. backend/app/models/agent_task.py Database schema +1/-0

Add verdict column to AgentFinding model

• Add verdict column to AgentFinding model as String(30) nullable with index
• Column stores verdict values: confirmed/likely/uncertain/false_positive
• Positioned after status column in verification information section



3. backend/app/api/v1/endpoints/agent_tasks.py Bug fix, enhancement +49/-27

Fix file_path extraction and enhance findings filtering

• Add verdict field to AgentFindingResponse model as Optional[str]
• Fix file_path extraction with type-safe handling and proper precedence
• Skip findings without valid file_path and log warnings
• Filter findings before statistics calculation to ensure consistency
• Use filtered_findings for severity counts and verification status
• Calculate security/quality scores using filtered findings only
• Add verdict field to AgentFinding ORM model creation



4. backend/app/services/agent/agents/analysis.py Bug fix, error handling +30/-1

Add file_path validation and filtering in analysis agent

• Add file_path validation with fallback to file/location fields
• Skip findings without valid file_path and track skipped count
• Add type checking for location field to handle non-string values
• Log warnings for invalid findings with details
• Use extracted file_path in standardized findings output
• Report skipped findings count in completion message



5. backend/app/services/agent/agents/orchestrator.py Bug fix, enhancement +25/-1

Enhance file validation and merge deduplication logic

• Relax file extension validation by replacing whitelist with endsWith("/") check
• Support config files like .yaml, .json, .xml in addition to code files
• Add garbage path detection to prevent invalid paths from overwriting valid ones
• Enhance cross-file matching logic with prefix matching support
• Add type-safe normalization check before processing findings
• Implement smart merge logic that preserves valid file paths





@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 4, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)



Action required

1. Stats block NameError 🐞 Bug ≡ Correctness
Description
The post-completion statistics logic uses files_with_findings_set without ever defining it, raising a NameError just as the task is about to finish and marking the task failed. In addition, filtered_findings is never populated, so the severity/verified/security_score statistics are all computed over an empty list.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R570-604]

+                # 🔥 FIX: 先过滤 findings,再用过滤后的列表做统计
+                # 与 _save_findings 的过滤逻辑保持一致(排除无 file_path 的 finding)
+                filtered_findings = []
              for f in findings:
                  if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                        raw_file_path = f.get("file_path") or f.get("file")
+                        file_path = raw_file_path if isinstance(raw_file_path, str) else ""
+                        if not file_path:
+                            raw_location = f.get("location", "")
+                            location = raw_location if isinstance(raw_location, str) else ""
+                            file_path = location.split(":")[0]
                      if file_path:
                          files_with_findings_set.add(file_path)
              task.files_with_findings = len(files_with_findings_set)
-                # 统计严重程度和验证状态
+                # 统计严重程度和验证状态(使用过滤后的列表)
              verified_count = 0
-                for f in findings:
-                    if isinstance(f, dict):
-                        sev = str(f.get("severity", "low")).lower()
-                        if sev == "critical":
-                            task.critical_count += 1
-                        elif sev == "high":
-                            task.high_count += 1
-                        elif sev == "medium":
-                            task.medium_count += 1
-                        elif sev == "low":
-                            task.low_count += 1
-                        # 🔥 统计已验证的发现
-                        if f.get("is_verified") or f.get("verdict") == "confirmed":
-                            verified_count += 1
+                for f in filtered_findings:
+                    sev = str(f.get("severity", "low")).lower()
+                    if sev == "critical":
+                        task.critical_count += 1
+                    elif sev == "high":
+                        task.high_count += 1
+                    elif sev == "medium":
+                        task.medium_count += 1
+                    elif sev == "low":
+                        task.low_count += 1
+                    # 🔥 统计已验证的发现
+                    if f.get("is_verified") or f.get("verdict") == "confirmed":
+                        verified_count += 1
              task.verified_count = verified_count
              
-                # 计算安全评分
-                task.security_score = _calculate_security_score(findings)
-                task.quality_score = _calculate_security_score(findings)
+                # 计算安全评分(使用过滤后的列表)
+                task.security_score = _calculate_security_score(filtered_findings)
+                task.quality_score = _calculate_security_score(filtered_findings)
Evidence
In the task-completion branch, the code declares filtered_findings = [], but the filter loop that follows only writes via files_with_findings_set.add(file_path), and that set is never initialised in the current scope. The later statistics loop for f in filtered_findings: and _calculate_security_score(filtered_findings) both consume the never-appended empty list, so verified_count, the severity distribution, and the scores are all wrong (the logic is broken even before the NameError fires).

backend/app/api/v1/endpoints/agent_tasks.py[570-605]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In the task-completion statistics logic of `agent_tasks.py`:
- `files_with_findings_set` is used but never initialised, so the code raises `NameError` at runtime.
- `filtered_findings` is declared but never appended to, so the subsequent severity/verified/security_score statistics all run over an empty list.
### Issue Context
The code sits in the post-completion statistics phase; the exception marks the task FAILED even after its findings were saved and rolls back the statistics fields, affecting the frontend display and task status.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-605]
### Suggested fix
1) Initialise before the loop: `files_with_findings_set = set()`.
2) In the filter loop, when `file_path` is valid, do both:
- `files_with_findings_set.add(file_path)`
- `filtered_findings.append(f)`
3) (Optional, but more consistent) align the filter condition with `_save_findings` (e.g. a `file_path.strip()` check) so statistics and saved-row counts cannot diverge.
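A sketch of the suggested fix as one small helper (the name and field shapes are illustrative, mirroring the fields named in this review):

```python
def filter_valid_findings(findings):
    """Filter once, then reuse the filtered list for every statistic.

    Illustrative sketch: the set is initialised up front, and each finding
    with a usable path is appended alongside the set.add call.
    """
    filtered_findings = []
    files_with_findings_set = set()   # initialised before the loop
    for f in findings:
        if not isinstance(f, dict):
            continue
        raw = f.get("file_path") or f.get("file")
        fp = raw if isinstance(raw, str) else ""
        if not fp:
            loc = f.get("location", "")
            fp = loc.split(":")[0] if isinstance(loc, str) else ""
        if fp.strip():
            files_with_findings_set.add(fp)
            filtered_findings.append(f)
    return filtered_findings, files_with_findings_set

# Findings without a usable path are excluded from both outputs.
kept, files = filter_valid_findings(
    [{"file_path": "a.py"}, {"location": "b.py:3"}, {"title": "no path"}, "junk"]
)
assert files == {"a.py", "b.py"} and len(kept) == 2
```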



2. Stats loop type crash 🐞 Bug ☼ Reliability
Description
When computing finding statistics at task completion, agent_tasks calls fp.strip() / loc.split(); if file_path/file/location is not a string this raises, and a task that should be COMPLETED gets caught by the outer except during the statistics phase and marked FAILED. The orchestrator's _normalize_finding keeps non-string location values (it only parses when isinstance(location, str)), so this input is reachable through the real pipeline.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R573-587]

            for f in findings:
-                    if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
-                        if file_path:
-                            files_with_findings_set.add(file_path)
+                    if not isinstance(f, dict):
+                        continue
+                    fp = f.get("file_path") or f.get("file") or ""
+                    if not fp.strip() and f.get("location"):
+                        loc = f.get("location", "")
+                        fp = loc.split(":")[0] if ":" in loc else loc
+                    if fp and fp.strip():
+                        filtered_findings.append(f)
+
+                files_with_findings_set = set()
+                for f in filtered_findings:
+                    file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                    if file_path:
+                        files_with_findings_set.add(file_path)
Evidence
The statistics code calls .strip() on fp and .split() on loc without guaranteeing either is a string; meanwhile the orchestrator's normalization parses location into file_path only when it is a string and otherwise keeps it as-is, so a non-string location can reach this statistics logic and raise AttributeError. The block sits inside the big task-completion try, so the exception lands in the outer except Exception and the task status is updated to FAILED.

backend/app/api/v1/endpoints/agent_tasks.py[565-662]
backend/app/api/v1/endpoints/agent_tasks.py[570-588]
backend/app/services/agent/agents/orchestrator.py[1137-1159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In the task-completion statistics phase, `backend/app/api/v1/endpoints/agent_tasks.py` runs string operations (`.strip()` / `.split(':')`) on `file_path`/`file`/`location`, but these fields come from LLM/sub-agent output and are not guaranteed to be `str`. A `dict`/`list` value raises `AttributeError`, so the task fails during the statistics step and is marked FAILED.
### Issue Context
The orchestrator's `_normalize_finding` only parses `location` when it is a string and otherwise keeps the original value, so a non-string `location` can reach the statistics logic in agent_tasks.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-588]
### What to change
1. Extract a small helper (or inline the logic) for safe file_path parsing:
- Use `f.get('file_path')` / `f.get('file')` only when `isinstance(x, str)`; otherwise treat the value as an empty string.
- Only evaluate `':' in loc` / `loc.split(':', 1)` when `isinstance(loc, str)`.
2. In the second loop, do not repeat the unguarded `f.get('location', '').split(...)`; reuse the `fp` parsed earlier (write it back to `f['file_path']`, or store tuples in `filtered_findings` if needed).
3. At the unit/integration level, construct `finding={'location': {'file':'a.py'}}` / `{'file_path': {'x':1}}` and verify the task does not fail.



3. Normalized None causes crash 🐞 Bug ☼ Reliability
Description
When merging findings, the orchestrator calls _normalize_finding(new_f) and then calls .get() on normalized_new without checking the return value. Because this PR relaxes the "looks like a file path" heuristic for high_risk_areas, a non-existent path is more likely to be written into file_path, making _normalize_finding return None and crashing the task outright.
Code

backend/app/services/agent/agents/orchestrator.py[R881-886]

+                                # 🔥 FIX: 放宽文件路径校验 - 不再限制扩展名,只要像文件路径就提取
                          if ("." in potential_file and
                              " " not in potential_file and
                              len(potential_file) < 100 and
-                                    any(potential_file.endswith(ext) for ext in ['.py', '.js', '.ts', '.java', '.go', '.php', '.rb', '.c', '.cpp', '.h'])):
+                                    not potential_file.endswith("/")):
                              file_path = potential_file
Evidence
This PR relaxes the file-path extraction condition for high_risk_areas (extensions are no longer restricted), so more strings get written into file_path; _normalize_finding returns None when the file_path does not exist. The merge logic has no guard for None, and the subsequent normalized_new.get(...) raises AttributeError, terminating the whole orchestrator.

backend/app/services/agent/agents/orchestrator.py[876-886]
backend/app/services/agent/agents/orchestrator.py[932-941]
backend/app/services/agent/agents/orchestrator.py[1223-1231]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Orchestrator merge loop assumes `_normalize_finding()` always returns a dict, but `_normalize_finding()` can return `None` when file_path validation fails. The merge loop then calls `.get()` on `None`, crashing the entire task.
## Issue Context
This PR relaxes the file-path heuristic for `high_risk_areas`, increasing the chance of extracting a non-existent path and hitting the `return None` branch.
## Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[932-941]
- backend/app/services/agent/agents/orchestrator.py[1223-1231]
- backend/app/services/agent/agents/orchestrator.py[876-886]
## Suggested change
- After `normalized_new = self._normalize_finding(new_f)`, add:
- `if not normalized_new: continue`
- Optionally: validate existence before setting `file_path` in the `high_risk_areas` conversion path, or set `file_path` only after `_validate_file_path()` passes.



4. Stats use unsaved findings 🐞 Bug ≡ Correctness
Description
_save_findings now skips findings without a file_path, but the severity counts, verified_count, and security_score computed at task completion are still based on the original findings list. findings_count (the number actually saved) therefore disagrees with the other statistics, producing wrong numbers in the frontend and reports.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1271-1277]

+            # 🔥 v2.2: file_path 为空直接跳过
+            if not file_path or not file_path.strip():
+                logger.warning(
+                    f"[SaveFindings] 🚫 跳过无 file_path 的 finding: "
+                    f"title={finding.get('title', 'N/A')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                )
+                continue
Evidence
After _save_findings gained the "no file_path -> continue" filter, saved_count shrinks and is written to task.findings_count; but the same flow still computes files_with_findings, the severity distribution, verified_count, and security_score from the unfiltered findings, so the statistics include data that was never stored.

backend/app/api/v1/endpoints/agent_tasks.py[1271-1293]
backend/app/api/v1/endpoints/agent_tasks.py[522-547]
backend/app/api/v1/endpoints/agent_tasks.py[565-599]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Task statistics are computed from the original `findings` list even though `_save_findings()` filters/skips some findings (now including empty `file_path`). This makes task counters and scores inconsistent with what is actually stored in DB.
## Issue Context
`task.findings_count` is set from `saved_count`, but other counters (severity buckets, verified_count, security_score, files_with_findings) are still derived from the unfiltered `findings`.
## Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[522-547]
- backend/app/api/v1/endpoints/agent_tasks.py[565-599]
- backend/app/api/v1/endpoints/agent_tasks.py[1157-1426]
## Suggested change
Choose one:
1) Filter once, then use the filtered list for both saving and stats.
- Extract a helper like `_filter_valid_findings(findings, project_root)` that mirrors `_save_findings` filtering rules.
2) After commit, query DB for `AgentFinding` rows for the task and compute stats from DB rows (authoritative).
Ensure `files_with_findings`, severity counts, verified_count, and security_score are based on the same set as `saved_count`.



5. Verdict migration missing 🐞 Bug ≡ Correctness
Description
AgentFinding gained a verdict field that is written on save, but the Alembic migration that defines the agent_findings table has no verdict column, so the runtime INSERT fails and rolls back; the task can appear "completed" while its findings were never actually stored.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1390-1394]

       code_snippet=code_snippet[:10000] if code_snippet else None,
       suggestion=suggestion[:5000] if suggestion else None,
       is_verified=is_verified,
+                verdict=verdict,  # 🔥 新增:保存 verdict 到数据库
       ai_confidence=confidence,  # 🔥 FIX: Use ai_confidence, not confidence
Evidence
The code passes verdict when creating the AgentFinding ORM instance, but the existing Alembic migration that creates agent_findings contains no verdict column (and it is the only migration that creates the table), so on a database that has not been upgraded the INSERT/COMMIT fails with "column verdict does not exist" and rolls back.

backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
backend/app/models/agent_task.py[355-363]
backend/alembic/versions/006_add_agent_tables.py[146-183]
backend/alembic/versions/006_add_agent_tables.py[225-232]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AgentFinding.verdict` has been added to the ORM and is assigned when saving findings, but no database migration adds the column, so the write fails and rolls back.
### Issue Context
- The existing `agent_findings` table is created by Alembic revision `006_add_agent_tables`, which contains no `verdict` column.
- `_save_findings` builds `AgentFinding(..., verdict=verdict)`, which fails to insert when the DB schema has not been upgraded.
### Fix Focus Areas
- backend/alembic/versions/006_add_agent_tables.py[146-232]
- backend/app/models/agent_task.py[355-363]
- backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
### Suggested fix
1. Add a new Alembic revision:
- `op.add_column('agent_findings', sa.Column('verdict', sa.String(length=30), nullable=True))`
- `op.create_index('ix_agent_findings_verdict', 'agent_findings', ['verdict'])` (if needed)
2. Symmetrically `drop_index`/`drop_column` in `downgrade`.
3. (Optional) If a failed write should not succeed silently, consider re-raising when the `_save_findings` commit fails, or setting `saved_count` to 0, so the task statistics are not misleading.



6. Location-only findings dropped 🐞 Bug ≡ Correctness
Description
The AnalysisAgent's standardization phase requires every finding to carry a non-empty file_path, so findings that only provide location/file are dropped outright, even though other modules in the system already derive file_path from location/file.
Code

backend/app/services/agent/agents/analysis.py[R769-777]

+                # 🔥 v2.2: file_path 必填校验 - 没有 file_path 的 finding 直接拒绝
+                file_path = finding.get("file_path", "") or ""
+                if not file_path.strip():
+                    skipped_no_filepath += 1
+                    logger.warning(
+                        f"[Analysis] 🚫 跳过无 file_path 的 finding: "
+                        f"title={finding.get('title', '?')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                    )
+                    continue
Evidence
AnalysisAgent checks only finding['file_path'] and continues when it is missing; but the Kunlun tool's output uses a location field rather than file_path, the orchestrator implements location/file -> file_path normalization, and _save_findings likewise falls back to location/file to derive file_path. The hard check in AnalysisAgent drops such results before they ever reach the rest of the pipeline.

backend/app/services/agent/agents/analysis.py[759-800]
backend/app/services/agent/tools/kunlun_tool.py[411-433]
backend/app/services/agent/agents/orchestrator.py[1135-1161]
backend/app/api/v1/endpoints/agent_tasks.py[1263-1278]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When standardizing findings, AnalysisAgent accepts only a non-empty `file_path` and drops findings that provide just `location`/`file`; yet several components/tools in this repo do emit a `location` field.
### Issue Context
- The Kunlun tool emits `location` when parsing output.
- The orchestrator already implements `location/file -> file_path` normalization.
- `_save_findings` also supports deriving `file_path` from `file`/`location` as a fallback.
### Fix Focus Areas
- backend/app/services/agent/agents/analysis.py[759-792]
- backend/app/services/agent/agents/orchestrator.py[1135-1161]
- backend/app/services/agent/tools/kunlun_tool.py[411-433]
### Suggested fix
During AnalysisAgent standardization, accept input findings the same way the orchestrator does:
1. `file_path = finding.get('file_path') or finding.get('file') or parse_location(finding.get('location'))`
2. Skip a finding only if the path is still empty after this derivation.
3. Write the derived `file_path` into the standardized finding so that Verification/Save can use it downstream.
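The suggested fallback chain could be sketched like this (`parse_location` is a hypothetical helper named in the suggestion above, not an existing function in the repo):

```python
def parse_location(loc):
    """Hypothetical helper: derive a file path from a 'path:line' location string."""
    if not isinstance(loc, str):
        return ""
    return loc.split(":", 1)[0].strip()

def resolve_file_path(finding: dict) -> str:
    """Sketch of the file_path derivation with file/location fallback."""
    fp = finding.get("file_path") or finding.get("file")
    if not isinstance(fp, str) or not fp.strip():
        fp = parse_location(finding.get("location"))
    return fp.strip()

assert resolve_file_path({"location": "utils/io.py:88"}) == "utils/io.py"
assert resolve_file_path({"file": "main.py"}) == "main.py"
assert resolve_file_path({"location": {"file": "a.py"}}) == ""  # non-string survives
```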




Remediation recommended

7. Garbage path can still overwrite 🐞 Bug ≡ Correctness ⭐ New
Description
The orchestrator's merge logic still treats a non-empty garbage file_path (such as "?") as a "meaningful value" and writes it into merged, overwriting the existing real path. When project_root is missing and the path cannot be validated, such a path passes through the normalize/merge chain. The result is, once again, the frontend pointing at the wrong file.

backend/app/services/agent/agents/orchestrator.py[R971-992]

+                            elif same_type and same_line and not same_file:
+                                # 🔥 FIX: Only allow cross-file matching when:
+                                # 1. new_file is garbage ("?"/empty) and descriptions still match
+                                # 2. One path is a prefix of the other ("src/foo.py" vs "foo.py")
+                                # Do NOT merge when existing_file is garbage - that would lose the real path.
+                                new_is_garbage = not new_file or new_file == "?"
+                                prefix_match = (
+                                    new_file.endswith("/" + existing_file) or
+                                    existing_file.endswith("/" + new_file)
+                                )
+                                if (new_is_garbage and similar_desc) or prefix_match:
+                                    match_found = True
+                                    logger.info(f"[Orchestrator] Matched by type+line despite file mismatch: {new_file} vs {existing_file}")
+                                else:
+                                    match_found = False
+                            else:
+                                match_found = False
+
+                            if match_found:
                                # Update existing with new info (e.g. verification results)
                                # 🔥 FIX: Smart merge - don't overwrite good data with empty values
                                merged = dict(existing_f)  # Start with existing data
Evidence
The matching branch explicitly treats new_file == "?" as garbage and allows match_found=True under certain conditions; but the subsequent merge overwrites with any non-empty string and never excludes "?" as a garbage value. Meanwhile, when project_root is empty in runtime_context, _validate_file_path returns True outright (it passes when it cannot validate), so paths like "?" are not filtered out during normalization and reach the merge-overwrite logic.

backend/app/services/agent/agents/orchestrator.py[971-996]
backend/app/services/agent/agents/orchestrator.py[1118-1121]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Even with the new matching criteria, the merge step still overwrites `file_path` with any non-empty value. Garbage values like `"?"` are non-empty and can replace a previously correct path.

### Issue Context
- `new_is_garbage = ... or new_file == "?"` exists, but is not used to protect the merge assignment.
- If `project_root` is missing, `_validate_file_path()` returns True, so garbage paths can pass normalization and reach the merge.

### Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[971-1013]
- backend/app/services/agent/agents/orchestrator.py[1104-1121]

### Suggested change
- In the merge loop, treat `file_path` values in `{ "?", "", None }` as non-meaningful (do not overwrite an existing non-garbage `file_path`).
- Optionally also enforce the comment: *do not merge when `existing_file` is garbage* by adding an explicit guard.
- Consider normalizing `file_path == "?"` to empty string earlier (normalize stage) to reduce downstream risk.
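One way to make the guard explicit, as a sketch under the assumptions above (the `"?"`/empty garbage set comes from this review, not from the PR code itself):

```python
def is_garbage(path) -> bool:
    """Treat None, non-strings, blanks and '?' as non-meaningful paths."""
    return not isinstance(path, str) or not path.strip() or path.strip() == "?"

def merge_file_path(existing, new):
    """Never let a garbage new path overwrite a valid existing one (sketch)."""
    if is_garbage(new):
        return existing
    if not is_garbage(existing):
        # Both meaningful: accept new only when it is a more specific prefix match.
        return new if new.endswith("/" + existing) else existing
    return new

assert merge_file_path("src/app.py", "?") == "src/app.py"      # garbage rejected
assert merge_file_path("app.py", "src/app.py") == "src/app.py" # more specific wins
assert merge_file_path(None, "src/app.py") == "src/app.py"
```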



8. Over-broad cross-file merge 🐞 Bug ≡ Correctness
Description
When new_file is empty or "?", the orchestrator's dedup-merge allows a cross-file match on same_type && same_line alone, which can wrongly merge findings from different files that happen to share a type and line number, cross-contaminating verification results and descriptions. This changes the previous same-file-first matching strategy.
Code

backend/app/services/agent/agents/orchestrator.py[R971-987]

+                            elif same_type and same_line and not same_file:
+                                # 🔥 FIX: Only allow cross-file matching when:
+                                # 1. new_file is garbage ("?"/empty) - verification returned bad path
+                                # 2. One path is a prefix of the other ("src/foo.py" vs "foo.py")
+                                # Do NOT merge when existing_file is garbage - that would lose the real path.
+                                new_is_garbage = not new_file or new_file == "?"
+                                prefix_match = (
+                                    new_file.endswith("/" + existing_file) or
+                                    existing_file.endswith("/" + new_file)
+                                )
+                                if new_is_garbage or prefix_match:
+                                    match_found = True
+                                    logger.info(f"[Orchestrator] Matched by type+line despite file mismatch: {new_file} vs {existing_file}")
+                                else:
+                                    match_found = False
+                            else:
+                                match_found = False
Evidence
In the same_type and same_line and not same_file branch, the new logic sets match_found = True whenever new_file is empty or "?", without requiring similar_desc or any path relationship; so if some agent/verification step produces a garbage path (empty/"?"), it can be merged with a finding of the same type+line in any file.

backend/app/services/agent/agents/orchestrator.py[968-1017]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Cross-file merging is too permissive when `new_file` is empty/"?" and can wrongly merge unrelated findings.
### Issue Context
The logic exists to stop verification from overwriting valid paths with garbage ones, but the current implementation merges across files on type+line alone.
### Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[971-987]
### What to change
1. When `new_is_garbage`, require extra constraints before merging, for example:
- require `similar_desc` to be true; or
- require a non-empty existing_file plus other fields that prove it is the same spot (same fingerprint, or a highly similar title).
2. Keep the `prefix_match` path-prefix merging (it is the more explainable rule).
3. Ensure a merge never overwrites a valid existing `file_path` with empty/"?" (the current smart merge mostly does, but an explicit guard is recommended).



9. Non-string location breaks strip 🐞 Bug ☼ Reliability
Description
During standardization the Analysis Agent calls .strip()/.split() directly on file_path/location without checking their types. As soon as the LLM or an upstream tool returns a non-string location/file_path (e.g. a dict or list), an AttributeError is raised and the whole Analysis Agent run fails.
Code

backend/app/services/agent/agents/analysis.py[R769-776]

+                # 🔥 v2.2: file_path 必填校验 - 没有 file_path 的 finding 直接拒绝
+                # 优先从 file_path 获取,fallback 到 file / location
+                file_path = finding.get("file_path") or finding.get("file") or ""
+                if not file_path.strip() and finding.get("location"):
+                    loc = finding.get("location", "")
+                    file_path = loc.split(":")[0] if ":" in loc else loc
+                if not file_path.strip():
+                    skipped_no_filepath += 1
Evidence
The code verifies only that the finding is a dict, not the types of its fields; it then calls .strip() on file_path and may assign location to file_path (which, when non-string, makes the subsequent .strip() crash as well). The exception is not caught at the per-finding level, so it aborts the entire run.

backend/app/services/agent/agents/analysis.py[763-781]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AnalysisAgent` assumes `file_path`/`location` are strings and calls `.strip()` / `.split()` on them. Non-string values can crash the entire agent run.
## Issue Context
Findings are only validated as dicts; individual fields are not type-validated.
## Fix Focus Areas
- backend/app/services/agent/agents/analysis.py[769-781]
## Suggested change
- Normalize types defensively:
- `raw_fp = finding.get("file_path") or finding.get("file")`
- `file_path = raw_fp if isinstance(raw_fp, str) else ""`
- `loc = finding.get("location")`
- `if not file_path and isinstance(loc, str): file_path = loc.split(":", 1)[0]`
- Keep the skip logic, but ensure it cannot throw.



10. Verdict index not migrated 🐞 Bug ➹ Performance
Description
The model declares AgentFinding.verdict with index=True, but this Alembic migration only calls add_column and never creates the index. As agent_findings grows, filtering or aggregating by verdict degenerates into unnecessary full-table scans.
Code

backend/alembic/versions/009_add_verdict_to_agent_findings.py[R19-21]

+def upgrade() -> None:
+    op.add_column('agent_findings', sa.Column('verdict', sa.String(length=30), nullable=True))
+
Evidence
The SQLAlchemy model explicitly requests an index on verdict, but the migration contains no op.create_index, so the index never appears in already-deployed environments, leaving the schema inconsistent with the ORM's intent.

backend/app/models/agent_task.py[355-360]
backend/alembic/versions/009_add_verdict_to_agent_findings.py[19-24]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The ORM declares an index on `agent_findings.verdict`, but the Alembic migration only adds the column.
## Issue Context
Index=True in SQLAlchemy does not automatically create indexes in existing DBs; migrations must create them.
## Fix Focus Areas
- backend/alembic/versions/009_add_verdict_to_agent_findings.py[19-24]
- backend/app/models/agent_task.py[355-360]
## Suggested change
- In `upgrade()`, add `op.create_index('ix_agent_findings_verdict', 'agent_findings', ['verdict'])`.
- In `downgrade()`, drop the index before dropping the column.
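A minimal follow-up revision implementing the two bullets above might look like this (a sketch of an Alembic migration fragment; the revision identifiers are placeholders, not real ids from this repo):

```python
"""Sketch: add the index the ORM declares on agent_findings.verdict."""
from alembic import op

# Revision identifiers used by Alembic (placeholders).
revision = "010_add_verdict_index"
down_revision = "009_add_verdict_to_agent_findings"

def upgrade() -> None:
    # Create the index that the model requests via index=True.
    op.create_index("ix_agent_findings_verdict", "agent_findings", ["verdict"])

def downgrade() -> None:
    # Symmetric teardown.
    op.drop_index("ix_agent_findings_verdict", table_name="agent_findings")
```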



11. Verdict not in response 🐞 Bug ⚙ Maintainability
Description
verdict is now written to AgentFinding, but the response model for GET /{task_id}/findings has no verdict field, so clients cannot read it.
Code

backend/app/models/agent_task.py[358]

+    verdict = Column(String(30), nullable=True, index=True)  # confirmed / likely / uncertain / false_positive
Evidence
The save logic persists verdict to AgentFinding, but AgentFindingResponse's field list does not include verdict, so Pydantic's from_attributes serialization never outputs it.

backend/app/api/v1/endpoints/agent_tasks.py[178-205]
backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
backend/app/models/agent_task.py[355-360]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
verdict is persisted to `AgentFinding`, but the response schema for querying findings does not return the field, making the new field hard for the frontend or other callers to use.
### Issue Context
The API uses `AgentFindingResponse` (Pydantic, `from_attributes=True`) as the response model for GET findings; it does not currently declare verdict.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[178-205]
- backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
### Suggested fix
1. Add `verdict: Optional[str] = None` to `AgentFindingResponse`.
2. If the frontend needs to display or filter by verdict, update the relevant API docs and frontend field mappings as well.
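A minimal sketch of the schema change, assuming Pydantic v2 (`ConfigDict`); only the fields relevant to this issue are shown:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class AgentFindingResponse(BaseModel):
    """Trimmed-down response model; the real one carries many more fields."""
    model_config = ConfigDict(from_attributes=True)

    title: str
    severity: str
    file_path: Optional[str] = None
    # New field: confirmed / likely / uncertain / false_positive
    verdict: Optional[str] = None
```

With `from_attributes=True`, the field is picked up automatically when serializing an `AgentFinding` ORM row once it is declared here.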




Previous review results

Review updated until commit e56dcfd



Action required
1. Stats block NameError🐞 Bug ≡ Correctness
Description
The statistics logic that runs after task completion uses files_with_findings_set without defining it, which raises a NameError just as the task is about to complete and causes the task to be marked failed. Meanwhile filtered_findings is never populated, so the severity/verified/security_score statistics are all computed from an empty list.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R570-604]

+                # 🔥 FIX: filter findings first, then run the statistics on the filtered list
+                # keep the filtering consistent with _save_findings (drop findings without file_path)
+                filtered_findings = []
               for f in findings:
                   if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                        raw_file_path = f.get("file_path") or f.get("file")
+                        file_path = raw_file_path if isinstance(raw_file_path, str) else ""
+                        if not file_path:
+                            raw_location = f.get("location", "")
+                            location = raw_location if isinstance(raw_location, str) else ""
+                            file_path = location.split(":")[0]
                       if file_path:
                           files_with_findings_set.add(file_path)
               task.files_with_findings = len(files_with_findings_set)

-                # tally severity and verification status
+                # tally severity and verification status (using the filtered list)
               verified_count = 0
-                for f in findings:
-                    if isinstance(f, dict):
-                        sev = str(f.get("severity", "low")).lower()
-                        if sev == "critical":
-                            task.critical_count += 1
-                        elif sev == "high":
-                            task.high_count += 1
-                        elif sev == "medium":
-                            task.medium_count += 1
-                        elif sev == "low":
-                            task.low_count += 1
-                        # 🔥 count verified findings
-                        if f.get("is_verified") or f.get("verdict") == "confirmed":
-                            verified_count += 1
+                for f in filtered_findings:
+                    sev = str(f.get("severity", "low")).lower()
+                    if sev == "critical":
+                        task.critical_count += 1
+                    elif sev == "high":
+                        task.high_count += 1
+                    elif sev == "medium":
+                        task.medium_count += 1
+                    elif sev == "low":
+                        task.low_count += 1
+                    # 🔥 count verified findings
+                    if f.get("is_verified") or f.get("verdict") == "confirmed":
+                        verified_count += 1
               task.verified_count = verified_count
               
-                # compute the security score
-                task.security_score = _calculate_security_score(findings)
-                task.quality_score = _calculate_security_score(findings)
+                # compute the security score (using the filtered list)
+                task.security_score = _calculate_security_score(filtered_findings)
+                task.quality_score = _calculate_security_score(filtered_findings)
Evidence
In the task-completion branch, the code declares filtered_findings = [], but the subsequent filtering loop only writes via files_with_findings_set.add(file_path), and that set is never initialized in the current scope. Meanwhile the later statistics loop for f in filtered_findings: and _calculate_security_score(filtered_findings) both use a list that is never appended to, so verified_count, the severity distribution, and the score come out wrong (logically wrong even before the NameError fires).

backend/app/api/v1/endpoints/agent_tasks.py[570-605]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In the task-completion statistics in `agent_tasks.py`:
- `files_with_findings_set` is used but never initialized, so a `NameError` is raised at runtime.
- `filtered_findings` is declared but never `append`ed to, so the subsequent severity/verified/security_score statistics all run against an empty list.
### Issue Context
The code sits in the post-completion statistics phase; the exception lets a task that has already saved its findings still be marked FAILED and rolls back the statistics fields, breaking the frontend display and task state.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-605]
### Suggested fix
1) Initialize `files_with_findings_set = set()` before entering the loop.
2) In the filtering loop, when `file_path` is valid, do both:
 - `files_with_findings_set.add(file_path)`
 - `filtered_findings.append(f)`
3) (Optional, but more consistent) keep the filter condition aligned with `_save_findings` (e.g. a `file_path.strip()` check), so the statistics and the stored count agree.
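The suggested fix can be sketched as one self-contained function (the name and return shape are illustrative, not the PR's actual code):

```python
def compute_stats(findings: list) -> dict:
    """Filter findings the way _save_findings does, then count only the filtered list."""
    files_with_findings = set()  # initialized before use (the reported NameError)
    filtered = []
    for f in findings:
        if not isinstance(f, dict):
            continue
        fp = f.get("file_path") or f.get("file") or ""
        if not isinstance(fp, str):
            fp = ""
        if not fp.strip():
            loc = f.get("location")
            if isinstance(loc, str):
                fp = loc.split(":", 1)[0]
        if fp and fp.strip():
            files_with_findings.add(fp)
            filtered.append(f)  # previously never appended

    counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    verified = 0
    for f in filtered:
        sev = str(f.get("severity", "low")).lower()
        if sev in counts:
            counts[sev] += 1
        if f.get("is_verified") or f.get("verdict") == "confirmed":
            verified += 1
    return {"files": len(files_with_findings), "severity": counts, "verified": verified}
```

Because the severity and verified tallies iterate `filtered`, they automatically agree with what would be saved.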



2. Stats loop type crash🐞 Bug ☼ Reliability
Description
When a task completes, agent_tasks computes statistics with fp.strip() / loc.split(); if file_path/file/location is not a string this raises, so a task that should be COMPLETED is caught by the outer except during the statistics phase and marked FAILED. The Orchestrator's _normalize_finding keeps non-string location values (it only parses when isinstance(location, str)), so this input is reachable in the real pipeline.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R573-587]

             for f in findings:
-                    if isinstance(f, dict):
-                        file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
-                        if file_path:
-                            files_with_findings_set.add(file_path)
+                    if not isinstance(f, dict):
+                        continue
+                    fp = f.get("file_path") or f.get("file") or ""
+                    if not fp.strip() and f.get("location"):
+                        loc = f.get("location", "")
+                        fp = loc.split(":")[0] if ":" in loc else loc
+                    if fp and fp.strip():
+                        filtered_findings.append(f)
+
+                files_with_findings_set = set()
+                for f in filtered_findings:
+                    file_path = f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]
+                    if file_path:
+                        files_with_findings_set.add(file_path)
Evidence
The statistics code calls .strip() on fp and .split() on loc directly, but neither fp nor loc is guaranteed to be a string; meanwhile Orchestrator standardization only parses location into file_path when it is a string and otherwise keeps the value as-is, so a non-string location can be passed into this statistics logic and raise AttributeError. The block sits inside the large task-completion try, so the exception falls into the outer except Exception, which then marks the task FAILED.

backend/app/api/v1/endpoints/agent_tasks.py[565-662]
backend/app/api/v1/endpoints/agent_tasks.py[570-588]
backend/app/services/agent/agents/orchestrator.py[1137-1159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`backend/app/api/v1/endpoints/agent_tasks.py` performs string operations (`.strip()` / `.split(':')`) on `file_path`/`file`/`location` during the task-completion statistics, but these fields come from LLM/sub-agent output and are not guaranteed to be `str`. A `dict`/`list` value triggers an `AttributeError`, so the task fails in the statistics step and is marked FAILED.
### Issue Context
The Orchestrator's `_normalize_finding` only parses `location` when it is a string and otherwise keeps the original value, so a non-string `location` can reach the statistics logic in agent_tasks.
### Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[570-588]
### What to change
1. Extract a small helper (or inline it) for safe file_path parsing:
- Use `f.get('file_path')` / `f.get('file')` only when `isinstance(x, str)`; otherwise treat it as an empty string.
- Only do `':' in loc` / `loc.split(':', 1)` when `isinstance(loc, str)`.
2. In the second loop, do not repeat the type-unguarded `f.get('location', '').split(...)`; reuse the already-parsed `fp` (write it back to `f['file_path']`, or store tuples in `filtered_findings` if needed).
3. At the unit/integration level, construct `finding={'location': {'file':'a.py'}}` / `{'file_path': {'x':1}}` and make sure the task no longer fails.
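A minimal before/after sketch of the unguarded vs. type-safe parsing (function names are hypothetical, for illustration only):

```python
def old_parse(f: dict) -> str:
    # The pre-fix expression: crashes when 'location' is not a string.
    return f.get("file_path") or f.get("file") or f.get("location", "").split(":")[0]


def new_parse(f: dict) -> str:
    # Type-guarded version: non-string values degrade to "" instead of raising.
    fp = f.get("file_path") or f.get("file")
    if not isinstance(fp, str):
        fp = ""
    if not fp:
        loc = f.get("location", "")
        fp = loc.split(":", 1)[0] if isinstance(loc, str) else ""
    return fp
```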



3. Normalized None causes crash🐞 Bug ☼ Reliability
Description
Orchestrator 在合并 findings 时先调用 _normalize_finding(new_f),但不检查其返回值就对 normalized_new 调用 .get()。由于本 PR
放宽 high_risk_areas 的“像文件路径”判断,更容易把不存在的路径写入 file_path,触发 _normalize_finding 返回 None 并导致任务直接崩溃。
Code

backend/app/services/agent/agents/orchestrator.py[R881-886]

+                                # 🔥 FIX: relax file-path validation - no extension whitelist; extract anything path-like
                           if ("." in potential_file and
                               " " not in potential_file and
                               len(potential_file) < 100 and
-                                    any(potential_file.endswith(ext) for ext in ['.py', '.js', '.ts', '.java', '.go', '.php', '.rb', '.c', '.cpp', '.h'])):
+                                    not potential_file.endswith("/")):
                               file_path = potential_file
Evidence
This PR relaxes the file-path extraction condition for high_risk_areas (no longer restricting extensions), so more strings get written into file_path; _normalize_finding returns None when the file_path does not exist. The merge logic has no guard for None, so the subsequent normalized_new.get(...) raises AttributeError and terminates the whole orchestrator.

backend/app/services/agent/agents/orchestrator.py[876-886]
backend/app/services/agent/agents/orchestrator.py[932-941]
backend/app/services/agent/agents/orchestrator.py[1223-1231]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Orchestrator merge loop assumes `_normalize_finding()` always returns a dict, but `_normalize_finding()` can return `None` when file_path validation fails. The merge loop then calls `.get()` on `None`, crashing the entire task.
## Issue Context
This PR relaxes the file-path heuristic for `high_risk_areas`, increasing the chance of extracting a non-existent path and hitting the `return None` branch.
## Fix Focus Areas
- backend/app/services/agent/agents/orchestrator.py[932-941]
- backend/app/services/agent/agents/orchestrator.py[1223-1231]
- backend/app/services/agent/agents/orchestrator.py[876-886]
## Suggested change
- After `normalized_new = self._normalize_finding(new_f)`, add:
- `if not normalized_new: continue`
- Optionally: validate existence before setting `file_path` in the `high_risk_areas` conversion path, or set `file_path` only after `_validate_file_path()` passes.
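A sketch of the guarded merge loop; the normalizer is passed in as a callable purely for illustration:

```python
def merge_findings(existing: list, new: list, normalize) -> list:
    """Merge loop sketch: skip findings the normalizer rejects instead of
    calling .get() on None."""
    for new_f in new:
        normalized = normalize(new_f)
        if not normalized:  # _normalize_finding may return None
            continue
        existing.append(normalized)
    return existing
```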



4. Stats use unsaved findings🐞 Bug ≡ Correctness
Description
_save_findings 现在会直接跳过无 file_path 的 finding,但任务完成时的严重程度计数、verified_count 以及 security_score 仍基于原始
findings 列表计算。这样会出现 findings_count(已保存数量)与各统计项口径不一致,导致前端/报告展示错误。
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1271-1277]

+            # 🔥 v2.2: skip findings whose file_path is empty
+            if not file_path or not file_path.strip():
+                logger.warning(
+                    f"[SaveFindings] 🚫 skipping finding without file_path: "
+                    f"title={finding.get('title', 'N/A')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                )
+                continue
Evidence
After _save_findings gained the "skip when file_path is empty" filter, saved_count shrinks and is written to task.findings_count; yet the same flow still uses the unfiltered findings to compute files_with_findings, the severity distribution, verified_count, and security_score, so the statistics include data that was never stored.

backend/app/api/v1/endpoints/agent_tasks.py[1271-1293]
backend/app/api/v1/endpoints/agent_tasks.py[522-547]
backend/app/api/v1/endpoints/agent_tasks.py[565-599]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Task statistics are computed from the original `findings` list even though `_save_findings()` filters/skips some findings (now including empty `file_path`). This makes task counters and scores inconsistent with what is actually stored in DB.
## Issue Context
`task.findings_count` is set from `saved_count`, but other counters (severity buckets, verified_count, security_score, files_with_findings) are still derived from the unfiltered `findings`.
## Fix Focus Areas
- backend/app/api/v1/endpoints/agent_tasks.py[522-547]
- backend/app/api/v1/endpoints/agent_tasks.py[565-599]
- backend/app/api/v1/endpoints/agent_tasks.py[1157-1426]
## Suggested change
Choose one:
1) Filter once, then use the filtered list for both saving and stats.
- Extract a helper like `_filter_valid_findings(findings, project_root)` that mirrors `_save_findings` filtering rules.
2) After commit, query DB for `AgentFinding` rows for the task and compute stats from DB rows (authoritative).
Ensure `files_with_findings`, severity counts, verified_count, and security_score are based on the same set as `saved_count`.
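Option 1 could look roughly like the following shared helper (the name `filter_valid_findings` is hypothetical); both the save path and the statistics path would consume its output:

```python
def filter_valid_findings(findings: list) -> list:
    """Mirror of the _save_findings skip rule, so storage and statistics
    operate on the same set of findings."""
    valid = []
    for f in findings:
        if not isinstance(f, dict):
            continue
        fp = f.get("file_path")
        if isinstance(fp, str) and fp.strip():
            valid.append(f)
    return valid
```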



5. Verdict migration missing 🐞 Bug ≡ Correctness
Description
AgentFinding gained a verdict field and the save path writes that column, but the agent_findings table definition in the Alembic migrations does not include verdict, so the runtime insert fails and triggers a rollback: a task can appear "completed" while its findings were never actually stored.
Code

backend/app/api/v1/endpoints/agent_tasks.py[R1390-1394]

        code_snippet=code_snippet[:10000] if code_snippet else None,
        suggestion=suggestion[:5000] if suggestion else None,
        is_verified=is_verified,
+                verdict=verdict,  # 🔥 new: persist verdict to the database
        ai_confidence=confidence,  # 🔥 FIX: Use ai_confidence, not confidence
Evidence
The current code passes verdict when constructing the AgentFinding ORM instance, but the existing Alembic migration that creates agent_findings has no verdict column (and it is the only migration that creates that table), so on a non-upgraded database the INSERT/COMMIT fails with "column verdict does not exist" and rolls back.

backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
backend/app/models/agent_task.py[355-363]
backend/alembic/versions/006_add_agent_tables.py[146-183]
backend/alembic/versions/006_add_agent_tables.py[225-232]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AgentFinding.verdict` is declared on the ORM and assigned when saving findings, but no database migration adds the column, so the write fails and rolls back.
### Issue Context
- The existing `agent_findings` table is created by Alembic revision `006_add_agent_tables`, which contains no `verdict`.
- `_save_findings` builds `AgentFinding(..., verdict=verdict)`, which fails to insert when the DB schema has not been upgraded.
### Fix Focus Areas
- backend/alembic/versions/006_add_agent_tables.py[146-232]
- backend/app/models/agent_task.py[355-363]
- backend/app/api/v1/endpoints/agent_tasks.py[1345-1396]
### Suggested fix
1. Add a new Alembic revision:
- `op.add_column('agent_findings', sa.Column('verdict', sa.String(length=30), nullable=True))`
- `op.create_index('ix_agent_findings_verdict', 'agent_findings', ['verdict'])` (if needed)
2. Symmetric `drop_index/drop_column` in downgrade.
3. (Optional) If a failed write should not succeed silently, consider re-raising on commit failure in `_save_findings`, or setting `saved_count` to 0, so the task statistics are not misleading.
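The optional point 3 can be sketched with a small commit wrapper; the fake session below is only a test double standing in for a SQLAlchemy session, to show the rollback-and-raise behavior:

```python
class FakeSession:
    """Test double standing in for a SQLAlchemy session (illustrative only)."""
    def __init__(self, fail: bool = False):
        self.fail = fail
        self.rolled_back = False
        self.rows = []

    def add_all(self, rows):
        self.rows.extend(rows)

    def commit(self):
        if self.fail:
            # Mimics the schema error the review describes
            raise RuntimeError("column verdict does not exist")

    def rollback(self):
        self.rolled_back = True


def save_findings(session, rows) -> int:
    """Persist rows; on commit failure, roll back and re-raise instead of
    returning a count that was never actually stored."""
    session.add_all(rows)
    try:
        session.commit()
    except Exception:
        session.rollback()
        raise
    return len(rows)
```

Re-raising surfaces the schema mismatch immediately rather than letting the task report a stale `saved_count`.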



6. Location-only findings dropped🐞 Bug ≡ Correctness
Description
The AnalysisAgent standardization phase requires every finding to carry a non-empty file_path, so findings that only provide location/file are dropped outright, even though other modules in the system already support deriving file_path from location/file.
Code

backend/app/services/agent/agents/analysis.py[R769-777]

+                # 🔥 v2.2: file_path is required - reject findings without one
+                file_path = finding.get("file_path", "") or ""
+                if not file_path.strip():
+                    skipped_no_filepath += 1
+                    logger.warning(
+                        f"[Analysis] 🚫 skipping finding without file_path: "
+                        f"title={finding.get('title', '?')[:50]}, type={finding.get('vulnerability_type', '?')}"
+                    )
+                    continue
Evidence
AnalysisAgent only checks finding['file_path'] and continues when it is missing; but the Kunlun tool's output uses a location field rather than file_path, the Orchestrator implements location/file -> file_path standardization, and _save_findings also supports falling back to location/file to derive file_path. The hard check in AnalysisAgent drops such results before they reach the rest of the pipeline.

backend/app/services/agent/agents/analysis.py[759-800]
backend/app/services/agent/tools/kunlun_tool.py[411-433]
backend/app/services/agent/agents/orchestrator.py[1135-1161]
backend/app/api/v1/endpoints/agent_tasks.py[1263-1278]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When standardizing findings, AnalysisAgent only accepts a non-empty `file_path` and drops findings that only provide `location`/`file`; yet several components and tools in the repo do emit a `location` field.
### Issue Context
- The Kunlun tool produces `location` when parsing its output.
- The Orchestrator already implements `location/file -> file_path` normalization.
- `_save_findings` likewise falls back to `file`/`location` when extracting `file_path`.
### Fix Focus Areas
- backend/app/services/agent/agents/analysis.py[759-792]
- backend/app/services/agent/agents/orchestrator.py[1135-1161]
- backend/app/services/agent/tools/kunlun_tool.py[411-433]
### Suggested fix
During AnalysisAgent standardization, apply the same compatibility handling as the Orchestrator:
1. `file_path = finding.get('file_path') or finding.get('file') or parse_location(finding.get('location'))`
2. Skip a finding only when the path is still empty after this derivation.
3. Write the derived `file_path` back into the standardized finding so that Verification/Save can use it.
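A sketch of that fallback chain (`parse_location` and `standardize` are illustrative names, not the repo's actual functions):

```python
from typing import Optional


def parse_location(loc) -> str:
    """'path.py:12' -> 'path.py'; non-strings yield ''."""
    if not isinstance(loc, str):
        return ""
    return loc.split(":", 1)[0].strip()


def standardize(finding: dict) -> Optional[dict]:
    file_path = (
        finding.get("file_path")
        or finding.get("file")
        or parse_location(finding.get("location"))
    )
    if not isinstance(file_path, str) or not file_path.strip():
        return None  # skip only after every fallback has failed
    out = dict(finding)
    out["file_path"] = file_path.strip()  # write the derivation back for later stages
    return out
```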




Remediation recommended
7. Over-broad cross-file merge🐞 Bug ≡ Correctness
Description
When new_file is empty or "?", the Orchestrator's dedup merge allows a cross-file match on same_type && same_line alone, which can wrongly merge findings from different files that happen to share a type and line number, cross-contaminating verification results or descriptions. This also changes the original same-file-first matching strategy.
Code

backend/app/services/agent/agents/orchestrator.py[R971-987]

+                            elif same_type and same_line and not same_file:
+                                # 🔥 FIX: Only allow cross-file matching when:
+                                # 1. new_file is garbage ("?"/empty) - verification returned bad path
+                                # 2. One path is a prefix of the other ("src/foo.py" vs "foo.py")
+                                # Do NOT merge when existing_file is garbage - that would lose the real path.
+                                new_is_garbage = not new_file or new_file == "?"
+                                prefix_match = (
+                                    new_file.endswith("/" + existing_file) or
+                                    existing_file.endswith("/" + new_file)
+                                )
+                                if new_is_garbage or prefix_match:
+                                    match_found = True
+                                    logger.info(f"[Orchestrator] Matched by type+line despite file mismatch: {new_file} vs {existing_file}")
+                                else:
+                                    match_found = False
+                            else:
+                                match_found = False
Evidence
In the `same_type and same_line and not same_file` branch, the new logic sets `match_found = True` directly when new_file is empty or "?", without requiring `similar_desc` or any path relationship; so if some agent or verification step produces a garbage path (empty/"?"), it can be merged with a same type+line finding from any file.

backend/app/services/agent/agents/orchestrator.py[968-1017]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below...
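One way to tighten the predicate along the lines the review suggests, requiring `similar_desc` in the garbage-path case (names are illustrative, not the repo's actual code):

```python
def allow_cross_file_match(new_file: str, existing_file: str, similar_desc: bool) -> bool:
    """Cross-file match only when the new path is garbage AND the descriptions
    agree, or when one path is a suffix of the other ('src/foo.py' vs 'foo.py')."""
    new_is_garbage = not new_file or new_file == "?"
    if new_is_garbage:
        # Requiring similar_desc here is the extra condition the review asks for.
        return similar_desc
    return (
        new_file.endswith("/" + existing_file)
        or existing_file.endswith("/" + new_file)
    )
```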

Comment on lines 1390 to 1394
code_snippet=code_snippet[:10000] if code_snippet else None,
suggestion=suggestion[:5000] if suggestion else None,
is_verified=is_verified,
verdict=verdict, # 🔥 新增:保存 verdict 到数据库
ai_confidence=confidence, # 🔥 FIX: Use ai_confidence, not confidence

Action required

1. Verdict migration missing 🐞 Bug ≡ Correctness

AgentFinding gained a verdict field and the save path writes that column, but the agent_findings table definition in the Alembic migrations does not include verdict, so the runtime insert fails and triggers a rollback: a task can appear "completed" while its findings were never actually stored.

Comment thread backend/app/services/agent/agents/analysis.py
1. Add Alembic migration for verdict column in agent_findings table
2. Analysis agent: fallback to location/file fields when file_path is empty
3. API response: expose verdict field in AgentFindingResponse
@Windelly
Author

Windelly commented May 5, 2026

@CodiumAI-Agent review

@Windelly
Author

Windelly commented May 5, 2026

Closing and reopening to trigger Qodo re-review after addressing feedback.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit 5b9d3e0

Comment thread backend/app/services/agent/agents/orchestrator.py
Comment thread backend/app/api/v1/endpoints/agent_tasks.py
@Windelly
Author

Windelly commented May 5, 2026

Round 2 fixes pushed. Reopening for Qodo re-review.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit f3d0b74

Comment thread backend/app/api/v1/endpoints/agent_tasks.py Outdated
@Windelly
Author

Windelly commented May 5, 2026

Round 3 fixes pushed. Reopening for Qodo.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit 9c77394

Comment thread backend/app/api/v1/endpoints/agent_tasks.py
@Windelly
Author

Windelly commented May 5, 2026

Fixed NameError. Reopening for Qodo.

@Windelly Windelly closed this May 5, 2026
@Windelly Windelly reopened this May 5, 2026
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 5, 2026

Persistent review updated to latest commit e56dcfd
