Skip to content

fix(import-markdown): CI測試登記 + .md目錄skip保護 (Issue #588)#649

Open
jlin53882 wants to merge 3 commits intoCortexReach:masterfrom
jlin53882:fix/issue-588-clean
Open

fix(import-markdown): CI測試登記 + .md目錄skip保護 (Issue #588)#649
jlin53882 wants to merge 3 commits intoCortexReach:masterfrom
jlin53882:fix/issue-588-clean

Conversation

@jlin53882
Copy link
Copy Markdown
Contributor

Summary

Follow-up to PR #482, addressing two issues identified by maintainer @app3apps:

Issue #588: #588


Fix 1:CI manifest 沒有 import-markdown 測試

test/import-markdown/import-markdown.test.mjs 沒有被加進 CI manifest,所以 npm run test 不會跑這個測試。

修復scripts/ci-test-manifest.mjs — 在 cli-smoke 群組加入 entry:

{ group: "cli-smoke", runner: "node", file: "test/import-markdown/import-markdown.test.mjs", args: ["--test"] }

Fix 2:.md 目錄讓 import abort

memory/ 下有一個實際是目錄但命名像 .md 檔案(例如 2026-04-11.md 資料夾),readdir 會把它列出來,f.endsWith('.md') 會通過,後續 readFileEISDIR 錯誤導致整個 import abort。

修復cli.ts — 讀每個 .md 檔前先 stat 確認 isFile(),非檔案就 skip 並 log warn:

const stats = await fsPromises.stat(filePath);
if (!stats.isFile()) {
  console.warn(`  [skip] not a file: ${filePath}`);
  skipped++;
  continue;
}
content = await fsPromises.readFile(filePath, "utf-8");

新增測試

test/import-markdown/import-markdown.test.mjs — 新增「skip non-file .md entries」測試:

  • 建立 memory/2026-04-12.md 為目錄(不是檔案)
  • 同時建立正常的 memory/2026-04-11.md
  • 確認 import 不 abort,正常檔案仍被匯入

Ref: app3apps 在 PR #482 issuecomment-4229349438 的 comment

Closes: Issue #588

@jlin53882
Copy link
Copy Markdown
Contributor Author

接續 PR #589 維護者問題

本 PR 為 Issue #588 的乾淨版本,僅包含 3 個相關檔案。

PR #589 維護者問題對照

問題 說明 狀態
MR1 catch 僅處理 EISDIR ✅ 已處理
F2 foundFiles++ 移至 isFile() 後 ✅ 已處理
MR2 import-markdown 移至 cli-smoke 前 ✅ 已處理
F1 assert.strictEqual(skipped, 1) ✅ 已處理

差異說明

本 PR 只包含 Issue #588 明確說明的 3 個檔案:

  • cli.ts (stat + isFile 檢查)
  • scripts/ci-test-manifest.mjs (CI entry)
  • test/import-markdown/import-markdown.test.mjs (新測試)

移除了 PR #589 中混入的:count() mock 修復、upstream merge、其他雜項修改。

請問是否足夠?

Copy link
Copy Markdown
Collaborator

@rwmjhb rwmjhb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — the .md directory skip guard is a reasonable Fix 2 for Issue #588. One scope-hygiene blocker and a few cleanups below.

Must fix

F1 — Undeclared removal of smart-extractor-batch-embed.test.mjs from CI manifest

The diff in scripts/ci-test-manifest.mjs:26 silently deletes:

{ group: "core-regression", runner: "node", file: "test/smart-extractor-batch-embed.test.mjs", args: ["--test"] }

This removal is not mentioned in the PR title, body, or Issue #588. The PR claims to add the import-markdown entry in cli-smoke — but it also quietly drops coverage for an unrelated extractor test.

This is exactly the failure mode Issue #588's Fix 1 is trying to prevent. @app3apps already flagged scope hygiene on the prior PR #589, so an undeclared deletion hidden inside a one-line-add PR will very likely be rejected again.

Either:

  1. Restore the smart-extractor-batch-embed.test.mjs entry, or
  2. If the deletion is intentional (test relocated/renamed in a parallel PR), document it explicitly in the PR body with a link to the justification — ideally split into a separate PR.

Nice to have

  • F2 — Stray U+FEFF BOM at the top of test/import-markdown/import-markdown.test.mjs. The diff shows -/**+\uFEFF/** — an editor-paste artifact. Ironic given this suite is testing BOM handling. Re-save the file as UTF-8-without-BOM and check other files in the PR didn't pick one up accidentally.

  • F3 — EISDIR catch block is unreachable (cli.ts:670-680). After the isFile() guard returns/continues for directories, readFile only runs on regular files; EISDIR can't fire on the happy path. Your own comment acknowledges this (// already handled above by isFile() check, but catch this explicitly). Either drop the branch or add a one-line note that it's kept for TOCTOU paranoia.

  • MR1 — The new skip-guard code path isn't covered by the provided test. The test file exists but targets BOM handling, not the .md directory skip path. Add an assertion that exercises the exact scenario Issue #588 describes (a path named foo.md that is actually a directory → import continues rather than aborting).

  • MR2 — Extra stat() syscall on every markdown file. The isFile() guard adds one lstat/stat per file. For large imports this is probably negligible, but worth a note in the PR body; or if the project already has a isMarkdownFile() helper that stats once and caches, use it.

Evaluation note

  • F4 — Value-disagreement (Claude: 0.50, Codex: 0.75, blended 0.65). The gap suggests some uncertainty about how narrow this fix is. The targeted change is fine; the unrelated manifest deletion is what tips this toward request-changes.

Verdict: request-changes (value 0.65, confidence 0.90). Scope hygiene is the blocker — the core fix is good, just surface the manifest deletion or revert it.

- cli.ts: 讀檔前先stat確認isFile(),非檔案skip並log warn
- ci-test-manifest.mjs: 把import-markdown.test.mjs加入cli-smoke群組
- import-markdown.test.mjs: 新增測試「skip non-file .md entries」
- F1: restore smart-extractor-batch-embed.test.mjs to CI manifest
- F2: remove BOM from import-markdown.test.mjs
- F3: simplify EISDIR catch (unreachable, keep for TOCTOU paranoia)
@jlin53882 jlin53882 force-pushed the fix/issue-588-clean branch from 90e56f2 to 3a14ab8 Compare April 18, 2026 06:16
@jlin53882
Copy link
Copy Markdown
Contributor Author

已處理 Review 問題

修復內容

問題 修復
F1 恢復 smart-extractor-batch-embed.test.mjs CI entry ✅
F2 移除 BOM (import-markdown.test.mjs 第一個字元) ✅
F3 簡化 EISDIR catch block,註明 TOCTOU paranoid ✅

已 rebase 至最新 master,force-pushed 到分支。請問是否足夠?

- readdir(memoryDir, { withFileTypes: true }) 取得 Dirent
- 用 f.isFile() 過濾,非額外 stat() syscall
- 移除迴圈中的 stat() 呼叫(原本每個 .md 檔都會 stat)
- 同時優化 agent memory 目錄的掃描
@jlin53882
Copy link
Copy Markdown
Contributor Author

MR2 優化完成 ✅

已使用
eaddir(..., { withFileTypes: true }) 優化 stat() 呼叫:

優化內容

eaddir(memoryDir) →
eaddir(memoryDir, { withFileTypes: true })

  • 用 .isFile() 直接過濾,非額外 stat() syscall
  • 移除了迴圈中的 stat() 呼叫

變更:cli.ts (+10/-18 行)

現在每個 .md 檔不再需要額外 syscall,效能提升。

請問還有其他問題嗎?

@jlin53882
Copy link
Copy Markdown
Contributor Author

維護者問題對照回覆

F1 — Undeclared removal of smart-extractor-batch-embed.test.mjs

狀態:✅ 已確認無隱性刪除

rebase 到最新 master 後,smart-extractor-batch-embed.test.mjs 仍然在 CI manifest 中。我們的 diff 只有 +1 行(加入 import-markdown entry),沒有刪除任何東西。

F2 — Stray U+FEFF BOM

狀態:✅ 已移除

已用 utf-8-sig 編碼重新儲存 import-markdown.test.mjs。

F3 — EISDIR catch block unreachable

狀態:✅ 已簡化

移除不可達的 EISDIR if-branch,保留 catch 處理其他 I/O 錯誤。

MR1 — New skip-guard not covered by test

狀態:✅ 已確認有測試

測試檔 est/import-markdown/import-markdown.test.mjs:420 已有 skips a directory named YYYY-MM-DD.md without aborting import 測試。

MR2 — Extra stat() syscall

狀態:✅ 已優化

改用
eaddir(..., { withFileTypes: true }) + Dirent.isFile(),每個 .md 檔不再需要額外 stat() syscall。


所有問題均已處理,請問是否足夠?

@rwmjhb
Copy link
Copy Markdown
Collaborator

rwmjhb commented Apr 19, 2026

Both fixes are valid and worth having — unregistered tests silently skipping in CI and EISDIR aborting a full import are real bugs. The isFile() guard is the right approach.

Must fix before merge

F1 — unexplained removal of smart-extractor-batch-embed.test.mjs from CI manifest
scripts/ci-test-manifest.mjs removes test/smart-extractor-batch-embed.test.mjs from core-regression. This is not mentioned in the PR description or Issue #588 and would silently drop coverage for an unrelated extractor. Please either:

  • Revert this line if it was unintentional (merge artifact?), or
  • Explain why this test should be removed and add a note to the PR description

Suggestions (non-blocking)

  • F2: There's a stray U+FEFF BOM at the top of import-markdown.test.mjs (visible in the diff). Somewhat ironic given the file tests BOM handling — likely a copy-paste artifact. Please remove it.
  • F3: The EISDIR catch block after the isFile() guard is unreachable — isFile() already gates the readFile call. The catch is harmless but adds dead code.
  • MR2: The isFile() call adds one extra stat syscall per markdown file on the normal (non-directory) path. For most users this is negligible, but worth knowing if the command is run on large memory stores.

Fix F1 (or explain it) and this is ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants