
feat: add memory normalizer pipeline #222

Closed
Victor-dw wants to merge 1 commit into CortexReach:master from Victor-dw:codex/memory-normalizer-pr-v2


@Victor-dw

Summary

  • add a configurable memory normalizer pipeline with audit logging and rule/raw fallbacks
  • route memory_store and regex auto-capture writes through atomic-metadata-aware normalization
  • add governance/normalizer CLI surfaces plus targeted regression tests

What Changed

  • introduced src/normalizer.ts plus normalization rule/validation/types helpers
  • added atomic metadata utilities and governance reporting helpers used by CLI/tool outputs
  • updated index.ts to initialize the normalizer from plugin config and use it in the auto-capture fallback path
  • updated src/tools.ts so memory_store can accept atomic input or normalize free-form candidate text before persistence
  • extended openclaw.plugin.json with normalization settings
  • added tests for memory_store normalization, auto-capture normalization, and a live A/B harness
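The description says memory_store accepts either atomic input or free-form candidate text. A minimal sketch of how that routing could work, assuming a hypothetical versioned record shape (AtomicMemoryV1, with made-up field names and kinds; the actual schema in src/atomic-memory.ts is not shown in this PR page):

```typescript
// Hypothetical shape of a versioned atomic memory record. Field names and
// kinds are illustrative assumptions, not the schema in src/atomic-memory.ts.
type MemoryKind = "preference" | "fact" | "decision" | "task";

interface AtomicMemoryV1 {
  schemaVersion: 1;
  kind: MemoryKind;
  text: string;
  tags: string[];
}

// Keep in sync with MemoryKind.
const MEMORY_KINDS: readonly string[] = ["preference", "fact", "decision", "task"];

// Runtime type guard: checks every field so untrusted tool input can be
// narrowed to AtomicMemoryV1 safely. Unknown schema versions are rejected.
function isAtomicMemoryV1(value: unknown): value is AtomicMemoryV1 {
  if (typeof value !== "object" || value === null) return false;
  const { schemaVersion, kind, text, tags } = value as Record<string, unknown>;
  return (
    schemaVersion === 1 &&
    typeof kind === "string" &&
    MEMORY_KINDS.includes(kind) &&
    typeof text === "string" &&
    text.trim().length > 0 &&
    Array.isArray(tags) &&
    tags.every((t) => typeof t === "string")
  );
}

// Routing decision for a memory_store-like entry point (hypothetical):
// structured input is persisted as-is, plain text goes to the normalizer.
type StoreInput =
  | { mode: "atomic"; record: AtomicMemoryV1 }
  | { mode: "normalize"; candidateText: string };

function classifyStoreInput(input: unknown): StoreInput {
  if (isAtomicMemoryV1(input)) return { mode: "atomic", record: input };
  if (typeof input === "string") return { mode: "normalize", candidateText: input };
  throw new TypeError("expected an atomic memory record or free-form text");
}
```

Versioning the record explicitly (schemaVersion: 1) lets a later schema be added as a new union member without silently accepting old-format records as new ones.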

Validation

  • node test/memory-store-normalizer.mjs
  • node test/normalizer-auto-capture.mjs
  • node test/cli-smoke.mjs
  • node test/plugin-manifest-regression.mjs
  • node test/normalizer-phase1-live-ab.mjs (skips unless MEMORY_NORMALIZER_API_KEY is set)
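The live A/B harness is described as skipping unless MEMORY_NORMALIZER_API_KEY is set. A sketch of that gate, assuming a hypothetical decideLiveRun() helper that takes the environment as a plain object so the decision itself can be tested without touching process.env:

```typescript
// Only the variable name MEMORY_NORMALIZER_API_KEY comes from the PR;
// the helper and result shape are hypothetical.
type LiveTestDecision =
  | { run: true; apiKey: string }
  | { run: false; reason: string };

function decideLiveRun(env: Record<string, string | undefined>): LiveTestDecision {
  // Treat a whitespace-only value the same as an unset variable.
  const apiKey = env.MEMORY_NORMALIZER_API_KEY?.trim();
  if (!apiKey) {
    return { run: false, reason: "MEMORY_NORMALIZER_API_KEY not set; skipping live A/B test" };
  }
  return { run: true, apiKey };
}
```

In a harness script one would call it with process.env and exit with status 0 when run is false, so CI records the missing key as a skip rather than a failure.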

Notes

  • this branch is rebased onto CortexReach/master and intentionally excludes the earlier local stable-merge history

Collaborator

@rwmjhb left a comment


Review

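The core idea is valuable: cleaning and structuring raw auto-capture text before storage can genuinely improve memory quality. The atomic metadata schema (versioned and type-safe), the three-tier fallback (LLM → rules → raw text), and the opt-in design are all the right direction.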
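The three-tier fallback can be sketched as follows. This is an illustration under stated assumptions (hypothetical normalizeWithFallback(), StageFn and AuditEntry names; stages are synchronous for brevity, whereas a real LLM call would be async), not the PR's src/normalizer.ts:

```typescript
type Stage = "llm" | "rules" | "raw";

interface NormalizeResult {
  text: string;
  stage: Stage; // which tier produced the stored text
}

interface AuditEntry {
  stage: Stage;
  ok: boolean;
  error?: string;
}

// A stage returns normalized text, or null when it has no usable output.
type StageFn = (input: string) => string | null;

function normalizeWithFallback(
  input: string,
  opts: { enabled: boolean; llm?: StageFn; rules: StageFn },
  audit: AuditEntry[],
): NormalizeResult {
  // Opt-in: when the normalizer is disabled, store the raw text untouched.
  if (!opts.enabled) {
    audit.push({ stage: "raw", ok: true });
    return { text: input, stage: "raw" };
  }
  const tiers: Array<[Stage, StageFn | undefined]> = [
    ["llm", opts.llm],
    ["rules", opts.rules],
  ];
  for (const [stage, fn] of tiers) {
    if (!fn) continue; // e.g. no LLM configured
    try {
      const out = fn(input);
      if (out !== null && out.trim() !== "") {
        audit.push({ stage, ok: true });
        return { text: out, stage };
      }
      audit.push({ stage, ok: false, error: "empty output" });
    } catch (err) {
      audit.push({ stage, ok: false, error: err instanceof Error ? err.message : String(err) });
    }
  }
  // Last resort: keep the original text rather than dropping the memory.
  audit.push({ stage: "raw", ok: true });
  return { text: input, stage: "raw" };
}
```

Ending on the raw tier means a normalizer failure degrades quality rather than losing the memory, and the audit array records which tier produced each stored result and why earlier tiers were skipped.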

However, the PR is not ready to merge in its current form, for the following reasons:

Must fix

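  1. Undeclared breaking change: the PR removes the memory upgrade and reindex-fts CLI commands (dropping the memory-upgrader.ts / llm-client.ts dependencies), but the PR description does not mention this. Existing users will lose these features when they upgrade.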

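  2. Personal content leakage: deriveTagsFromText() in normalization-rules.ts hard-codes personal project names such as "lossless-claw", "tushare", "acpx-with-proxy", and "7897", and the test cases use this content extensively. These names mean nothing to general users, and rules such as "proxy" and "7897" will attach spurious tags to unrelated memories.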

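  3. Mixed Chinese/English CLI output: Chinese strings such as "还有 ${n} 条" ("${n} more"), "命中噪声/会话元信息规则" ("matched noise/session-metadata rule"), and "包含 pending/计划态信号" ("contains pending/planned-state signal") appear directly in CLI output, where non-Chinese-speaking users cannot understand them.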
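For item 2, one option is to move project-specific keywords out of the code and into user configuration, with an empty default. A sketch with hypothetical TagRule / deriveTags() names (not the PR's deriveTagsFromText()), using whole-word matching so that a keyword such as "proxy" does not fire inside "proxyless" or a hyphenated name like "acpx-with-proxy":

```typescript
interface TagRule {
  tag: string;
  keywords: string[]; // matched case-insensitively as whole words
}

// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match a keyword only when it is not embedded in a larger word or hyphenated
// name. Lookbehind requires an ES2018+ runtime (e.g. any current Node.js).
function keywordRegExp(keyword: string): RegExp {
  return new RegExp(`(?<![\\w-])${escapeRegExp(keyword)}(?![\\w-])`, "i");
}

// No rules ship in code; users supply their own via plugin config.
const DEFAULT_TAG_RULES: readonly TagRule[] = [];

function deriveTags(text: string, rules: readonly TagRule[] = DEFAULT_TAG_RULES): string[] {
  const tags = new Set<string>();
  for (const rule of rules) {
    if (rule.keywords.some((kw) => keywordRegExp(kw).test(text))) {
      tags.add(rule.tag);
    }
  }
  return Array.from(tags).sort();
}
```

With an empty default, a fresh install derives no tags at all, and anyone who wants "proxy" to mean something can opt into that rule explicitly.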

Suggested improvements

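  1. PR scope is too large: +3397 lines that simultaneously introduce the normalizer pipeline, the atomic metadata schema, governance reporting, search observation, review-packet comparison, and an auto-capture refactor. These should be split into independent PRs to make review and rollback easier.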

  2. isRuntimeChatter regexes are too broad: /\bstatus\b/i and /\bpoll(?:ing)?\b/i will classify legitimate content such as "I prefer polling over webhooks" as chatter and discard it.

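  3. Uneven test coverage: the PR adds 3 test files (641 lines), but they mainly exercise integration paths. normalization-rules.ts (regex matching, chatter detection, tag derivation) and normalization-validate.ts (branching logic), the two modules most likely to contain bugs, have no standalone unit tests.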
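To make item 2 concrete, and as the kind of case a standalone unit test for normalization-rules.ts could pin down, here is the quoted false positive next to one possible narrower heuristic. isChatterBroad() only reproduces the two patterns quoted in the review; isChatterNarrow() and its line-anchored patterns are assumptions, not a proposed final rule set:

```typescript
// The two patterns quoted in the review, matched anywhere in the text.
const BROAD_PATTERNS: readonly RegExp[] = [/\bstatus\b/i, /\bpoll(?:ing)?\b/i];

function isChatterBroad(text: string): boolean {
  return BROAD_PATTERNS.some((re) => re.test(text));
}

// Hypothetical narrower rules: the whole line must look like a runtime log
// message, e.g. "status: running", "[worker-3] status=ok",
// "Polling for results...". The word alone anywhere in a sentence is not enough.
const NARROW_PATTERNS: readonly RegExp[] = [
  /^\s*(?:\[[^\]]*\]\s*)?status\s*[:=]\s*\S+\s*$/i,
  /^\s*polling\b.*\.{3}\s*$/i,
];

function isChatterNarrow(text: string): boolean {
  return NARROW_PATTERNS.some((re) => re.test(text));
}
```

Anchoring at both ends trades some recall for precision: a log line with extra prose after the status value would no longer be dropped, which is usually the safer failure mode for a memory store.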

Recommendation

I suggest splitting the valuable parts into separate PRs and submitting them incrementally:

  1. PR A: atomic metadata schema (src/atomic-memory.ts) + auto-capture refactor (src/auto-capture.ts). This is the cleanest part and the easiest to merge.
  2. PR B: normalizer pipeline core (src/normalizer.ts + src/normalization-*.ts), submitted after the personal content has been removed.
  3. PR C: governance / audit CLI tools, a standalone feature that should be reviewed separately.

None of these PRs should delete existing CLI commands. I suggest closing the current PR and resubmitting it according to the split above.

@rwmjhb
Collaborator

rwmjhb commented Mar 16, 2026

Closing this PR. The core idea is valuable; please resubmit following the split proposed in the review.

@rwmjhb closed this Mar 16, 2026

Labels

None yet

Projects

None yet


2 participants