fix(acp): auto-retry npx without --prefer-offline on cache failure#939
fix(acp): auto-retry npx without --prefer-offline on cache failure#939
Conversation
When users upgrade from pre-PR#911 versions, stale npm cache metadata causes ETARGET/ERESOLVE errors with --prefer-offline. Instead of requiring manual `npm cache clean --force`, implement a two-phase startup: try --prefer-offline first for fast startup, then retry without it to refresh the cache automatically. Also adds --yes flag to Claude npx invocation (matching CodeBuddy) to prevent interactive prompts when stdin is piped.
There was a problem hiding this comment.
Code Review
HIGH Issues
1. 新增 console.warn 违反 debug hygiene(且可能进入生产日志)
File: src/agent/acp/AcpConnection.ts:268-276
console.warn('[ACP] --prefer-offline failed, retrying with fresh registry lookup:', firstError instanceof Error ? firstError.message : String(firstError));Problem: 项目 ESLint 规则里 no-console 是 warn,并且 review 维度明确要求“新增 console.log/console.debug 要重点标记”。这里新增了 console.warn(同属 console 家族),会在正常用户环境中输出,且该路径在缓存异常/升级后是“常见可复现”的场景,容易造成生产日志噪音与隐私/环境信息泄露风险(错误信息可能包含路径、registry 细节等)。
Fix: 用项目统一 logger(若已有)替代,或至少 gated 到 ACP_PERF_LOG / debug flag。示例:
if (ACP_PERF_LOG) {
console.warn(
'[ACP] --prefer-offline failed, retrying with fresh registry lookup:',
firstError instanceof Error ? firstError.message : String(firstError)
);
}或改为内部 logger:logger.warn(...)(按项目现有日志体系落地)。
2. CodeBuddy 重试路径未清理 isDetached 可能导致后续 disconnect 走错分支
File: src/agent/acp/AcpConnection.ts:321-343
// Phase 1: Try with --prefer-offline for fast startup
try {
await this.spawnAndSetupCodebuddy(spawnCommand, cleanEnv, workingDir, isWindows, extraArgs, true);
} catch (firstError) {
// ...
this.child = null;
this.isSetupComplete = false;
this.isDetached = false;
await this.spawnAndSetupCodebuddy(spawnCommand, cleanEnv, workingDir, isWindows, extraArgs, false);
}Problem: 这里 catch 中把 this.child = null 直接覆盖,但没有确保“第一次 spawn 出来的进程树”被终止。更关键的是:spawnAndSetupCodebuddy 内部会设置 this.isDetached = !isWindows 并 unref();如果第一次 spawn 成功创建了 detached 进程但在 setup/initialize 阶段失败,catch 里把 this.child 置空后,外部再也无法通过 disconnect() 的逻辑去 kill 该进程组,可能遗留后台 CodeBuddy/npx 进程(资源泄漏、端口占用、重复实例)。
这在“缓存损坏导致 npx 失败/退出”场景下非常现实:进程可能已启动但很快退出/或卡在某个阶段,当前代码没有可靠的清理路径。
Fix: 在重试前显式清理第一次 child(如果存在且有 pid),而不是仅置空引用。建议抽一个“terminateChildIfAny()”并在两处重试共用:
private async terminateChildIfAny(): Promise<void> {
if (!this.child) return;
await this.disconnect(); // 复用现有跨平台 kill 逻辑
}
catch (firstError) {
if (ACP_PERF_LOG) { /* log */ }
await this.terminateChildIfAny();
// disconnect 已经会 reset state;如需保留部分状态再单独设置
await this.spawnAndSetupCodebuddy(..., false);
}Claude 的重试同理(虽然 Claude 非 detached,但仍可能遗留子进程/占用 stdin/stdout 句柄)。
MEDIUM Issues
3. spawnAndSetup* 参数列表过长,且 isWindows 可在方法内部计算
File: src/agent/acp/AcpConnection.ts
当前签名:
private async spawnAndSetupClaude(
spawnCommand: string,
cleanEnv: Record<string, string | undefined>,
workingDir: string,
isWindows: boolean,
preferOffline: boolean
): Promise<void>Description: isWindows 在多个调用点重复计算并传参,增加了调用复杂度,也更容易在未来重构时传错参数顺序(尤其是多个 boolean 参数)。同样问题也存在于 spawnAndSetupCodebuddy(...)(还额外带 extraArgs)。
Recommendation: 在方法内部计算 const isWindows = process.platform === 'win32';,并把参数改为对象参数以避免 boolean 位置错误:
private async spawnAndSetupClaude(opts: {
spawnCommand: string;
cleanEnv: Record<string, string | undefined>;
workingDir: string;
preferOffline: boolean;
}): Promise<void> { /* ... */ }Summary
| Level | Count |
|---|---|
| CRITICAL | 0 |
| HIGH | 2 |
| MEDIUM | 1 |
🤖 This review was generated by AI and may contain inaccuracies. Please focus on issues you agree with and feel free to disregard any that seem incorrect. Thank you for your contribution!
The retry catch blocks previously set this.child = null without killing the spawned process, which could leave orphaned npx/CLI processes if the first attempt failed after spawn (e.g. initialize timeout). Extract terminateChild() from disconnect() to deduplicate the platform-aware kill logic (Windows taskkill tree kill, POSIX detached process group kill, standard SIGTERM). Both disconnect() and retry paths now share this single method.
Review Verification对 Code Review 中的 3 个 issue 逐一验证。 Code Review Issues VerificationHIGH #1: 新增
|
| # | Severity | Issue | Verdict | Action |
|---|---|---|---|---|
| 1 | HIGH | console.warn debug hygiene | ❌ False Positive | 忽略 |
| 2 | HIGH | 重试路径未终止旧子进程 | ✅ Confirmed | 已修复 (0dedb18) |
| 3 | MEDIUM | 参数列表过长 | 后续优化 |
Summary
--prefer-offline快速启动,失败时自动去掉该 flag 重试,强制刷新 registry 缓存--yesflag:Claude npx 调用补齐--yes(与 CodeBuddy 对齐),防止 stdin piped 时交互提示挂起spawnAndSetupClaude/spawnAndSetupCodebuddy封装 spawn 逻辑,支持preferOffline参数控制背景
PR #911 将 Claude ACP bridge 从
@zed-industries/claude-code-acp迁移到@zed-industries/claude-agent-acp,commit 2d3f824 锁定版本为@0.18.0。升级用户因 npm 缓存中残留的旧 registry 元数据导致 ETARGET/ERESOLVE 错误,此前只能手动执行npm cache clean --force解决。行为变化
--prefer-offline快速启动--yes自动跳过Test plan
--prefer-offline模式正常工作~/.npm/_npx/缓存数据),确认自动重试成功