Skip to content

Add persistent cache for git mirrors and npm artifacts#10

Open
SoonIter wants to merge 1 commit intomainfrom
codex/git-clone-perf-cache
Open

Add persistent cache for git mirrors and npm artifacts#10
SoonIter wants to merge 1 commit intomainfrom
codex/git-clone-perf-cache

Conversation

@SoonIter
Copy link
Copy Markdown
Owner

@SoonIter SoonIter commented Apr 9, 2026

Summary

  • add a global cache subsystem with fixed directories for git mirrors, npm metadata, tarballs, locks, and temporary staging
  • route git discovery, ref resolution, and materialization through cached mirrors plus ephemeral worktrees
  • persist and revalidate npm metadata, reuse cached tarballs across installs, and update docs to reflect the new behavior
  • add cache-focused tests covering mirror reuse, offline pinned commits, metadata revalidation, tarball reuse, and SPM_CACHE_DIR

Testing

  • pnpm test
  • pnpm build
  • pnpm build:website

Copilot AI review requested due to automatic review settings April 9, 2026 08:24
@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
skills-package-manager aee2b24 Commit Preview URL

Branch Preview URL
Apr 09 2026, 08:25 AM

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a persistent, user-level cache subsystem to speed up repeated installs by reusing git mirrors and npm artifacts across commands and projects, with documentation and tests updated accordingly.

Changes:

  • Added a global cache module (configurable via SPM_CACHE_DIR) with temp/lock handling and hashed cache keys.
  • Routed git discovery/materialization through cached mirrors + temporary worktrees, and added npm metadata/tarball caching with revalidation.
  • Added cache-focused tests and updated docs/README to describe the new caching behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
website/docs/architecture/how-it-works.mdx Documents caching behavior during materialization and SPM_CACHE_DIR.
website/docs/architecture/cli-commands.mdx Updates CLI behavior description to reflect cached mirrors/artifacts.
packages/skills-package-manager/test/helpers.ts Extends mock npm registry helper with ETag/Last-Modified support and request counters.
packages/skills-package-manager/test/cache.test.ts Adds tests covering mirror reuse, offline pinned commits, metadata revalidation, tarball reuse, and cache dir override.
packages/skills-package-manager/src/npm/packPackage.ts Implements cached npm metadata and tarball download/reuse with cache locks.
packages/skills-package-manager/src/install/materializePackedSkill.ts Uses cache temp staging for packed skill extraction.
packages/skills-package-manager/src/install/materializeGitSkill.ts Switches git materialization to cached worktrees.
packages/skills-package-manager/src/install/installSkills.ts Stops treating npm tarballs as per-run temp artifacts; relies on persistent cache.
packages/skills-package-manager/src/github/listSkills.ts Switches discovery to cached git worktrees.
packages/skills-package-manager/src/config/syncSkillsLock.ts Resolves git commits via cached mirrors rather than ls-remote/temp clones.
packages/skills-package-manager/src/cache/index.ts Adds cache root/path resolution, temp dirs, and a filesystem-based lock implementation.
packages/skills-package-manager/src/cache/git.ts Adds cached git mirror management and ephemeral worktree creation/cleanup.
packages/skills-package-manager/README.md Updates user-facing docs to describe caching and new cache/ module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +156 to +158
const worktreePath = await createCacheTempDir('skills-pm-git-worktree-')
const resolvedCommit = await resolveGitCommitFromMirror(repoUrl, ref)
const mirrorPath = await getMirrorPath(repoUrl)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createGitWorktree creates worktreePath before resolving the commit; if resolveGitCommitFromMirror() throws (e.g., network/auth/ref error), the newly created temp directory will be leaked under the cache tmp dir. Resolve the commit first (or wrap resolve in a try/catch that removes worktreePath on failure) before proceeding to add the worktree.

Suggested change
const worktreePath = await createCacheTempDir('skills-pm-git-worktree-')
const resolvedCommit = await resolveGitCommitFromMirror(repoUrl, ref)
const mirrorPath = await getMirrorPath(repoUrl)
const resolvedCommit = await resolveGitCommitFromMirror(repoUrl, ref)
const mirrorPath = await getMirrorPath(repoUrl)
const worktreePath = await createCacheTempDir('skills-pm-git-worktree-')

Copilot uses AI. Check for mistakes.
Comment on lines +6 to +9
const CACHE_DIR_NAME = 'skills-pm'
const STALE_LOCK_MS = 10 * 60 * 1000
const STALE_TMP_MS = 24 * 60 * 60 * 1000
const LOCK_POLL_MS = 100
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache lock implementation can delete active locks: STALE_LOCK_MS is 10 minutes and stale detection/cleanup uses directory mtime. If a locked operation takes >10m (e.g., a hung fetch() without a timeout), another process can remove the lock directory and proceed concurrently, risking cache corruption. Consider (a) adding request timeouts/abort signals for network fetches, (b) extending the stale window significantly, and/or (c) recording lock ownership and checking whether the owning PID is still alive before deleting.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +96
await cleanupDirectoryEntries(cache.tmpDir, STALE_TMP_MS, async (entryPath) => {
await rm(entryPath, { recursive: true, force: true })
})
await cleanupDirectoryEntries(cache.locksDir, STALE_LOCK_MS, async (entryPath) => {
await rm(entryPath, { recursive: true, force: true })
})
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanupDirectoryEntries(cache.locksDir, STALE_LOCK_MS, ...) removes lock directories purely based on mtime, which can race with other processes that legitimately hold a lock longer than STALE_LOCK_MS. This can break mutual exclusion by deleting a live lock during cache initialization. Consider removing this cleanup entirely, or making it PID-aware (and only deleting when the owning process is gone) rather than time-based.

Copilot uses AI. Check for mistakes.
Comment on lines +383 to +417
async function readCachedTarball(
tarballPath: string,
expectedIntegrity?: string,
): Promise<string | null> {
try {
if (!expectedIntegrity) {
await readFile(tarballPath)
return tarballPath
}

const tarballBuffer = await readFile(tarballPath)
if (verifyIntegrity(tarballBuffer, expectedIntegrity)) {
return tarballPath
}
} catch {
return null
}

await rm(tarballPath, { force: true }).catch(() => {})
return null
}

export async function downloadNpmPackageTarball(
cwd: string,
tarballUrl: string,
expectedIntegrity?: string,
): Promise<string> {
const downloadRoot = await mkdtemp(path.join(tmpdir(), 'skills-pm-npm-download-'))

try {
const config = await loadNpmConfig(cwd)
const response = await fetch(tarballUrl, {
headers: createRequestHeaders(config, tarballUrl),
})
if (!response.ok) {
throw new Error(`Failed to download npm tarball: ${response.status}`)
}
const config = await loadNpmConfig(cwd)
const cache = await getCachePaths()
const cacheKey = expectedIntegrity ? `integrity:${expectedIntegrity}` : `url:${tarballUrl}`
const tarballPath = getCacheKeyPath(cache.tarballsDir, cacheKey, '.tgz')
const cachedTarball = await readCachedTarball(tarballPath, expectedIntegrity)
if (cachedTarball) {
return cachedTarball
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readCachedTarball() deletes the cached tarball on integrity mismatch, and downloadNpmPackageTarball() calls it once before acquiring the cache lock. That means a process can remove a shared cached tarball without holding the lock, potentially racing with another install that is concurrently reading/extracting it. Consider moving the “delete corrupted tarball” behavior entirely inside the withCacheLock(...) critical section (and making the pre-lock check read-only).

Copilot uses AI. Check for mistakes.
Comment on lines 78 to 95
if (decodedPath === manifest.name) {
metadataRequests += 1
if (options?.etag && req.headers['if-none-match'] === options.etag) {
metadata304Responses += 1
res.statusCode = 304
res.end()
return
}

metadata200Responses += 1
res.setHeader('content-type', 'application/json')
if (options?.etag) {
res.setHeader('etag', options.etag)
}
if (options?.lastModified) {
res.setHeader('last-modified', options.lastModified)
}
res.end(
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startMockNpmRegistry supports returning last-modified headers, and the production code sends if-modified-since, but the mock registry never checks if-modified-since to emit a 304. Either implement an if-modified-since branch (parallel to the ETag handling) or drop the lastModified option to avoid misleading future tests.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants