[worker] Resolve runtime cache settings from job env and fallback on npm cache failures#3855
[worker] Resolve runtime cache settings from job env and fallback on npm cache failures#3855AbbanMustafa wants to merge 2 commits into
Conversation
…npm cache failures
|
Subscribed to pull request
Generated by CodeMention |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3855 +/- ##
==========================================
+ Coverage 58.43% 58.49% +0.07%
==========================================
Files 917 919 +2
Lines 39996 40125 +129
Branches 8419 8438 +19
==========================================
+ Hits 23368 23468 +100
- Misses 16533 16561 +28
- Partials 95 96 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…npm cache failures
|
⏩ The changelog entry check has been skipped since the "no changelog" label is present. |
| useFrozenLockfile: boolean; | ||
| }): Promise<void> { | ||
| const npmCacheUrl = env.NPM_CONFIG_REGISTRY; | ||
| const npmCacheErrorTracker = createNpmCacheRegistryErrorTracker({ env, npmCacheUrl }); |
There was a problem hiding this comment.
this captures cache failures that npm only prints to stdout/stderr, like audit endpoint ENOTFOUND, so we still log them to Sentry when npm exits 0 and skips the catch/fallback path
| const cocoapodsCacheUrl = RuntimeSettings.getCocoapodsCacheUrl(); | ||
|
|
||
| setEnv(env, 'NPM_CACHE_URL', npmCacheUrl); | ||
| setEnv(env, 'NPM_CONFIG_REGISTRY', npmCacheUrl); |
There was a problem hiding this comment.
gosh it took me so long to find docs proving this should work lol
does this replace npm config set registry?
| const childLogger = buildLogger.child({ buildId }); | ||
| await RuntimeSettings.loadAsync({ | ||
| environment: config.env, | ||
| logger: childLogger, |
There was a problem hiding this comment.
so actually about this logger -- i don't think we should use the build-facing logger. i don't think we need users to see "Failed to fetch worker runtime settings, using defaults", especially if it's in an "(unnamed build phase)"
i kept this logger from the start so "if we ever were to inspect logs of the service" we would have a record of this failure. maybe it was a bad idea and we should just rely only on sentry and env vars printing as distinction?
| cacheUrlFallbacks: { | ||
| npm: config.npmCacheUrl, | ||
| nodejs: config.nodeJsCacheUrl, | ||
| maven: config.mavenCacheUrl, | ||
| cocoapods: config.cocoapodsCacheUrl, | ||
| }, |
| } | ||
| } | ||
|
|
||
| function getCacheUrl( |
There was a problem hiding this comment.
let's not do this, i know codex wanted to do it ssoooooo bad. we can inline variables and checks in these four places in one file
| cocoapods?: string | null; | ||
| }; | ||
|
|
||
| type RuntimeEnvironment = Record<string, string | undefined>; |
There was a problem hiding this comment.
we already have Env and BuildEnv types that should be able to denote this
| const sentryError = new Error('Non-fatal npm cache registry error during dependency install'); | ||
| sentryError.name = 'NpmCacheRegistryNonFatalError'; |
There was a problem hiding this comment.
wouldn't it be better if we made it a subclass of error?
| const fallbackEnv = { ...env }; | ||
| delete fallbackEnv.NPM_CONFIG_REGISTRY; |
There was a problem hiding this comment.
| const fallbackEnv = { ...env }; | |
| delete fallbackEnv.NPM_CONFIG_REGISTRY; | |
| const { NPM_CONFIG_REGISTRY: _NPM_CONFIG_REGISTRY, ...fallbackEnv } = env; |
? or maybe not. i think there were days when delete had bad rep
| const error = new Error('Failed to install dependencies using npm cache registry'); | ||
| error.name = 'NpmCacheRegistryInstallError'; |
There was a problem hiding this comment.
subclass
also use SystemError / UserError when possible
and i think you should be able to pass err as cause? i wish we would remove getErrorField and getErrorMessage
| function getErrorOutput(err: unknown): string { | ||
| if (!err || typeof err !== 'object') { | ||
| return ''; | ||
| } | ||
| const { stdout, stderr } = err as { stdout?: unknown; stderr?: unknown }; | ||
| return [stdout, stderr] | ||
| .filter((output): output is string => typeof output === 'string') | ||
| .join('\n'); | ||
| } |
There was a problem hiding this comment.
i think we can check if instanceof spawnpromise?
Why
We want to enable npm cache proxy usage per build job, but the worker was loading runtime settings before the job environment was available. That meant job-scoped environment variables like
EAS_USE_NPM_CACHE=1https://github.com/expo/universe/pull/27727 could not control cache URL resolution.We also need cache proxy rollout to be safe: if the proxy is unavailable or misconfigured, builds should fall back to the normal npm registry behavior instead of failing during the post-prebuild install.
How
createBuildContext, where the job env is available.NPM_CONFIG_REGISTRYonly when the job has enabled npm cache usage.EAS_USE_NPM_CACHE=1NPM_CONFIG_REGISTRY, so npm falls back to project/user/default registry settings0.Test Plan
0EAS_USE_NPM_CACHEis not enabledEAS_USE_NPM_CACHE=1from www. Cache private network could not resolve/connect to the internal cache host, which reproduced the proxy failure path and confirmed the build retried through the normal npm registry behavior, and logged to sentryhttps://expoio.sentry.io/issues/7511993527/?environment=development&project=1837720&query=is%3Aunresolved&referrer=issue-stream&statsPeriod=14d
NOOP when env var is not set