Skip to content

[worker] Resolve runtime cache settings from job env and fallback on npm cache failures#3855

Open
AbbanMustafa wants to merge 2 commits into
mainfrom
mus/npm-cache-settings
Open

[worker] Resolve runtime cache settings from job env and fallback on npm cache failures#3855
AbbanMustafa wants to merge 2 commits into
mainfrom
mus/npm-cache-settings

Conversation

@AbbanMustafa

@AbbanMustafa AbbanMustafa commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Why

We want to enable npm cache proxy usage per build job, but the worker was loading runtime settings before the job environment was available. That meant job-scoped environment variables like EAS_USE_NPM_CACHE=1 https://github.com/expo/universe/pull/27727 could not control cache URL resolution.

We also need cache proxy rollout to be safe: if the proxy is unavailable or misconfigured, builds should fall back to the normal npm registry behavior instead of failing during the post-prebuild install.

How

  • Move worker runtime settings loading into createBuildContext, where the job env is available.
  • Resolve runtime cache URLs from job env plus runner cache URL fallbacks.
  • Set NPM_CONFIG_REGISTRY only when the job has enabled npm cache usage.
  • Add post-prebuild npm install fallback:
    • enabled only when EAS_USE_NPM_CACHE=1
    • detects install failures caused by the configured npm cache registry
    • reports warning-level Sentry events
    • retries the same npm install after removing only NPM_CONFIG_REGISTRY, so npm falls back to project/user/default registry settings
  • Add Sentry reporting for non-fatal npm cache registry errors that npm logs while still exiting 0.

Test Plan

  • Added unit coverage for job-env runtime cache settings resolution.
  • Added unit coverage for post-prebuild npm cache fallback behavior:
    • retries without the cache registry when the cache install fails
    • reports non-fatal cache registry errors to Sentry when npm logs an error but exits 0
    • does not retry when EAS_USE_NPM_CACHE is not enabled
  • Validated the failure path with a local build by setting EAS_USE_NPM_CACHE=1 from www. Cache private network could not resolve/connect to the internal cache host, which reproduced the proxy failure path and confirmed the build retried through the normal npm registry behavior, and logged to sentry

https://expoio.sentry.io/issues/7511993527/?environment=development&project=1837720&query=is%3Aunresolved&referrer=issue-stream&statsPeriod=14d

Screenshot 2026-06-11 at 9 34 16 PM

NOOP when env var is not set

Screenshot 2026-06-11 at 11 24 49 PM

@github-actions

Copy link
Copy Markdown

Subscribed to pull request

File Patterns Mentions
**/* @douglowder

Generated by CodeMention

@AbbanMustafa AbbanMustafa added the no changelog PR that doesn't require a changelog entry label Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.53086% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.49%. Comparing base (b30dd90) to head (1e7c270).

Files with missing lines Patch % Lines
...ages/build-tools/src/common/installDependencies.ts 96.56% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3855      +/-   ##
==========================================
+ Coverage   58.43%   58.49%   +0.07%     
==========================================
  Files         917      919       +2     
  Lines       39996    40125     +129     
  Branches     8419     8438      +19     
==========================================
+ Hits        23368    23468     +100     
- Misses      16533    16561      +28     
- Partials       95       96       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown

⏩ The changelog entry check has been skipped since the "no changelog" label is present.

useFrozenLockfile: boolean;
}): Promise<void> {
const npmCacheUrl = env.NPM_CONFIG_REGISTRY;
const npmCacheErrorTracker = createNpmCacheRegistryErrorTracker({ env, npmCacheUrl });

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this captures cache failures that npm only prints to stdout/stderr, like audit endpoint ENOTFOUND, so we still log them to Sentry when npm exits 0 and skips the catch/fallback path

@AbbanMustafa AbbanMustafa requested a review from sjchmiela June 12, 2026 04:23

@sjchmiela sjchmiela left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs some polish

const cocoapodsCacheUrl = RuntimeSettings.getCocoapodsCacheUrl();

setEnv(env, 'NPM_CACHE_URL', npmCacheUrl);
setEnv(env, 'NPM_CONFIG_REGISTRY', npmCacheUrl);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gosh it took me so long to find docs proving this should work lol

does this replace npm config set registry?

const childLogger = buildLogger.child({ buildId });
await RuntimeSettings.loadAsync({
environment: config.env,
logger: childLogger,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so actually about this logger -- i don't think we should use the build-facing logger. i don't think we need users to see "Failed to fetch worker runtime settings, using defaults", especially if it's in an "(unnamed build phase)"

i kept this logger from the start so "if we ever were to inspect logs of the service" we would have a record of this failure. maybe it was a bad idea and we should just rely only on sentry and env vars printing as distinction?

Comment on lines +42 to +47
cacheUrlFallbacks: {
npm: config.npmCacheUrl,
nodejs: config.nodeJsCacheUrl,
maven: config.mavenCacheUrl,
cocoapods: config.cocoapodsCacheUrl,
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we land #3856 first?

}
}

function getCacheUrl(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not do this, i know codex wanted to do it ssoooooo bad. we can inline variables and checks in these four places in one file

cocoapods?: string | null;
};

type RuntimeEnvironment = Record<string, string | undefined>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have Env and BuildEnv types that should be able to denote this

Comment on lines +216 to +217
const sentryError = new Error('Non-fatal npm cache registry error during dependency install');
sentryError.name = 'NpmCacheRegistryNonFatalError';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be better if we made it a subclass of error?

Comment on lines +135 to +136
const fallbackEnv = { ...env };
delete fallbackEnv.NPM_CONFIG_REGISTRY;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const fallbackEnv = { ...env };
delete fallbackEnv.NPM_CONFIG_REGISTRY;
const { NPM_CONFIG_REGISTRY: _NPM_CONFIG_REGISTRY, ...fallbackEnv } = env;

? or maybe not. i think there were days when delete had bad rep

Comment on lines +268 to +269
const error = new Error('Failed to install dependencies using npm cache registry');
error.name = 'NpmCacheRegistryInstallError';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subclass

also use SystemError / UserError when possible

and i think you should be able to pass err as cause? i wish we would remove getErrorField and getErrorMessage

Comment on lines +253 to +261
function getErrorOutput(err: unknown): string {
if (!err || typeof err !== 'object') {
return '';
}
const { stdout, stderr } = err as { stdout?: unknown; stderr?: unknown };
return [stdout, stderr]
.filter((output): output is string => typeof output === 'string')
.join('\n');
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can check if instanceof spawnpromise?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no changelog PR that doesn't require a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants