Skip to content

Use dynamic ports for Steve to avoid conflicts with other software#10018

Merged
jandubois merged 4 commits intorancher-sandbox:mainfrom
jandubois:steve-dynamic-ports
Mar 26, 2026
Merged

Use dynamic ports for Steve to avoid conflicts with other software#10018
jandubois merged 4 commits intorancher-sandbox:mainfrom
jandubois:steve-dynamic-ports

Conversation

@jandubois
Copy link
Member

@jandubois jandubois commented Mar 14, 2026

Steve previously listened on hardcoded port 9443, which could conflict with other software. Allocate the HTTPS port dynamically at startup and propagate them to Steve, the dashboard proxy, and the certificate-error handler.

The HTTP port doesn't actually work (Steve tries to direct to the HTTPS port, but it uses the wrong port number), so just tell Steve to never actually listen on HTTP instead.

Key changes:

  • Add getAvailablePorts(), which binds all requested ports simultaneously before releasing any, guaranteeing distinct values despite the inherent TOCTOU window.

  • Accept port arguments in Steve.start() instead of using a constant. Let startup errors propagate to the caller instead of swallowing them; the state-changed handler catches them locally so the Kubernetes ready notification still reaches the UI.

  • Force Steve to disable the use of HTTP.

  • Recreate dashboard proxy middleware on each Steve start via a new setStevePort() method. Express routes use wrapper functions that dereference the current proxy instances, so routes survive recreation without re-registration.

  • Add /api/steve-port endpoint for Rancher Dashboard to discover the dynamic HTTPS port.

  • Rename the networking module's port setter to setSteveCertPort() to distinguish it from DashboardServer.setStevePort().

  • Bump rancherDashboard to 2.11.1.rd4 (consumes the new endpoint). See Fetch Steve port dynamically instead of hardcoding 9443 rancher-desktop-dashboard#4.

Fixes #1890

@jandubois jandubois force-pushed the steve-dynamic-ports branch 15 times, most recently from 976d98f to 1d65fa8 Compare March 19, 2026 21:13
jandubois added a commit to jandubois/rancher-desktop that referenced this pull request Mar 19, 2026
Add link and verdict to PR rancher-sandbox#10018 header. Regenerate index.html with
generate-index tool.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
jandubois added a commit to jandubois/rancher-desktop that referenced this pull request Mar 19, 2026
Add link and verdict to PR rancher-sandbox#10018 header. Regenerate index.html with
generate-index tool.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
@jandubois jandubois force-pushed the steve-dynamic-ports branch 3 times, most recently from 6d66797 to b2e4acd Compare March 20, 2026 20:34
Steve previously listened on hardcoded port 9443, which could conflict
with other software. Allocate both the HTTPS and HTTP ports dynamically
at startup and propagate them to Steve, the dashboard proxy, and the
certificate-error handler.

Key changes:

- Add getAvailablePorts(), which binds all requested ports simultaneously
  before releasing any, guaranteeing distinct values despite the inherent
  TOCTOU window.

- Accept port arguments in Steve.start() instead of using a constant.
  Let startup errors propagate to the caller instead of swallowing them;
  the state-changed handler catches them locally so the Kubernetes ready
  notification still reaches the UI.

- Recreate dashboard proxy middleware on each Steve start via a new
  setStevePort() method.  Express routes use wrapper functions that
  dereference the current proxy instances, so routes survive recreation
  without re-registration.

- Add /api/steve-port endpoint for Rancher Dashboard to discover the
  dynamic HTTPS port.

- Rename the networking module's port setter to setSteveCertPort() to
  distinguish it from DashboardServer.setStevePort().

- Bump rancherDashboard to 2.11.1.rd4 (consumes the new endpoint).

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
isPortReady() used a raw TCP connect to determine when Steve was
ready. Steve accepts TCP connections several seconds before it
finishes initializing its API controllers, so the dashboard button
was enabled too early. Make an actual HTTP request to /v1 instead,
using Steve's HTTP port to avoid certificate validation issues with
its self-signed TLS cert.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
@jandubois jandubois force-pushed the steve-dynamic-ports branch from b2e4acd to b69b292 Compare March 20, 2026 22:51
@mook-as mook-as requested a review from Copilot March 25, 2026 23:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Rancher Desktop’s “Steve” integration to avoid port conflicts by selecting ports dynamically at runtime and wiring the chosen port through the dashboard proxy and certificate-error handling so the UI can still reach Steve.

Changes:

  • Add getAvailablePorts() utility (with tests) to obtain distinct ephemeral localhost ports.
  • Update Steve startup to accept ports as arguments and add a more API-specific readiness probe.
  • Add dashboard-side support for dynamic Steve ports (proxy recreation + /api/steve-port endpoint) and update the cert-error allowlist to use the dynamic port.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
pkg/rancher-desktop/utils/networks.ts Adds getAvailablePorts() for dynamic port selection.
pkg/rancher-desktop/utils/tests/networks.spec.ts Adds Jest coverage for getAvailablePorts().
pkg/rancher-desktop/main/networking/index.ts Introduces setSteveCertPort() to update cert-error allowlist dynamically.
pkg/rancher-desktop/main/dashboardServer/index.ts Reworks dashboard proxy handling to support Steve port changes and adds /api/steve-port.
pkg/rancher-desktop/backend/steve.ts Makes Steve start accept ports and changes readiness probing to an HTTPS API endpoint.
background.ts Allocates a dynamic port at startup and propagates it to Steve, dashboard proxy, and networking.
pkg/rancher-desktop/assets/dependencies.yaml Bumps rancherDashboard to 2.11.1.rd4.
.github/actions/spelling/expect.txt Adds “TOCTOU” to expected spelling list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 103 to 107
// ?? next() fires only when the proxy is absent: createProxyMiddleware
// returns an async function, so the call always yields a Promise (truthy).
ProxyKeys.forEach((key) => {
this.dashboardServer.use(key, this.proxies[key]);
this.dashboardServer.use(key, (req, res, next) => this.proxies[key]?.(req, res, next) ?? next());
});
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proxy wrapper always calls next() because Express middleware (including createProxyMiddleware() from http-proxy-middleware) typically returns void/undefined. With the current ?? next() expression, next() will run even when the proxy handled the request, which can lead to requests falling through to the static/index handler (or double-handling after headers are sent). Refactor the wrapper to call next() only when this.proxies[key] is missing, e.g. by branching on the existence of the proxy instead of using ?? on the middleware return value.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by making the code look more like the old code.

: undefined;

if (key) {
return this.proxies[key]?.upgrade(req, socket, head);
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the websocket upgrade handler, if the URL maps to a known key but this.proxies[key] is currently undefined, the function returns without logging or destroying the socket. That can leave the connection hanging. Handle the missing-proxy case explicitly (log and socket.destroy(), or return a clear 503 response where possible) instead of optional-chaining the upgrade() call.

Suggested change
return this.proxies[key]?.upgrade(req, socket, head);
const proxy = this.proxies[key];
if (!proxy) {
console.log(`No proxy configured for WebSocket upgrade on ${ req.url } (key: ${ key })`);
socket.destroy();
return;
}
return proxy.upgrade(req, socket, head);

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to not call .upgrade (and fall through to socket.destroy() if the handler isn't set.

background.ts Outdated
Comment on lines +1279 to +1284
try {
const [steveHttpsPort] = await getAvailablePorts(1);

console.log(`Steve ports: HTTPS=${ steveHttpsPort }`);
await Steve.getInstance().start(steveHttpsPort, 0);
DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says both HTTPS and HTTP ports are allocated dynamically (and getAvailablePorts() was added to reserve multiple ports simultaneously), but this code only allocates one port and passes 0 for httpPort. If the intent is to have deterministic distinct ports (and/or to propagate the HTTP port anywhere), this should request two ports via getAvailablePorts(2) and pass both into Steve.start(httpsPort, httpPort) (and log/propagate both as needed).

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR has since changed; the description has been updated.

background.ts Outdated
Comment on lines +1284 to +1286
DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware
setSteveCertPort(steveHttpsPort); // update certificate-error allowed URLs
} catch (ex) {
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setSteveCertPort(steveHttpsPort) updates the certificate-error allowlist, but there’s no corresponding reset when Steve stops. If the OS later reuses the same port for a different local service while the dashboard window exists, the certificate-error handler would still auto-accept invalid certs for https://127.0.0.1:<oldPort>. Consider clearing the allowed Steve port (e.g. setSteveCertPort(0)) when Steve is stopped/disabled and when Steve fails to start, so the allowlist tracks the actual running instance.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a high priority, since we won't attempt to connect to this if Steve isn't running. We can update this in the future to actually fetch the steve cert from Kubernetes and actually check that.

When we request Steve at the HTTP port, if HTTPS is enabled, it just tries
to redirect us to the HTTPS port instead.  Therefore we should just disable
the HTTP port completely (we never use it, and we do not expect external
users); instead, we try the HTTPS port for the health check.

Signed-off-by: Mark Yen <mark.yen@suse.com>
@mook-as mook-as force-pushed the steve-dynamic-ports branch from a92c7d2 to 24e9a1b Compare March 26, 2026 00:52
Copy link
Contributor

@mook-as mook-as left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed, I think; will go for another round of AI things.

background.ts Outdated
Comment on lines +1279 to +1284
try {
const [steveHttpsPort] = await getAvailablePorts(1);

console.log(`Steve ports: HTTPS=${ steveHttpsPort }`);
await Steve.getInstance().start(steveHttpsPort, 0);
DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR has since changed; the description has been updated.

background.ts Outdated
Comment on lines +1284 to +1286
DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware
setSteveCertPort(steveHttpsPort); // update certificate-error allowed URLs
} catch (ex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a high priority, since we won't attempt to connect to this if Steve isn't running. We can update this in the future to actually fetch the steve cert from Kubernetes and actually check that.

: undefined;

if (key) {
return this.proxies[key]?.upgrade(req, socket, head);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to not call .upgrade (and fall through to socket.destroy() if the handler isn't set.

Comment on lines 103 to 107
// ?? next() fires only when the proxy is absent: createProxyMiddleware
// returns an async function, so the call always yields a Promise (truthy).
ProxyKeys.forEach((key) => {
this.dashboardServer.use(key, this.proxies[key]);
this.dashboardServer.use(key, (req, res, next) => this.proxies[key]?.(req, res, next) ?? next());
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by making the code look more like the old code.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// and the dashboard button is disabled until then.
ProxyKeys.forEach((key) => {
this.dashboardServer.use(key, this.proxies[key]);
this.dashboardServer.use(key, (req, res, next) => this.proxies[key](req, res, next));
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init() registers proxy route wrappers that unconditionally call this.proxies[key](...), but this.proxies is empty until setStevePort() runs. Any request to one of these paths before the first successful Steve start (or when Kubernetes is disabled) will throw a TypeError and can crash the dashboard server. Make this resilient by either creating initial proxy middleware instances in init() (even with a placeholder target) or guarding the wrapper to return a clear error (e.g. 503) / next() when this.proxies[key] is not set.

Suggested change
this.dashboardServer.use(key, (req, res, next) => this.proxies[key](req, res, next));
this.dashboardServer.use(key, (req, res, next) => {
const proxy = this.proxies[key];
if (proxy) {
return proxy(req, res, next);
}
console.log(`Proxy for path ${ key } is not available; returning 503 for ${ req.url }`);
res.status(503).send('Kubernetes backend is not available');
});

Copilot uses AI. Check for mistakes.
return this.proxies[key].upgrade(req, socket, head);
}

console.log(`Unknown Web socket upgrade request for ${ req.url }`);
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log message uses "Web socket"; standard spelling is "WebSocket" (consistent with the API term) and avoids the appearance of a typo in logs.

Suggested change
console.log(`Unknown Web socket upgrade request for ${ req.url }`);
console.log(`Unknown WebSocket upgrade request for ${ req.url }`);

Copilot uses AI. Check for mistakes.
Comment on lines +185 to +188
req.on('response', (res) => resolve(res.statusCode === 200));
req.on('error', () => resolve(false));
// Timeout if we don't get a response in a reasonable time.
setTimeout(1_000).then(() => resolve(false));
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isPortReady() resolves false on a 1s timer but does not abort the underlying Electron.net.request() or consume the response body. This can leave in-flight requests (and sockets) lingering across attempts, leading to unnecessary resource usage and potentially many concurrent requests during waitForReady(). Consider using a single timeout handle that aborts/destroys the request, and ensure the response stream is drained (res.resume() / read data) before resolving.

Suggested change
req.on('response', (res) => resolve(res.statusCode === 200));
req.on('error', () => resolve(false));
// Timeout if we don't get a response in a reasonable time.
setTimeout(1_000).then(() => resolve(false));
let settled = false;
const timeoutController = new AbortController();
const onResolve = (result: boolean) => {
if (settled) {
return;
}
settled = true;
// Cancel the timeout if it's still pending.
timeoutController.abort();
resolve(result);
};
req.on('response', (res) => {
// Drain the response body to ensure the underlying socket can be reused.
res.resume();
onResolve(res.statusCode === 200);
});
req.on('error', () => {
onResolve(false);
});
// Timeout if we don't get a response in a reasonable time.
setTimeout(1_000, undefined, { signal: timeoutController.signal }).then(
() => {
// Abort the request on timeout to avoid lingering in-flight requests.
try {
req.abort();
} catch {
// Ignore errors from aborting an already-finished request.
}
onResolve(false);
},
() => {
// Ignore rejection when the timeout is aborted via timeoutController.abort().
},
);

Copilot uses AI. Check for mistakes.
@mook-as mook-as force-pushed the steve-dynamic-ports branch from 860098f to daab93e Compare March 26, 2026 19:44
Signed-off-by: Mark Yen <mark.yen@suse.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +74 to 80
'--https-listen-port',
String(httpsPort),
'--http-listen-port',
'0', // Disable HTTP support; it does not work correctly anyway.
],
{ env },
);
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Steve.start() is now intended to let startup errors propagate, but spawn failures can result in child.stdout/child.stderr being null (depending on how the process was spawned / failed). The current code path just logs and returns in that case, which can swallow the failure and potentially leave the spawn error unhandled. Consider explicitly setting stdio: 'pipe' in the spawn options and/or throwing when the streams are unavailable so the caller can handle a failed Steve startup reliably.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really relevant; we can fix it later if that turns out to be an issue.

Copy link
Contributor

@mook-as mook-as left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything left I want to fix.

Approving (because Jan started the PR); @jandubois please approve too if that looks fine for you?

Comment on lines +74 to 80
'--https-listen-port',
String(httpsPort),
'--http-listen-port',
'0', // Disable HTTP support; it does not work correctly anyway.
],
{ env },
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really relevant; we can fix it later if that turns out to be an issue.

@mook-as mook-as marked this pull request as ready for review March 26, 2026 20:00
Copy link
Member Author

@jandubois jandubois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM. Can't actually approve as I'm the PR author.

Latest AI Deep Review: https://jandubois.github.io/rancher-desktop/20260326-130316-pr-10018.html

@jandubois jandubois merged commit fdb7029 into rancher-sandbox:main Mar 26, 2026
23 checks passed
@jandubois jandubois deleted the steve-dynamic-ports branch March 26, 2026 22:22
@joebowbeer
Copy link

joebowbeer commented Mar 26, 2026

@jandubois Thanks!

An item from the FAQ can be removed or updated once this fix is released:

https://docs.rancherdesktop.io/troubleshooting-tips/#q-why-do-i-see-a-blank-screen-when-i-launch-the-cluster-dashboard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make ports for Steve configurable (9080 and 9443)

4 participants