Use dynamic ports for Steve to avoid conflicts with other software#10018
Use dynamic ports for Steve to avoid conflicts with other software#10018jandubois merged 4 commits intorancher-sandbox:mainfrom
Conversation
976d98f to
1d65fa8
Compare
Add link and verdict to PR rancher-sandbox#10018 header. Regenerate index.html with generate-index tool. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Add link and verdict to PR rancher-sandbox#10018 header. Regenerate index.html with generate-index tool. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
6d66797 to
b2e4acd
Compare
Steve previously listened on hardcoded port 9443, which could conflict with other software. Allocate both the HTTPS and HTTP ports dynamically at startup and propagate them to Steve, the dashboard proxy, and the certificate-error handler. Key changes: - Add getAvailablePorts(), which binds all requested ports simultaneously before releasing any, guaranteeing distinct values despite the inherent TOCTOU window. - Accept port arguments in Steve.start() instead of using a constant. Let startup errors propagate to the caller instead of swallowing them; the state-changed handler catches them locally so the Kubernetes ready notification still reaches the UI. - Recreate dashboard proxy middleware on each Steve start via a new setStevePort() method. Express routes use wrapper functions that dereference the current proxy instances, so routes survive recreation without re-registration. - Add /api/steve-port endpoint for Rancher Dashboard to discover the dynamic HTTPS port. - Rename the networking module's port setter to setSteveCertPort() to distinguish it from DashboardServer.setStevePort(). - Bump rancherDashboard to 2.11.1.rd4 (consumes the new endpoint). Signed-off-by: Jan Dubois <jan.dubois@suse.com>
isPortReady() used a raw TCP connect to determine when Steve was ready. Steve accepts TCP connections several seconds before it finishes initializing its API controllers, so the dashboard button was enabled too early. Make an actual HTTP request to /v1 instead, using Steve's HTTP port to avoid certificate validation issues with its self-signed TLS cert. Signed-off-by: Jan Dubois <jan.dubois@suse.com>
b2e4acd to
b69b292
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates Rancher Desktop’s “Steve” integration to avoid port conflicts by selecting ports dynamically at runtime and wiring the chosen port through the dashboard proxy and certificate-error handling so the UI can still reach Steve.
Changes:
- Add
getAvailablePorts()utility (with tests) to obtain distinct ephemeral localhost ports. - Update Steve startup to accept ports as arguments and add a more API-specific readiness probe.
- Add dashboard-side support for dynamic Steve ports (proxy recreation +
/api/steve-portendpoint) and update the cert-error allowlist to use the dynamic port.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/rancher-desktop/utils/networks.ts | Adds getAvailablePorts() for dynamic port selection. |
| pkg/rancher-desktop/utils/tests/networks.spec.ts | Adds Jest coverage for getAvailablePorts(). |
| pkg/rancher-desktop/main/networking/index.ts | Introduces setSteveCertPort() to update cert-error allowlist dynamically. |
| pkg/rancher-desktop/main/dashboardServer/index.ts | Reworks dashboard proxy handling to support Steve port changes and adds /api/steve-port. |
| pkg/rancher-desktop/backend/steve.ts | Makes Steve start accept ports and changes readiness probing to an HTTPS API endpoint. |
| background.ts | Allocates a dynamic port at startup and propagates it to Steve, dashboard proxy, and networking. |
| pkg/rancher-desktop/assets/dependencies.yaml | Bumps rancherDashboard to 2.11.1.rd4. |
| .github/actions/spelling/expect.txt | Adds “TOCTOU” to expected spelling list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // ?? next() fires only when the proxy is absent: createProxyMiddleware | ||
| // returns an async function, so the call always yields a Promise (truthy). | ||
| ProxyKeys.forEach((key) => { | ||
| this.dashboardServer.use(key, this.proxies[key]); | ||
| this.dashboardServer.use(key, (req, res, next) => this.proxies[key]?.(req, res, next) ?? next()); | ||
| }); |
There was a problem hiding this comment.
The proxy wrapper always calls next() because Express middleware (including createProxyMiddleware() from http-proxy-middleware) typically returns void/undefined. With the current ?? next() expression, next() will run even when the proxy handled the request, which can lead to requests falling through to the static/index handler (or double-handling after headers are sent). Refactor the wrapper to call next() only when this.proxies[key] is missing, e.g. by branching on the existence of the proxy instead of using ?? on the middleware return value.
There was a problem hiding this comment.
Fixed by making the code look more like the old code.
| : undefined; | ||
|
|
||
| if (key) { | ||
| return this.proxies[key]?.upgrade(req, socket, head); |
There was a problem hiding this comment.
In the websocket upgrade handler, if the URL maps to a known key but this.proxies[key] is currently undefined, the function returns without logging or destroying the socket. That can leave the connection hanging. Handle the missing-proxy case explicitly (log and socket.destroy(), or return a clear 503 response where possible) instead of optional-chaining the upgrade() call.
| return this.proxies[key]?.upgrade(req, socket, head); | |
| const proxy = this.proxies[key]; | |
| if (!proxy) { | |
| console.log(`No proxy configured for WebSocket upgrade on ${ req.url } (key: ${ key })`); | |
| socket.destroy(); | |
| return; | |
| } | |
| return proxy.upgrade(req, socket, head); |
There was a problem hiding this comment.
Fixed to not call .upgrade (and fall through to socket.destroy() if the handler isn't set.
background.ts
Outdated
| try { | ||
| const [steveHttpsPort] = await getAvailablePorts(1); | ||
|
|
||
| console.log(`Steve ports: HTTPS=${ steveHttpsPort }`); | ||
| await Steve.getInstance().start(steveHttpsPort, 0); | ||
| DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware |
There was a problem hiding this comment.
PR description says both HTTPS and HTTP ports are allocated dynamically (and getAvailablePorts() was added to reserve multiple ports simultaneously), but this code only allocates one port and passes 0 for httpPort. If the intent is to have deterministic distinct ports (and/or to propagate the HTTP port anywhere), this should request two ports via getAvailablePorts(2) and pass both into Steve.start(httpsPort, httpPort) (and log/propagate both as needed).
There was a problem hiding this comment.
The PR has since changed; the description has been updated.
background.ts
Outdated
| DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware | ||
| setSteveCertPort(steveHttpsPort); // update certificate-error allowed URLs | ||
| } catch (ex) { |
There was a problem hiding this comment.
setSteveCertPort(steveHttpsPort) updates the certificate-error allowlist, but there’s no corresponding reset when Steve stops. If the OS later reuses the same port for a different local service while the dashboard window exists, the certificate-error handler would still auto-accept invalid certs for https://127.0.0.1:<oldPort>. Consider clearing the allowed Steve port (e.g. setSteveCertPort(0)) when Steve is stopped/disabled and when Steve fails to start, so the allowlist tracks the actual running instance.
There was a problem hiding this comment.
I don't think this is a high priority, since we won't attempt to connect to this if Steve isn't running. We can update this in the future to actually fetch the steve cert from Kubernetes and actually check that.
When we request Steve at the HTTP port, if HTTPS is enabled, it just tries to redirect us to the HTTPS port instead. Therefore we should just disable the HTTP port completely (we never use it, and we do not expect external users); instead, we try the HTTPS port for the health check. Signed-off-by: Mark Yen <mark.yen@suse.com>
a92c7d2 to
24e9a1b
Compare
mook-as
left a comment
There was a problem hiding this comment.
Comments addressed, I think; will go for another round of AI things.
background.ts
Outdated
| try { | ||
| const [steveHttpsPort] = await getAvailablePorts(1); | ||
|
|
||
| console.log(`Steve ports: HTTPS=${ steveHttpsPort }`); | ||
| await Steve.getInstance().start(steveHttpsPort, 0); | ||
| DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware |
There was a problem hiding this comment.
The PR has since changed; the description has been updated.
background.ts
Outdated
| DashboardServer.getInstance().setStevePort(steveHttpsPort); // recreate proxy middleware | ||
| setSteveCertPort(steveHttpsPort); // update certificate-error allowed URLs | ||
| } catch (ex) { |
There was a problem hiding this comment.
I don't think this is a high priority, since we won't attempt to connect to this if Steve isn't running. We can update this in the future to actually fetch the steve cert from Kubernetes and actually check that.
| : undefined; | ||
|
|
||
| if (key) { | ||
| return this.proxies[key]?.upgrade(req, socket, head); |
There was a problem hiding this comment.
Fixed to not call .upgrade (and fall through to socket.destroy() if the handler isn't set.
| // ?? next() fires only when the proxy is absent: createProxyMiddleware | ||
| // returns an async function, so the call always yields a Promise (truthy). | ||
| ProxyKeys.forEach((key) => { | ||
| this.dashboardServer.use(key, this.proxies[key]); | ||
| this.dashboardServer.use(key, (req, res, next) => this.proxies[key]?.(req, res, next) ?? next()); | ||
| }); |
There was a problem hiding this comment.
Fixed by making the code look more like the old code.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // and the dashboard button is disabled until then. | ||
| ProxyKeys.forEach((key) => { | ||
| this.dashboardServer.use(key, this.proxies[key]); | ||
| this.dashboardServer.use(key, (req, res, next) => this.proxies[key](req, res, next)); |
There was a problem hiding this comment.
init() registers proxy route wrappers that unconditionally call this.proxies[key](...), but this.proxies is empty until setStevePort() runs. Any request to one of these paths before the first successful Steve start (or when Kubernetes is disabled) will throw a TypeError and can crash the dashboard server. Make this resilient by either creating initial proxy middleware instances in init() (even with a placeholder target) or guarding the wrapper to return a clear error (e.g. 503) / next() when this.proxies[key] is not set.
| this.dashboardServer.use(key, (req, res, next) => this.proxies[key](req, res, next)); | |
| this.dashboardServer.use(key, (req, res, next) => { | |
| const proxy = this.proxies[key]; | |
| if (proxy) { | |
| return proxy(req, res, next); | |
| } | |
| console.log(`Proxy for path ${ key } is not available; returning 503 for ${ req.url }`); | |
| res.status(503).send('Kubernetes backend is not available'); | |
| }); |
| return this.proxies[key].upgrade(req, socket, head); | ||
| } | ||
|
|
||
| console.log(`Unknown Web socket upgrade request for ${ req.url }`); |
There was a problem hiding this comment.
Log message uses "Web socket"; standard spelling is "WebSocket" (consistent with the API term) and avoids the appearance of a typo in logs.
| console.log(`Unknown Web socket upgrade request for ${ req.url }`); | |
| console.log(`Unknown WebSocket upgrade request for ${ req.url }`); |
pkg/rancher-desktop/backend/steve.ts
Outdated
| req.on('response', (res) => resolve(res.statusCode === 200)); | ||
| req.on('error', () => resolve(false)); | ||
| // Timeout if we don't get a response in a reasonable time. | ||
| setTimeout(1_000).then(() => resolve(false)); |
There was a problem hiding this comment.
isPortReady() resolves false on a 1s timer but does not abort the underlying Electron.net.request() or consume the response body. This can leave in-flight requests (and sockets) lingering across attempts, leading to unnecessary resource usage and potentially many concurrent requests during waitForReady(). Consider using a single timeout handle that aborts/destroys the request, and ensure the response stream is drained (res.resume() / read data) before resolving.
| req.on('response', (res) => resolve(res.statusCode === 200)); | |
| req.on('error', () => resolve(false)); | |
| // Timeout if we don't get a response in a reasonable time. | |
| setTimeout(1_000).then(() => resolve(false)); | |
| let settled = false; | |
| const timeoutController = new AbortController(); | |
| const onResolve = (result: boolean) => { | |
| if (settled) { | |
| return; | |
| } | |
| settled = true; | |
| // Cancel the timeout if it's still pending. | |
| timeoutController.abort(); | |
| resolve(result); | |
| }; | |
| req.on('response', (res) => { | |
| // Drain the response body to ensure the underlying socket can be reused. | |
| res.resume(); | |
| onResolve(res.statusCode === 200); | |
| }); | |
| req.on('error', () => { | |
| onResolve(false); | |
| }); | |
| // Timeout if we don't get a response in a reasonable time. | |
| setTimeout(1_000, undefined, { signal: timeoutController.signal }).then( | |
| () => { | |
| // Abort the request on timeout to avoid lingering in-flight requests. | |
| try { | |
| req.abort(); | |
| } catch { | |
| // Ignore errors from aborting an already-finished request. | |
| } | |
| onResolve(false); | |
| }, | |
| () => { | |
| // Ignore rejection when the timeout is aborted via timeoutController.abort(). | |
| }, | |
| ); |
860098f to
daab93e
Compare
Signed-off-by: Mark Yen <mark.yen@suse.com>
daab93e to
fa2cbca
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| '--https-listen-port', | ||
| String(httpsPort), | ||
| '--http-listen-port', | ||
| '0', // Disable HTTP support; it does not work correctly anyway. | ||
| ], | ||
| { env }, | ||
| ); |
There was a problem hiding this comment.
Steve.start() is now intended to let startup errors propagate, but spawn failures can result in child.stdout/child.stderr being null (depending on how the process was spawned / failed). The current code path just logs and returns in that case, which can swallow the failure and potentially leave the spawn error unhandled. Consider explicitly setting stdio: 'pipe' in the spawn options and/or throwing when the streams are unavailable so the caller can handle a failed Steve startup reliably.
There was a problem hiding this comment.
Not really relevant; we can fix it later if that turns out to be an issue.
mook-as
left a comment
There was a problem hiding this comment.
I don't think there's anything left I want to fix.
Approving (because Jan started the PR); @jandubois please approve too if that looks fine for you?
| '--https-listen-port', | ||
| String(httpsPort), | ||
| '--http-listen-port', | ||
| '0', // Disable HTTP support; it does not work correctly anyway. | ||
| ], | ||
| { env }, | ||
| ); |
There was a problem hiding this comment.
Not really relevant; we can fix it later if that turns out to be an issue.
jandubois
left a comment
There was a problem hiding this comment.
Thanks, LGTM. Can't actually approve as I'm the PR author.
Latest AI Deep Review: https://jandubois.github.io/rancher-desktop/20260326-130316-pr-10018.html
|
@jandubois Thanks! An item from the FAQ can be removed or updated once this fix is released: |
Steve previously listened on hardcoded port
9443, which could conflict with other software. Allocate the HTTPS port dynamically at startup and propagate them to Steve, the dashboard proxy, and the certificate-error handler.The HTTP port doesn't actually work (Steve tries to direct to the HTTPS port, but it uses the wrong port number), so just tell Steve to never actually listen on HTTP instead.
Key changes:
Add
getAvailablePorts(), which binds all requested ports simultaneously before releasing any, guaranteeing distinct values despite the inherent TOCTOU window.Accept port arguments in
Steve.start()instead of using a constant. Let startup errors propagate to the caller instead of swallowing them; the state-changed handler catches them locally so the Kubernetes ready notification still reaches the UI.Force Steve to disable the use of HTTP.
Recreate dashboard proxy middleware on each Steve start via a new
setStevePort()method. Express routes use wrapper functions that dereference the current proxy instances, so routes survive recreation without re-registration.Add
/api/steve-portendpoint for Rancher Dashboard to discover the dynamic HTTPS port.Rename the networking module's port setter to
setSteveCertPort()to distinguish it fromDashboardServer.setStevePort().Bump rancherDashboard to 2.11.1.rd4 (consumes the new endpoint). See Fetch Steve port dynamically instead of hardcoding 9443 rancher-desktop-dashboard#4.
Fixes #1890