…king, 0 bytes freed), contributes to liveness probe failures under load
Test locally: `gh pr checkout 2353`
No actionable comments were generated in the recent review.
Walkthrough
Removed the post-disconnect forced GC from …
Estimated code review effort: 2 (Simple), ~10 minutes
…obes in porter.yaml — liveness is zero-computation, 3s timeout
🧹 Nitpick comments (1)
cloud/porter.yaml (1)
45-49: Optional: deduplicate repeated ingress timeout rationale comments. Lines 45–49 repeat the same rationale already documented at lines 21–25. Keeping one canonical block will reduce drift risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/porter.yaml` around lines 45 - 49, The YAML contains a duplicated comment block explaining "Extended timeouts for WebSocket connections (/glasses-ws, /app-ws)" (same rationale appears twice); remove the repeated comment so only the canonical rationale block remains (keep the first occurrence that documents the 3600s proxy timeout rationale and delete the later duplicate) to avoid drift while preserving the explanation for the nginx/Porter timeout tweak.
…nt, event loop lag, UDP stats visible in Porter dashboard
What
Three changes to improve pod stability and observability:
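- Remove `gc-after-disconnect`: eliminates forced GC blocking on every session disconnect
- Probes: liveness → `/livez` (zero computation, 3s timeout), readiness → `/health` (5s timeout)
- Metrics scraping: enable `metricsScraping` in porter.yaml so Porter's Prometheus scrapes `/metrics`
Why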
gc-after-disconnect
Confirmed wasteful: 31 calls/hour on US Central, 2,242ms total event loop blocking, freed 0 bytes every single time. All objects are live session data — nothing to collect after a single disconnect. Adds unnecessary event loop blocking during disconnect storms, contributing to liveness probe failures.
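A minimal sketch of how numbers like these could be collected, assuming a Node-style runtime where a forced collection is a synchronous call (for example `globalThis.gc` under `--expose-gc`). The names `GcDeps`, `measureForcedGc`, and `summarizeGc` are hypothetical and are not the code removed from `UserSession.ts`; the runtime hooks are injected so the snippet runs without any special flags.

```typescript
// Hypothetical measurement helper; NOT the block removed in this PR.
interface GcDeps {
  forceGc: () => void;    // synchronous forced collection
  heapUsed: () => number; // current heap usage in bytes
  nowMs: () => number;    // monotonic clock in milliseconds
}

interface GcSample {
  blockedMs: number;  // wall-clock time spent inside the forced GC call
  freedBytes: number; // heap before minus heap after (can be <= 0)
}

function measureForcedGc(deps: GcDeps): GcSample {
  const heapBefore = deps.heapUsed();
  const start = deps.nowMs();
  deps.forceGc(); // nothing else on the event loop runs until this returns
  const blockedMs = deps.nowMs() - start;
  return { blockedMs, freedBytes: heapBefore - deps.heapUsed() };
}

function summarizeGc(samples: GcSample[]) {
  const totalBlockedMs = samples.reduce((sum, s) => sum + s.blockedMs, 0);
  const totalFreedBytes = samples.reduce((sum, s) => sum + s.freedBytes, 0);
  const calls = samples.length;
  return {
    calls,
    totalBlockedMs,
    totalFreedBytes,
    avgBlockedMs: calls > 0 ? totalBlockedMs / calls : 0,
  };
}

// Example wiring for Node started with --expose-gc (an assumption, not from this PR):
//   measureForcedGc({
//     forceGc: () => (globalThis as any).gc?.(),
//     heapUsed: () => process.memoryUsage().heapUsed,
//     nowMs: () => performance.now(),
//   });
```

For scale, the figures above work out to roughly 72 ms of blocking per call (2,242 ms / 31 calls) with nothing reclaimed.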
Liveness probe on /livez
Previously, liveness and readiness both hit `/health`, which iterates all sessions, counts WebSockets, updates metric gauges, and serializes JSON. With 60 sessions under load, this could take >1 second, causing liveness failures → SIGKILL. `/livez` just returns `"ok"`: if the event loop can return 2 bytes, the process is alive. `/health` stays as the readiness probe (if slow, the pod is removed from the LB gracefully instead of killed).
Metrics scraping
The `/metrics` endpoint already exposes Prometheus gauges (`mentra_user_sessions`, `mentra_event_loop_lag_ms`, `mentra_miniapp_sessions`, UDP stats, WS message counts). Enabling `metricsScraping` in porter.yaml tells Porter's Prometheus to scrape it, making these visible in Porter's built-in dashboard. The whole team can see session count and event loop health without needing BetterStack access.
Tested on
Deployed to `cloud-debug` on US Central. Verified:
- `gc-after-disconnect` events (removal confirmed working)
Changes
| File | Change |
| --- | --- |
| `UserSession.ts` | Removed `gc-after-disconnect` block, `canRunPostDisconnectGc()`, static fields |
| `porter.yaml` | `livenessCheck` → `/livez`, `readinessCheck` → `/health`, `metricsScraping`, `terminationGracePeriodSeconds` |
| `porter-debug.yml` | … |
Summary by CodeRabbit