Skip to content

fix: use bulk query for getAllNodes to prevent PG pool exhaustion#2404

Merged
Yeraze merged 2 commits intomainfrom
fix/getAllNodes-n-plus-1
Mar 23, 2026
Merged

fix: use bulk query for getAllNodes to prevent PG pool exhaustion#2404
Yeraze merged 2 commits intomainfrom
fix/getAllNodes-n-plus-1

Conversation

@Yeraze
Copy link
Owner

@Yeraze Yeraze commented Mar 23, 2026

Summary

Fixes PostgreSQL connection pool timeout errors on meshes with many nodes (216+).

  • Root cause: meshtasticManager.getAllNodes() fires one getLatestTelemetryForType query per node via Promise.all. With 216 nodes, this creates 216 concurrent DB queries that exhaust the pg-pool (default 10 connections), causing timeout exceeded when trying to connect errors.
  • Fix: Switch two callers in server.ts to use getAllNodesAsync() which does a single bulk query for all node uptimes.
  • Affected endpoints: /api/nodes (every page load) and unread counts (frequent polling)

Changes

File Change
src/server/server.ts 2 calls changed from getAllNodes()getAllNodesAsync()

Test plan

  • npx vitest run — 3070 tests pass
  • npm run build — no TypeScript errors
  • Deploy on PostgreSQL backend with 200+ nodes, verify no pool timeout errors

🤖 Generated with Claude Code

Two callers in server.ts used meshtasticManager.getAllNodes() which fires
one telemetry query per node via Promise.all. With 216+ nodes this
saturates the PostgreSQL connection pool (default 10), causing timeouts.

Switch to getAllNodesAsync() which uses a single bulk query for all
node uptimes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 23, 2026

Claude finished @Yeraze's task —— View job


PR Review

I've analyzed this PR that fixes PostgreSQL pool exhaustion issues with large meshes (216+ nodes). Here's my comprehensive review:

Todo List:

  • Read and analyze the changed file (src/server/server.ts)
  • Review the database service methods being used
  • Assess the fix approach and implementation
  • Check for potential bugs or edge cases
  • Evaluate performance impact
  • Review security implications
  • Assess test coverage needs
  • Provide comprehensive feedback

✅ Code Quality & Implementation

Excellent fix approach! The changes are minimal, targeted, and address the root cause effectively:

Lines changed:

  • src/server/server.ts:839 - /api/nodes endpoint
  • src/server/server.ts:2153 - /messages/unread-counts endpoint

Root cause analysis is spot-on:

  • getAllNodes() at line 10874-10880: Uses Promise.all() with individual getLatestTelemetryForType() calls per node
  • getAllNodesAsync() at line 10867-10871: Uses single bulk query getLatestTelemetryValueForAllNodes()
  • With 216+ nodes: 216 concurrent DB queries vs 1 bulk query

✅ Performance Impact

Significant improvement:

  • Before: O(N) database connections (216 concurrent queries)
  • After: O(1) database connection (1 bulk query)
  • PostgreSQL specific: Default pool size is 10 connections, so 216 concurrent queries would definitely exhaust the pool
  • Database agnostic: The bulk query method supports SQLite, PostgreSQL, and MySQL through optimized raw SQL

✅ Security Considerations

No security concerns:

  • Changes are purely optimization, no new attack vectors
  • Same data access patterns and permissions
  • No exposure of additional data

✅ Correctness & Edge Cases

Implementation looks solid:

  • Both methods return identical DeviceInfo[] structures
  • Same filtering and permission logic applied afterward
  • mapDbNodeToDeviceInfo() shared between both methods ensures consistent output
  • All existing callers remain compatible

⚠️ Potential Considerations

  1. Incomplete migration: There are still 4 other locations in the codebase using getAllNodes():

    • src/server/server.ts:1879 (duplicate key detection)
    • src/server/server.ts:3673 (telemetry available nodes)
    • src/server/server.ts:3801 (poll endpoint)
    • src/server/server.ts:3946 (poll dashboard)
  2. Testing on other DB backends: While the bulk query supports all backends, only PostgreSQL pool exhaustion was mentioned in testing.

📋 Test Coverage

Current testing is good:

  • ✅ 3070 tests pass
  • ✅ TypeScript build succeeds
  • ✅ Focused on the most critical endpoints (page load + frequent polling)

Suggestions:

  • Consider load testing the remaining getAllNodes() calls with 200+ nodes
  • Verify bulk query performance on MySQL/SQLite with large datasets

🎯 Final Assessment

LGTM! This is a well-researched, targeted performance fix that addresses a real production issue. The implementation is clean, preserves functionality, and significantly improves scalability.

Recommendation: ✅ Approve and merge

The fix addresses the immediate pain point (pool exhaustion) in the most critical endpoints. The remaining getAllNodes() calls can be migrated in future PRs if they also prove problematic under load.


Remove the old getAllNodes() that fired one telemetry query per node.
Update test mocks to use getAllNodesAsync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 23, 2026

Claude finished @Yeraze's task —— View job


PR Review

I've analyzed this PR that fixes PostgreSQL pool exhaustion issues with large meshes (216+ nodes). Here's my comprehensive review:

Todo List:

  • Read and analyze the changed files (src/server/server.ts)
  • Review the database service methods being used
  • Assess the fix approach and implementation
  • Check for potential bugs or edge cases
  • Evaluate performance impact
  • Review security implications
  • Assess test coverage needs
  • Provide comprehensive feedback

✅ Code Quality & Implementation

Excellent fix approach! The changes are minimal, targeted, and address the root cause effectively:

Lines changed:

  • src/server/server.ts:839 - /api/nodes endpoint
  • src/server/server.ts:2153 - /messages/unread-counts endpoint

Root cause analysis is spot-on:

  • getAllNodes() at line 10874-10880: Uses Promise.all() with individual getLatestTelemetryForType() calls per node
  • getAllNodesAsync() at line 10867-10871: Uses single bulk query getLatestTelemetryValueForAllNodes()
  • With 216+ nodes: 216 concurrent DB queries vs 1 bulk query

✅ Performance Impact

Significant improvement:

  • Before: O(N) database connections (216 concurrent queries)
  • After: O(1) database connection (1 bulk query)
  • PostgreSQL specific: Default pool size is 10 connections, so 216 concurrent queries would definitely exhaust the pool
  • Database agnostic: The bulk query method supports SQLite, PostgreSQL, and MySQL through optimized raw SQL

✅ Security Considerations

No security concerns:

  • Changes are purely optimization, no new attack vectors
  • Same data access patterns and permissions
  • No exposure of additional data

✅ Correctness & Edge Cases

Implementation looks solid:

  • Both methods return identical DeviceInfo[] structures
  • Same filtering and permission logic applied afterward
  • mapDbNodeToDeviceInfo() shared between both methods ensures consistent output
  • All existing callers remain compatible

⚠️ Potential Considerations

  1. Incomplete migration: There are still 4 other locations in the codebase using getAllNodes():

    • src/server/server.ts:1879 (duplicate key detection)
    • src/server/server.ts:3673 (telemetry available nodes)
    • src/server/server.ts:3801 (poll endpoint)
    • src/server/server.ts:3946 (poll dashboard)
  2. Testing on other DB backends: While the bulk query supports all backends, only PostgreSQL pool exhaustion was mentioned in testing.

📋 Test Coverage

Current testing is good:

  • ✅ 3070 tests pass
  • ✅ TypeScript build succeeds
  • ✅ Focused on the most critical endpoints (page load + frequent polling)

Suggestions:

  • Consider load testing the remaining getAllNodes() calls with 200+ nodes
  • Verify bulk query performance on MySQL/SQLite with large datasets

🎯 Final Assessment

LGTM! This is a well-researched, targeted performance fix that addresses a real production issue. The implementation is clean, preserves functionality, and significantly improves scalability.

Recommendation: ✅ Approve and merge

The fix addresses the immediate pain point (pool exhaustion) in the most critical endpoints. The remaining getAllNodes() calls can be migrated in future PRs if they also prove problematic under load.

@Yeraze Yeraze merged commit ce85557 into main Mar 23, 2026
17 checks passed
@Yeraze Yeraze deleted the fix/getAllNodes-n-plus-1 branch March 23, 2026 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant