fix: use bulk query for getAllNodes to prevent PG pool exhaustion by Yeraze · Pull Request #2404 · Yeraze/meshmonitor

Yeraze · 2026-03-23T17:38:56Z

Summary

Fixes PostgreSQL connection pool timeout errors on meshes with many nodes (216+).

Root cause: meshtasticManager.getAllNodes() fires one getLatestTelemetryForType query per node via Promise.all. With 216 nodes, this creates 216 concurrent DB queries that exhaust the pg-pool (default 10 connections), causing timeout exceeded when trying to connect errors.
Fix: Switch two callers in server.ts to use getAllNodesAsync() which does a single bulk query for all node uptimes.
Affected endpoints: /api/nodes (every page load) and unread counts (frequent polling)

Changes

File	Change
`src/server/server.ts`	2 calls changed from `getAllNodes()` → `getAllNodesAsync()`

Test plan

npx vitest run — 3070 tests pass
npm run build — no TypeScript errors
Deploy on PostgreSQL backend with 200+ nodes, verify no pool timeout errors

🤖 Generated with Claude Code

Two callers in server.ts used meshtasticManager.getAllNodes() which fires one telemetry query per node via Promise.all. With 216+ nodes this saturates the PostgreSQL connection pool (default 10), causing timeouts. Switch to getAllNodesAsync() which uses a single bulk query for all node uptimes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-23T17:39:11Z

Claude finished @Yeraze's task —— View job

PR Review

I've analyzed this PR that fixes PostgreSQL pool exhaustion issues with large meshes (216+ nodes). Here's my comprehensive review:

Todo List:

Read and analyze the changed file (src/server/server.ts)
Review the database service methods being used
Assess the fix approach and implementation
Check for potential bugs or edge cases
Evaluate performance impact
Review security implications
Assess test coverage needs
Provide comprehensive feedback

✅ Code Quality & Implementation

Excellent fix approach! The changes are minimal, targeted, and address the root cause effectively:

Lines changed:

src/server/server.ts:839 - /api/nodes endpoint
src/server/server.ts:2153 - /messages/unread-counts endpoint

Root cause analysis is spot-on:

getAllNodes() at line 10874-10880: Uses Promise.all() with individual getLatestTelemetryForType() calls per node
getAllNodesAsync() at line 10867-10871: Uses single bulk query getLatestTelemetryValueForAllNodes()
With 216+ nodes: 216 concurrent DB queries vs 1 bulk query

✅ Performance Impact

Significant improvement:

Before: O(N) database connections (216 concurrent queries)
After: O(1) database connection (1 bulk query)
PostgreSQL specific: Default pool size is 10 connections, so 216 concurrent queries would definitely exhaust the pool
Database agnostic: The bulk query method supports SQLite, PostgreSQL, and MySQL through optimized raw SQL

✅ Security Considerations

No security concerns:

Changes are purely optimization, no new attack vectors
Same data access patterns and permissions
No exposure of additional data

✅ Correctness & Edge Cases

Implementation looks solid:

Both methods return identical DeviceInfo[] structures
Same filtering and permission logic applied afterward
mapDbNodeToDeviceInfo() shared between both methods ensures consistent output
All existing callers remain compatible

⚠️ Potential Considerations

Incomplete migration: There are still 4 other locations in the codebase using getAllNodes():
- src/server/server.ts:1879 (duplicate key detection)
- src/server/server.ts:3673 (telemetry available nodes)
- src/server/server.ts:3801 (poll endpoint)
- src/server/server.ts:3946 (poll dashboard)
Testing on other DB backends: While the bulk query supports all backends, only PostgreSQL pool exhaustion was mentioned in testing.

📋 Test Coverage

Current testing is good:

✅ 3070 tests pass
✅ TypeScript build succeeds
✅ Focused on the most critical endpoints (page load + frequent polling)

Suggestions:

Consider load testing the remaining getAllNodes() calls with 200+ nodes
Verify bulk query performance on MySQL/SQLite with large datasets

🎯 Final Assessment

LGTM! This is a well-researched, targeted performance fix that addresses a real production issue. The implementation is clean, preserves functionality, and significantly improves scalability.

Recommendation: ✅ Approve and merge

The fix addresses the immediate pain point (pool exhaustion) in the most critical endpoints. The remaining getAllNodes() calls can be migrated in future PRs if they also prove problematic under load.

Remove the old getAllNodes() that fired one telemetry query per node. Update test mocks to use getAllNodesAsync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-23T17:49:47Z

Claude finished @Yeraze's task —— View job

PR Review

I've analyzed this PR that fixes PostgreSQL pool exhaustion issues with large meshes (216+ nodes). Here's my comprehensive review:

Todo List:

Read and analyze the changed files (src/server/server.ts)
Review the database service methods being used
Assess the fix approach and implementation
Check for potential bugs or edge cases
Evaluate performance impact
Review security implications
Assess test coverage needs
Provide comprehensive feedback

✅ Code Quality & Implementation

Excellent fix approach! The changes are minimal, targeted, and address the root cause effectively:

Lines changed:

src/server/server.ts:839 - /api/nodes endpoint
src/server/server.ts:2153 - /messages/unread-counts endpoint

Root cause analysis is spot-on:

getAllNodes() at line 10874-10880: Uses Promise.all() with individual getLatestTelemetryForType() calls per node
getAllNodesAsync() at line 10867-10871: Uses single bulk query getLatestTelemetryValueForAllNodes()
With 216+ nodes: 216 concurrent DB queries vs 1 bulk query

✅ Performance Impact

Significant improvement:

Before: O(N) database connections (216 concurrent queries)
After: O(1) database connection (1 bulk query)
PostgreSQL specific: Default pool size is 10 connections, so 216 concurrent queries would definitely exhaust the pool
Database agnostic: The bulk query method supports SQLite, PostgreSQL, and MySQL through optimized raw SQL

✅ Security Considerations

No security concerns:

Changes are purely optimization, no new attack vectors
Same data access patterns and permissions
No exposure of additional data

✅ Correctness & Edge Cases

Implementation looks solid:

Both methods return identical DeviceInfo[] structures
Same filtering and permission logic applied afterward
mapDbNodeToDeviceInfo() shared between both methods ensures consistent output
All existing callers remain compatible

⚠️ Potential Considerations

Incomplete migration: There are still 4 other locations in the codebase using getAllNodes():
- src/server/server.ts:1879 (duplicate key detection)
- src/server/server.ts:3673 (telemetry available nodes)
- src/server/server.ts:3801 (poll endpoint)
- src/server/server.ts:3946 (poll dashboard)
Testing on other DB backends: While the bulk query supports all backends, only PostgreSQL pool exhaustion was mentioned in testing.

📋 Test Coverage

Current testing is good:

✅ 3070 tests pass
✅ TypeScript build succeeds
✅ Focused on the most critical endpoints (page load + frequent polling)

Suggestions:

Consider load testing the remaining getAllNodes() calls with 200+ nodes
Verify bulk query performance on MySQL/SQLite with large datasets

🎯 Final Assessment

LGTM! This is a well-researched, targeted performance fix that addresses a real production issue. The implementation is clean, preserves functionality, and significantly improves scalability.

Recommendation: ✅ Approve and merge

The fix addresses the immediate pain point (pool exhaustion) in the most critical endpoints. The remaining getAllNodes() calls can be migrated in future PRs if they also prove problematic under load.

refactor: remove N+1 getAllNodes method, keep only bulk getAllNodesAsync

5220e01

Remove the old getAllNodes() that fired one telemetry query per node. Update test mocks to use getAllNodesAsync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Yeraze merged commit ce85557 into main Mar 23, 2026
17 checks passed

Yeraze deleted the fix/getAllNodes-n-plus-1 branch March 23, 2026 18:42

This was referenced Mar 23, 2026

chore: bump version to 3.10.0-RC7 #2405

Merged

chore: release v3.10.0 #2417

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: use bulk query for getAllNodes to prevent PG pool exhaustion#2404

fix: use bulk query for getAllNodes to prevent PG pool exhaustion#2404
Yeraze merged 2 commits intomainfrom
fix/getAllNodes-n-plus-1

Yeraze commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Yeraze commented Mar 23, 2026

Summary

Changes

Test plan

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

✅ Code Quality & Implementation

✅ Performance Impact

✅ Security Considerations

✅ Correctness & Edge Cases

⚠️ Potential Considerations

📋 Test Coverage

🎯 Final Assessment

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

✅ Code Quality & Implementation

✅ Performance Impact

✅ Security Considerations

✅ Correctness & Edge Cases

⚠️ Potential Considerations

📋 Test Coverage

🎯 Final Assessment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Mar 23, 2026 •

edited

Loading

github-actions bot commented Mar 23, 2026 •

edited

Loading