feat: impl load balancing and failover #109

Merged
zitsen merged 107 commits into main from feat/main/failover4.0
Mar 28, 2026

@qevolg (Contributor) commented Mar 14, 2026

Description

feat: impl load balancing and failover

Issue(s)

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the robustness and availability of the TDengine Node.js WebSocket Connector. It introduces multi-address support, enabling the connector to distribute load across multiple server instances and automatically fail over to healthy alternatives when connections are lost. This includes retry mechanisms for in-flight requests, connection pool management keyed on DSNs, and stricter concurrency control during reconnection, giving applications that interact with TDengine a more resilient and transparent experience.

Highlights

  • Multi-Address Support: The TDengine Node.js WebSocket Connector now supports configuring multiple TDengine server addresses via a Data Source Name (DSN) string, enhancing system high availability.
  • Load Balancing and Automatic Failover: Implemented random initial address selection for load distribution and automatic failover to alternative addresses upon connection failure, incorporating configurable retry logic with exponential backoff.
  • Inflight Request Retry Mechanism: Introduced a robust mechanism to track and replay 'retriable' in-flight requests (e.g., 'insert', 'options_connection') after a successful reconnection, covering both string and binary messages.
  • DSN Parsing and Connection Pool Enhancements: A new DSN parsing utility was added, and the connection pool logic was updated to generate connection keys based on the DSN, ensuring proper isolation across different address lists, databases, and authentication credentials.
  • Concurrency Control for Reconnection: Enhanced the reconnection logic with a _reconnectLock Promise to prevent race conditions that could arise from concurrent error or close events, ensuring only one reconnection process runs at a time.
  • Improved Timeout Management: Modified the callback lifecycle management to immediately clear timeout timers when a callback is hit, effectively preventing 'success then timeout' race conditions.
  • Increased Default Connection Timeout: The default WebSocket connection timeout in wsConnector.ts was increased from 5 seconds to 60 seconds to accommodate potentially longer connection establishment times.
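
The load-balancing and failover highlights can be illustrated with a minimal sketch. This is not the connector's implementation: `RetryOptions`, `backoffDelay`, and `connectWithFailover` are hypothetical names, the option names only loosely mirror the `retries`, `retry_backoff_ms`, and `retry_backoff_max_ms` DSN parameters mentioned later in this thread, and treating `retries` as a per-address count is an assumption.

```typescript
// Hypothetical sketch: names and retry semantics are assumptions, not the connector's API.
interface RetryOptions {
    retries: number;           // extra attempts per address after the first one
    retryBackoffMs: number;    // delay before the first retry
    retryBackoffMaxMs: number; // upper bound for any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at the maximum.
function backoffDelay(attempt: number, opts: RetryOptions): number {
    return Math.min(opts.retryBackoffMs * Math.pow(2, attempt), opts.retryBackoffMaxMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start at a random address, retry it with backoff, then fail over to the next address.
async function connectWithFailover<T>(
    addresses: string[],
    connect: (address: string) => Promise<T>,
    opts: RetryOptions,
    random: () => number = Math.random
): Promise<T> {
    if (addresses.length === 0) {
        throw new Error("no addresses configured");
    }
    const start = Math.floor(random() * addresses.length);
    let lastError: unknown;
    for (let i = 0; i < addresses.length; i++) {
        const address = addresses[(start + i) % addresses.length];
        for (let attempt = 0; attempt <= opts.retries; attempt++) {
            try {
                return await connect(address);
            } catch (err) {
                lastError = err;
                // Only sleep if another attempt on this address will follow.
                if (attempt < opts.retries) {
                    await sleep(backoffDelay(attempt, opts));
                }
            }
        }
    }
    throw lastError;
}
```

Passing `random` as a parameter keeps the initial address choice deterministic in tests while defaulting to `Math.random` in normal use.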


Changelog
  • docs/superpowers/specs/2026-03-13-multi-address-load-balancing-failover-design.md
    • Added a comprehensive design document detailing the multi-address load balancing and failover features for the TDengine Node.js WebSocket Connector.
  • nodejs/src/client/wsConnector.ts
    • Increased the default connection timeout from 5000ms to 60000ms.
  • nodejs/src/common/dsn.ts
    • Introduced a new utility for parsing multi-host Data Source Name (DSN) strings, including support for user information, multiple addresses, IPv6, database specification, and query parameters.
  • nodejs/test/bulkPulling/dsn.test.ts
    • Added extensive unit tests for the DSN parsing utility, covering various valid and invalid URL formats, host list configurations, and port specifications.
Activity
  • No specific activity (comments, reviews, progress) was found in the provided context.

gemini-code-assist[bot]

This comment was marked as outdated.

@codecov-commenter

codecov-commenter commented Mar 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.10%. Comparing base (86b9aaf) to head (7ad34cc).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   80.21%   81.10%   +0.88%     
==========================================
  Files          30       33       +3     
  Lines        2487     3271     +784     
  Branches      437      589     +152     
==========================================
+ Hits         1995     2653     +658     
- Misses        378      473      +95     
- Partials      114      145      +31     


qevolg added 18 commits March 14, 2026 21:34
…subscription, failover, and token handling

- Implement tests for normal connection and error scenarios in `tmq.test.ts`
- Add tests for configuration handling in `tmq.config.test.ts`
- Introduce failover tests to ensure resilience in `tmq.failover.test.ts`
- Create a dedicated test suite for cloud interactions in `tmq.cloud.test.ts`
- Validate token-based authentication and URL token handling in `tmq.test.ts`
- Ensure proper cleanup of test databases and topics after tests
Copilot AI review requested due to automatic review settings March 26, 2026 03:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

nodejs/src/client/wsConnector.ts:267

  • WsEventCallback.handleEventCallback() is async and can throw (e.g. when no callback is registered). In _onmessage, it is invoked without await/.catch, which can lead to unhandled promise rejections at runtime. Consider prefixing with void ...handleEventCallback(...).catch(err => logger.error(...)) (or otherwise handling the promise) to avoid process-level unhandled rejection behavior.
    private _onmessage(event: any) {
        let data = event.data;
        logger.debug("wsClient._onMessage()====" + Object.prototype.toString.call(data));
        if (Object.prototype.toString.call(data) === "[object ArrayBuffer]") {
            let id = new DataView(data, 26, 8).getBigUint64(0, true);
            WsEventCallback.instance().handleEventCallback(
                { id: id, action: "", req_id: BigInt(0) },
                OnMessageType.MESSAGE_TYPE_ARRAYBUFFER,
                data
            );
        } else if (Object.prototype.toString.call(data) === "[object String]") {
            let msg = JSON.parse(data);
            logger.debug("[_onmessage.stringType]==>:" + data);
            WsEventCallback.instance().handleEventCallback(
                { id: BigInt(0), action: msg.action, req_id: msg.req_id },
                OnMessageType.MESSAGE_TYPE_STRING,
                msg
            );


Copilot AI review requested due to automatic review settings March 26, 2026 03:44

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

nodejs/src/client/wsConnector.ts:289

  • WsEventCallback.handleEventCallback() is async, but _onmessage calls it without awaiting or attaching a .catch(). If handleEventCallback throws/rejects (e.g., callback timed out/was unregistered), this becomes an unhandled promise rejection. Wrap these calls with void ... .catch(err => logger.error(...)) (or similar) to avoid process-level unhandled rejection behavior.
    private _onmessage(event: any) {
        let data = event.data;
        logger.debug("wsClient._onMessage()====" + Object.prototype.toString.call(data));
        if (Object.prototype.toString.call(data) === "[object ArrayBuffer]") {
            let id = new DataView(data, 26, 8).getBigUint64(0, true);
            WsEventCallback.instance().handleEventCallback(
                { id: id, action: "", req_id: BigInt(0) },
                OnMessageType.MESSAGE_TYPE_ARRAYBUFFER,
                data
            );
        } else if (Object.prototype.toString.call(data) === "[object String]") {
            let msg = JSON.parse(data);
            logger.debug("[_onmessage.stringType]==>:" + data);
            WsEventCallback.instance().handleEventCallback(
                { id: BigInt(0), action: msg.action, req_id: msg.req_id },
                OnMessageType.MESSAGE_TYPE_STRING,
                msg
            );
        } else {
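
Copilot's suggestion can be sketched in isolation. The `handleEventCallback`, `logger`, and `onMessage` definitions below are simplified stand-ins written for this example, not the connector's `WsEventCallback` or logger; the sketch only shows the `void ... .catch(...)` pattern for a promise that a synchronous event handler cannot await.

```typescript
// Stand-ins for illustration only; the real WsEventCallback and logger live in the connector.
const loggedErrors: string[] = [];
const logger = {
    error: (msg: string): void => {
        loggedErrors.push(msg);
    },
};

async function handleEventCallback(reqId: number): Promise<void> {
    // Mimics the failure mode from the review: no callback is registered for this request.
    throw new Error("no callback registered for req_id " + reqId);
}

// A synchronous event handler cannot await, so it attaches .catch() to consume
// the rejection and uses `void` to mark the promise as intentionally not awaited.
function onMessage(reqId: number): void {
    void handleEventCallback(reqId).catch((err: unknown) => {
        logger.error("handleEventCallback failed: " + String(err));
    });
}

onMessage(7);
```

With the `.catch()` attached, the rejection is logged instead of surfacing as a process-level unhandled rejection.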


Copilot AI review requested due to automatic review settings March 26, 2026 06:44

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 54 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • nodejs/package-lock.json: Language not supported


Copilot AI review requested due to automatic review settings March 26, 2026 07:59

Copilot AI left a comment

Pull request overview

Copilot reviewed 54 out of 55 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • nodejs/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)

nodejs/test/tmq/tmq.test.ts:300

  • expect([104]).toContain(e.code) is equivalent to a direct equality check and is harder to read. Since only one code is expected now, prefer expect(e.code).toBe(104) (or keep multiple codes if the error can legitimately vary).


@sheyanjie-qq (Contributor)

Review Findings

  • [P1] Encode auth fields unambiguously in the pool key (nodejs/src/client/wsConnectorPool.ts:36)
    The pool key currently concatenates auth fields into a raw string, so credentials containing : can collide. For example, user='a:b', pwd='c' and user='a', pwd='b:c' produce the same key. That allows the pool to reuse a socket authenticated for the wrong account, which reintroduces the cross-credential sharing this refactor is trying to prevent.
  • [P2] Strip query/path before splitting DSN userinfo (nodejs/src/common/dsn.ts:105)
    The parser looks for @ before removing the query string or path. As a result, a valid DSN such as ws://localhost:6041?bearer_token=a@b is misparsed as if it contained userinfo and host b, instead of host localhost. This is a regression from the previous new URL(...) behavior and can break any DSN where a token or later URL component contains @.
  • [P2] Scope pooled connectors by reconnect tuning parameters (nodejs/src/client/wsConnectorPool.ts:43)
    The pool key ignores retries, retry_backoff_ms, and retry_backoff_max_ms, but WebSocketConnector captures those values at construction time. If one caller creates a connector with ?retries=1 and another later opens the same DSN with ?retries=60, the second caller can silently reuse the first connector and get the wrong failover policy.
  • [P2] Restore the default SQL database during session recovery (nodejs/src/client/wsClient.ts:94)
    WsSql.open() still runs use information_schema when no database is explicitly provided. On reconnect, this recovery path only replays conn with connectedDatabase, which is null in that case, so the recovered session comes back without a current database. After failover, commands that relied on the post-open default context, such as show tables or unqualified ins_* queries, can start failing even though they worked before the disconnect.
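
The DSN finding can be made concrete with a small sketch of isolating the authority section before searching for @. `splitAuthority` and `AuthorityParts` are hypothetical names, not the code in nodejs/src/common/dsn.ts, and the comma-separated host list is an assumption about the multi-address format. Userinfo is assumed to be percent-encoded, so it contains no raw /, ?, or #.

```typescript
// Hypothetical helper for illustration; not the parser in nodejs/src/common/dsn.ts.
interface AuthorityParts {
    userinfo: string | null;
    hosts: string[];
}

function splitAuthority(dsn: string): AuthorityParts {
    const schemeEnd = dsn.indexOf("://");
    if (schemeEnd < 0) {
        throw new Error("missing scheme");
    }
    const rest = dsn.slice(schemeEnd + 3);
    // Cut at the first '/', '?' or '#' so that later components cannot contribute an '@'.
    const cut = rest.search(/[\/?#]/);
    const authority = cut < 0 ? rest : rest.slice(0, cut);
    // Userinfo ends at the last '@' found inside the authority only.
    const at = authority.lastIndexOf("@");
    const userinfo = at < 0 ? null : authority.slice(0, at);
    const hostList = at < 0 ? authority : authority.slice(at + 1);
    // Assumption: multiple addresses are separated by commas.
    const hosts = hostList.split(",").filter((h) => h.length > 0);
    return { userinfo: userinfo, hosts: hosts };
}
```

Under these assumptions, `splitAuthority("ws://localhost:6041?bearer_token=a@b")` yields no userinfo and the single host `localhost:6041`, because the `@` in the query string is never inspected.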

Copilot AI review requested due to automatic review settings March 27, 2026 06:06

Copilot AI left a comment

Pull request overview

Copilot reviewed 54 out of 55 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • nodejs/package-lock.json: Language not supported


@zitsen (Collaborator) left a comment

Code Review Summary

All 8 CI checks pass. Coverage increased from 80.21% → 81.10%.

Previously Reported Issues — All Resolved

  1. [P1] Pool key auth collision — Fixed. Auth fields are now JSON-serialized + SHA-256 hashed in buildAuthScope(), eliminating collisions like user='a:b',pwd='c' vs user='a',pwd='b:c'.

  2. [P2] DSN parsing @ in query params — Fixed. The parser now isolates the authority section before / and ? prior to looking for @, so bearer_token=a@b is handled correctly. Test coverage confirms.

  3. [P2] Pool key missing retry params — Intentional design. Retry parameters are excluded from the pool key; instead, refreshRetryConfig() updates retry settings when reusing pooled connections. This avoids unnecessary pool fragmentation.

  4. [P2] Session recovery default DB — Fixed. normalizeConnectedDatabase() returns information_schema when no explicit DB is provided for SQL paths. recoverSqlSessionContext() uses this during reconnect.
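
The pool key fix in item 1 can be sketched as follows. This is an illustration of the JSON-serialize-then-SHA-256 idea, not the actual buildAuthScope(): `naiveAuthKey` and `hashedAuthScope` are hypothetical names, and the real function may include more fields than user and password.

```typescript
import { createHash } from "node:crypto";

// Naive key: raw concatenation, ambiguous when a field itself contains ':'.
function naiveAuthKey(user: string, pwd: string): string {
    return user + ":" + pwd;
}

// Hypothetical sketch of the described approach; not the connector's buildAuthScope().
function hashedAuthScope(user: string, pwd: string): string {
    // JSON.stringify quotes and escapes each field, so field boundaries stay unambiguous.
    const canonical = JSON.stringify([user, pwd]);
    // Hashing keeps raw credentials out of the key and gives it a fixed length.
    return createHash("sha256").update(canonical).digest("hex");
}

// The colliding pair from the review: identical naive keys, distinct hashed scopes.
const naiveCollides = naiveAuthKey("a:b", "c") === naiveAuthKey("a", "b:c");
const hashedCollides = hashedAuthScope("a:b", "c") === hashedAuthScope("a", "b:c");
```

Here `naiveCollides` is `true` because both inputs concatenate to `a:b:c`, while `hashedCollides` is `false` because the serialized arrays differ before hashing.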

Additional Verification

  • Reconnect lock: _reconnectLock correctly deduplicates concurrent reconnect triggers.
  • Inflight tracking: Requests are properly removed on resolve/reject; all inflight requests are failed and cleared when reconnect fails.
  • Address selection: Least-connected strategy with proper increment/decrement lifecycle.
  • No memory leaks: Callbacks auto-unregister, inflight store clears on close/failure.
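
The reconnect lock check can be pictured with a minimal sketch of promise-based deduplication. Only the `_reconnectLock` name comes from this PR; the `Reconnector` class, its `attempts` counter, and the `doReconnect` callback are hypothetical and exist only for this example.

```typescript
// Hypothetical sketch; the connector's real class and method names may differ.
class Reconnector {
    private _reconnectLock: Promise<void> | null = null;
    private doReconnect: () => Promise<void>;
    public attempts = 0;

    constructor(doReconnect: () => Promise<void>) {
        this.doReconnect = doReconnect;
    }

    // Concurrent error/close events all receive the same in-flight promise.
    reconnect(): Promise<void> {
        if (this._reconnectLock === null) {
            this.attempts++;
            const release = (): void => {
                // Clear the lock so a later disconnect can start a new reconnect.
                this._reconnectLock = null;
            };
            this._reconnectLock = this.doReconnect().then(
                () => {
                    release();
                },
                (err) => {
                    release();
                    throw err;
                }
            );
        }
        return this._reconnectLock;
    }
}
```

Releasing the lock on both success and failure matters: if it were only cleared on success, one failed reconnect would leave every later caller waiting on the same rejected promise.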

LGTM ✅

@zitsen zitsen merged commit 40cf375 into main Mar 28, 2026
9 of 10 checks passed
5 participants