feat: impl load balancing and failover by qevolg · Pull Request #109 · taosdata/taos-connector-node

qevolg · 2026-03-14T10:57:29Z

Description

feat: impl load balancing and failover

Issue(s)

close: https://project.feishu.cn/taosdata_td/feature/detail/6856918838

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

gemini-code-assist · 2026-03-14T10:57:44Z

Summary of Changes

This pull request significantly enhances the robustness and availability of the TDengine Node.js WebSocket Connector. It introduces comprehensive multi-address support, enabling the connector to distribute load across multiple server instances and automatically failover to healthy alternatives when connections are lost. This includes sophisticated retry mechanisms for in-flight requests, improved connection pool management based on DSNs, and enhanced concurrency control, ensuring a more resilient and transparent experience for applications interacting with TDengine.

Highlights

Multi-Address Support: The TDengine Node.js WebSocket Connector now supports configuring multiple TDengine server addresses via a Data Source Name (DSN) string, enhancing system high availability.
Load Balancing and Automatic Failover: Implemented random initial address selection for load distribution and automatic failover to alternative addresses upon connection failure, incorporating configurable retry logic with exponential backoff.
Inflight Request Retry Mechanism: Introduced a robust mechanism to track and replay 'retriable' in-flight requests (e.g., 'insert', 'options_connection') after a successful reconnection, covering both string and binary messages.
DSN Parsing and Connection Pool Enhancements: A new DSN parsing utility was added, and the connection pool logic was updated to generate connection keys based on the DSN, ensuring proper isolation across different address lists, databases, and authentication credentials.
Concurrency Control for Reconnection: Enhanced the reconnection logic with a _reconnectLock Promise to prevent race conditions that could arise from concurrent error or close events, ensuring only one reconnection process runs at a time.
Improved Timeout Management: Modified the callback lifecycle management to immediately clear timeout timers when a callback is hit, effectively preventing 'success then timeout' race conditions.
Increased Default Connection Timeout: The default WebSocket connection timeout in wsConnector.ts was increased from 5 seconds to 60 seconds to accommodate potentially longer connection establishment times.
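
As a rough illustration of the failover highlight above, the sketch below plans connection attempts across several addresses with capped exponential backoff. The names `RetryConfig`, `backoffDelay`, `planAttempts`, and `pickStartIndex` are hypothetical and are not taken from the connector; applying the retry count per address (rather than overall) is also an assumption.

```typescript
// Hypothetical retry settings; the field names are illustrative only.
interface RetryConfig {
    retries: number;           // attempts per address (assumption)
    retryBackoffMs: number;    // base delay before a retry
    retryBackoffMaxMs: number; // upper bound for any single delay
}

interface Attempt {
    address: string;
    delayMs: number;
}

// Exponential backoff: base * 2^attempt, capped at the configured maximum.
function backoffDelay(attempt: number, cfg: RetryConfig): number {
    return Math.min(cfg.retryBackoffMs * 2 ** attempt, cfg.retryBackoffMaxMs);
}

// Pick a random starting address so that clients spread across servers.
function pickStartIndex(count: number, random: () => number = Math.random): number {
    return Math.floor(random() * count);
}

// Build the ordered list of attempts: exhaust the retries on one address,
// then fail over to the next address in the list (wrapping around).
function planAttempts(addresses: string[], startIndex: number, cfg: RetryConfig): Attempt[] {
    const plan: Attempt[] = [];
    for (let i = 0; i < addresses.length; i++) {
        const address = addresses[(startIndex + i) % addresses.length];
        for (let attempt = 0; attempt < cfg.retries; attempt++) {
            plan.push({ address, delayMs: backoffDelay(attempt, cfg) });
        }
    }
    return plan;
}
```

Keeping the schedule a pure function makes the backoff policy easy to unit test without real sockets or timers; a real connector would interleave these delays with actual connection attempts.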

Changelog

docs/superpowers/specs/2026-03-13-multi-address-load-balancing-failover-design.md
- Added a comprehensive design document detailing the multi-address load balancing and failover features for the TDengine Node.js WebSocket Connector.
nodejs/src/client/wsConnector.ts
- Increased the default connection timeout from 5000ms to 60000ms.
nodejs/src/common/dsn.ts
- Introduced a new utility for parsing multi-host Data Source Name (DSN) strings, including support for user information, multiple addresses, IPv6, database specification, and query parameters.
nodejs/test/bulkPulling/dsn.test.ts
- Added extensive unit tests for the DSN parsing utility, covering various valid and invalid URL formats, host list configurations, and port specifications.

codecov-commenter · 2026-03-14T11:17:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.10%. Comparing base (86b9aaf) to head (7ad34cc).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   80.21%   81.10%   +0.88%     
==========================================
  Files          30       33       +3     
  Lines        2487     3271     +784     
  Branches      437      589     +152     
==========================================
+ Hits         1995     2653     +658     
- Misses        378      473      +95     
- Partials      114      145      +31

…subscription, failover, and token handling
- Implement tests for normal connection and error scenarios in `tmq.test.ts`
- Add tests for configuration handling in `tmq.config.test.ts`
- Introduce failover tests to ensure resilience in `tmq.failover.test.ts`
- Create a dedicated test suite for cloud interactions in `tmq.cloud.test.ts`
- Validate token-based authentication and URL token handling in `tmq.test.ts`
- Ensure proper cleanup of test databases and topics after tests

Copilot

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

nodejs/src/client/wsConnector.ts:267

WsEventCallback.handleEventCallback() is async and can throw (e.g. when no callback is registered). In _onmessage, it is invoked without await/.catch, which can lead to unhandled promise rejections at runtime. Consider prefixing with void ...handleEventCallback(...).catch(err => logger.error(...)) (or otherwise handling the promise) to avoid process-level unhandled rejection behavior.

    private _onmessage(event: any) {
        let data = event.data;
        logger.debug("wsClient._onMessage()====" + Object.prototype.toString.call(data));
        if (Object.prototype.toString.call(data) === "[object ArrayBuffer]") {
            let id = new DataView(data, 26, 8).getBigUint64(0, true);
            WsEventCallback.instance().handleEventCallback(
                { id: id, action: "", req_id: BigInt(0) },
                OnMessageType.MESSAGE_TYPE_ARRAYBUFFER,
                data
            );
        } else if (Object.prototype.toString.call(data) === "[object String]") {
            let msg = JSON.parse(data);
            logger.debug("[_onmessage.stringType]==>:" + data);
            WsEventCallback.instance().handleEventCallback(
                { id: BigInt(0), action: msg.action, req_id: msg.req_id },
                OnMessageType.MESSAGE_TYPE_STRING,
                msg
            );
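
A minimal sketch of the pattern the comment suggests, using a simplified stand-in for `WsEventCallback.handleEventCallback()`. The `Logger` type, the `callbacks` map, and the function signatures here are assumptions for illustration and do not mirror the connector's real code.

```typescript
// Hypothetical logger shape; the real connector has its own logger.
type Logger = { error: (msg: string) => void };

// Stand-in for an async handler that rejects when no callback is registered.
async function handleEventCallback(
    reqId: number,
    callbacks: Map<number, (data: unknown) => void>,
    data: unknown
): Promise<void> {
    const cb = callbacks.get(reqId);
    if (!cb) {
        throw new Error(`no callback registered for req_id ${reqId}`);
    }
    cb(data);
}

// A synchronous event handler cannot await, so it attaches .catch() to the
// returned promise. `void` marks the promise as intentionally not awaited.
function onMessage(
    reqId: number,
    data: unknown,
    callbacks: Map<number, (data: unknown) => void>,
    logger: Logger
): void {
    void handleEventCallback(reqId, callbacks, data).catch((err: unknown) => {
        const msg = err instanceof Error ? err.message : String(err);
        logger.error("handleEventCallback failed: " + msg);
    });
}
```

With the `.catch()` in place, a rejection is logged instead of surfacing as a process-level unhandled rejection.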

Copilot

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

nodejs/src/client/wsConnector.ts:289

WsEventCallback.handleEventCallback() is async, but _onmessage calls it without awaiting or attaching a .catch(). If handleEventCallback throws/rejects (e.g., callback timed out/was unregistered), this becomes an unhandled promise rejection. Wrap these calls with void ... .catch(err => logger.error(...)) (or similar) to avoid process-level unhandled rejection behavior.

    private _onmessage(event: any) {
        let data = event.data;
        logger.debug("wsClient._onMessage()====" + Object.prototype.toString.call(data));
        if (Object.prototype.toString.call(data) === "[object ArrayBuffer]") {
            let id = new DataView(data, 26, 8).getBigUint64(0, true);
            WsEventCallback.instance().handleEventCallback(
                { id: id, action: "", req_id: BigInt(0) },
                OnMessageType.MESSAGE_TYPE_ARRAYBUFFER,
                data
            );
        } else if (Object.prototype.toString.call(data) === "[object String]") {
            let msg = JSON.parse(data);
            logger.debug("[_onmessage.stringType]==>:" + data);
            WsEventCallback.instance().handleEventCallback(
                { id: BigInt(0), action: msg.action, req_id: msg.req_id },
                OnMessageType.MESSAGE_TYPE_STRING,
                msg
            );
        } else {

nodejs/src/client/wsConnector.ts

Copilot

Pull request overview

Copilot reviewed 53 out of 54 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

nodejs/package-lock.json: Language not supported

nodejs/src/client/wsClient.ts

Copilot

Pull request overview

Copilot reviewed 54 out of 55 changed files in this pull request and generated 3 comments.

Files not reviewed (1)

nodejs/package-lock.json: Language not supported

Comments suppressed due to low confidence (1)

nodejs/test/tmq/tmq.test.ts:300

expect([104]).toContain(e.code) is equivalent to a direct equality check and is harder to read. Since only one code is expected now, prefer expect(e.code).toBe(104) (or keep multiple codes if the error can legitimately vary).

nodejs/src/client/wsConnectorPool.ts

nodejs/src/tmq/wsTmq.ts

nodejs/src/tmq/config.ts

sheyanjie-qq · 2026-03-27T02:57:54Z

Review Findings

[P1] Encode auth fields unambiguously in the pool key (nodejs/src/client/wsConnectorPool.ts:36)
The pool key currently concatenates auth fields into a raw string, so credentials containing : can collide. For example, user='a:b', pwd='c' and user='a', pwd='b:c' produce the same key. That allows the pool to reuse a socket authenticated for the wrong account, which reintroduces the cross-credential sharing this refactor is trying to prevent.

[P2] Strip query/path before splitting DSN userinfo (nodejs/src/common/dsn.ts:105)
The parser looks for @ before removing the query string or path. As a result, a valid DSN such as ws://localhost:6041?bearer_token=a@b is misparsed as if it contained userinfo and host b, instead of host localhost. This is a regression from the previous new URL(...) behavior and can break any DSN where a token or later URL component contains @.

[P2] Scope pooled connectors by reconnect tuning parameters (nodejs/src/client/wsConnectorPool.ts:43)
The pool key ignores retries, retry_backoff_ms, and retry_backoff_max_ms, but WebSocketConnector captures those values at construction time. If one caller creates a connector with ?retries=1 and another later opens the same DSN with ?retries=60, the second caller can silently reuse the first connector and get the wrong failover policy.

[P2] Restore the default SQL database during session recovery (nodejs/src/client/wsClient.ts:94)
WsSql.open() still runs use information_schema when no database is explicitly provided. On reconnect, this recovery path only replays conn with connectedDatabase, which is null in that case, so the recovered session comes back without a current database. After failover, commands that relied on the post-open default context, such as show tables or unqualified ins* queries, can start failing even though they worked before the disconnect.
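
The P1 collision can be reproduced in a few lines. The sketch below assumes the raw key joins fields with `:` (the exact format in `wsConnectorPool.ts` is not shown here), and `naiveKey` and `encodedKey` are hypothetical helpers. JSON-encoding the fields as an array is one way to keep field boundaries unambiguous.

```typescript
// Naive key: joining fields with ':' loses the boundary between them.
function naiveKey(user: string, pwd: string): string {
    return `${user}:${pwd}`;
}

// Unambiguous key: JSON string quoting and escaping preserve each field,
// so no combination of characters inside a field can imitate a separator.
function encodedKey(user: string, pwd: string): string {
    return JSON.stringify([user, pwd]);
}

// The two credential pairs from the review comment.
const collides = naiveKey("a:b", "c") === naiveKey("a", "b:c");
const distinct = encodedKey("a:b", "c") !== encodedKey("a", "b:c");
```

Here `collides` and `distinct` are both `true`: the raw keys are identical, while the encoded keys differ. A production key would likely also hash the encoded value so that credentials do not appear in the key itself.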

Copilot

Pull request overview

Copilot reviewed 54 out of 55 changed files in this pull request and generated 2 comments.

Files not reviewed (1)

nodejs/package-lock.json: Language not supported

nodejs/src/client/wsConnector.ts

zitsen

Code Review Summary

All 8 CI checks pass. Coverage increased from 80.21% → 81.10%.

Previously Reported Issues — All Resolved

[P1] Pool key auth collision — Fixed. Auth fields are now JSON-serialized + SHA-256 hashed in buildAuthScope(), eliminating collisions like user='a:b',pwd='c' vs user='a',pwd='b:c'.
[P2] DSN parsing @ in query params — Fixed. The parser now isolates the authority section before / and ? prior to looking for @, so bearer_token=a@b is handled correctly. Test coverage confirms.
[P2] Pool key missing retry params — Intentional design. Retry parameters are excluded from the pool key; instead, refreshRetryConfig() updates retry settings when reusing pooled connections. This avoids unnecessary pool fragmentation.
[P2] Session recovery default DB — Fixed. normalizeConnectedDatabase() returns information_schema when no explicit DB is provided for SQL paths. recoverSqlSessionContext() uses this during reconnect.
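
The DSN fix described above (isolate the authority before looking for `@`) might look roughly like the following. `splitAuthority` and `ParsedAuthority` are hypothetical names, the comma as host separator is an assumption, and the real parser in `dsn.ts` handles more cases (IPv6, ports, query parameters) than this sketch.

```typescript
interface ParsedAuthority {
    userinfo: string | null;
    hosts: string[];
}

function splitAuthority(dsn: string): ParsedAuthority {
    const schemeEnd = dsn.indexOf("://");
    if (schemeEnd < 0) {
        throw new Error("DSN is missing a scheme");
    }
    const rest = dsn.slice(schemeEnd + 3);

    // Cut at the first '/' or '?' so that '@' in the path or query is ignored.
    const cut = rest.search(/[\/?]/);
    const authority = cut < 0 ? rest : rest.slice(0, cut);

    // Only now look for userinfo inside the isolated authority section.
    const at = authority.lastIndexOf("@");
    const userinfo = at < 0 ? null : authority.slice(0, at);
    const hostList = at < 0 ? authority : authority.slice(at + 1);

    // Assumption: multiple addresses are separated by commas.
    const hosts = hostList.split(",").filter((h) => h.length > 0);
    return { userinfo, hosts };
}
```

For `ws://localhost:6041?bearer_token=a@b`, this yields no userinfo and the single host `localhost:6041`, which is the behavior the review asked for.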

Additional Verification

Reconnect lock: _reconnectLock correctly deduplicates concurrent reconnect triggers.
Inflight tracking: Requests are properly removed on resolve/reject; all inflight requests are failed and cleared when reconnect fails.
Address selection: Least-connected strategy with proper increment/decrement lifecycle.
No memory leaks: Callbacks auto-unregister, inflight store clears on close/failure.
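
The reconnect-lock behavior verified above can be sketched as a shared in-flight promise. The `Reconnector` class and its members are hypothetical; only the idea of a `_reconnectLock` promise that deduplicates concurrent triggers comes from this PR.

```typescript
class Reconnector {
    // Holds the in-flight reconnect promise, or null when idle.
    private reconnectLock: Promise<void> | null = null;
    private doReconnect: () => Promise<void>;

    constructor(doReconnect: () => Promise<void>) {
        this.doReconnect = doReconnect;
    }

    // Concurrent callers (for example, an error event and a close event
    // firing together) all receive the same promise, so the reconnect
    // routine runs only once per outage.
    reconnect(): Promise<void> {
        if (this.reconnectLock) {
            return this.reconnectLock;
        }
        const lock = this.doReconnect().finally(() => {
            // Release the lock only after the attempt settles, whether it
            // succeeded or failed, so a later outage can reconnect again.
            this.reconnectLock = null;
        });
        this.reconnectLock = lock;
        return lock;
    }
}
```

Because the lock is cleared in `finally`, a failed attempt does not leave the connector permanently locked.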

LGTM ✅

qevolg added 9 commits March 13, 2026 10:03

feat: add dsn parser

135abe6

docs: add design doc

2413adb

feat: update failover logic to block new requests during reconnection

957371c

feat: enhance reconnection logic with concurrency control and state management

cfa6322

feat: implement optimistic send strategy for inflight requests during reconnection

89bd2f3

feat: add concurrency control for WebSocket connection pool using async-mutex

b915ab1

feat: improve reconnect lock handling to prevent premature release

c0d04e7

feat: increase default timeout for WebSocket connection to improve stability

e9313a7

feat: enhance failover design with multi-address support and improved request handling

cf1d209

qevolg added 18 commits March 14, 2026 21:34

ai: add CLAUDE.md

257ca0e

feat: Refactor WebSocketConnector to support multiple addresses and retry logic

321ea52

feat: implement session recovery and reconnect handling in WsClient and WebSocketConnector

8485542

fix: correct parameter type for sendBinaryMsg in WsStmt2 class

ad1d366

refactor: comment out connection readiness check in sendMsgDirect method

93af628

fix: update error code expectation in Tmq() test case

be156b9

fix: update token retrieval method in checkAuth and change database name in schemaless test

af116ff

feat: add cloud service host detection and default port handling in DSN parsing

69940a2

feat: enhance WebSocketConnector with improved retry logic and connection handling

e13c647

feat: integrate RetryConfig into WebSocketConnector and update related tests

cac538b

chore: remove CLAUDE.md guidance file

3846431

feat: refactor getDsn function to improve readability and structure

f1dbb03

feat: rename normalizeWsPath to normalizePath and update references across the codebase

efb5ffc

feat: refactor WebSocketConnector to unify connection handling and improve logging

6add776

feat: simplify connection handling in WsClient and rename reconnect method in WebSocketConnector

03f26bf

feat: update retry configuration in reconnect tests and rename reconnect method in WebSocketConnector

0b85c35

feat: simplify message handling in sendMsgDirect by removing connection check and renaming variables

09c6538

feat: enhance inflight request management in WebSocketConnector with ordered tracking and replay functionality

e69e5d3

Copilot AI review requested due to automatic review settings March 26, 2026 03:21

Copilot started reviewing on behalf of qevolg March 26, 2026 03:22

Copilot AI reviewed Mar 26, 2026

qevolg added 2 commits March 26, 2026 11:34

feat: decode URL-encoded credentials in TMQ subscribe message and add corresponding test

0936dc0

feat: move parseNonNegativeInt and parsePositiveInt functions to wsConnector for better organization

762b583

Copilot AI review requested due to automatic review settings March 26, 2026 03:44

Copilot started reviewing on behalf of qevolg March 26, 2026 03:45

Copilot AI reviewed Mar 26, 2026

nodejs/src/client/wsConnector.ts (outdated, resolved)

qevolg added 3 commits March 26, 2026 13:40

feat: update picomatch versions in package-lock.json for improved stability

1148912

feat: enhance reconnect logic and add tests for aborting retries on close

ef8dfb9

feat: update picomatch package URLs to use npm registry for consistency

ac14bfa

Copilot AI review requested due to automatic review settings March 26, 2026 06:44

Copilot started reviewing on behalf of qevolg March 26, 2026 06:45

Copilot AI reviewed Mar 26, 2026

nodejs/src/client/wsClient.ts (resolved)

qevolg added 2 commits March 26, 2026 15:13

feat: improve callback registration handling during connection closure in WebSocketConnector

78f63c3

feat: update package version to 3.3.0 in package.json and package-lock.json

5c2cebe

Copilot AI review requested due to automatic review settings March 26, 2026 07:59

Copilot started reviewing on behalf of qevolg March 26, 2026 07:59

Copilot AI reviewed Mar 26, 2026

nodejs/src/client/wsConnectorPool.ts (resolved)

nodejs/src/tmq/wsTmq.ts (resolved)

nodejs/src/tmq/config.ts (outdated, resolved)

feat: improve error message for invalid WebSocket configuration in WsConsumer

9be7e1b

feat: enhance session recovery and retry configuration handling in WebSocket client

4ffd2bc

Copilot AI review requested due to automatic review settings March 27, 2026 06:06

Copilot started reviewing on behalf of qevolg March 27, 2026 06:06

Copilot AI reviewed Mar 27, 2026

nodejs/src/client/wsConnector.ts (resolved)

nodejs/src/client/wsConnector.ts (resolved)

feat: update allowlist for audit check in GitHub Actions workflow

7ad34cc

sheyanjie-qq approved these changes Mar 27, 2026

zitsen approved these changes Mar 28, 2026

zitsen merged commit 40cf375 into main Mar 28, 2026
9 of 10 checks passed
