@Alchuang22-dev

Description

This PR refactors the AINode client infrastructure to support direct communication between DataNode and AINode, removing the dependency on ConfigNode for AI-related operations such as model loading and inference. Requested reviewer: @CRZbulabula.

Contents

AINodeClient

  • Added a new executeRemoteCallWithRetry() method for automatic retry and reconnection on Thrift transport failures, following the same design pattern as ConfigNodeClient.

  • Updated the loadModel(TLoadModelReq req) API to use this retry wrapper for improved resilience.

  • Simplified connection lifecycle management (init(), close()) to ensure stable client reuse via AINodeClientManager.
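
For readers unfamiliar with the pattern, the retry-and-reconnect idea described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: RemoteCall, Connection, and MAX_RETRY are hypothetical names, and java.io.IOException stands in for a Thrift transport exception so that the sketch compiles without the Thrift library.

```java
import java.io.IOException;

public class RetrySketch {

  /** Hypothetical: a remote call that may fail at the transport level. */
  @FunctionalInterface
  interface RemoteCall<T> {
    T call() throws IOException;
  }

  /** Hypothetical: stands in for the transport owned by the client. */
  @FunctionalInterface
  interface Connection {
    void reconnect() throws IOException;
  }

  // Assumed retry budget; the real value is not stated in this PR description.
  static final int MAX_RETRY = 3;

  /** Runs the call, reconnecting and retrying when a transport failure occurs. */
  static <T> T executeRemoteCallWithRetry(RemoteCall<T> call, Connection conn)
      throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= MAX_RETRY; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        last = e;
        if (attempt < MAX_RETRY) {
          try {
            // Re-open the connection before the next attempt.
            conn.reconnect();
          } catch (IOException reconnectFailure) {
            // Remember the failure; the next attempt will most likely fail too.
            last = reconnectFailure;
          }
        }
      }
    }
    throw new IOException("remote call failed after " + MAX_RETRY + " attempts", last);
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    int[] reconnects = {0};
    // Simulate two transport failures followed by a success.
    String result =
        executeRemoteCallWithRetry(
            () -> {
              if (++calls[0] < 3) {
                throw new IOException("simulated transport failure");
              }
              return "loaded";
            },
            () -> reconnects[0]++);
    System.out.println(result + " (calls=" + calls[0] + ", reconnects=" + reconnects[0] + ")");
  }
}
```

Keeping the retry loop in one wrapper means each API method (loadModel in this PR) only supplies the call itself, which is the property that makes extending the pattern to further APIs cheap.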

ClusterConfigTaskExecutor

  • Replaced indirect ConfigNode RPCs with direct calls to AINodeClientManager.borrowClient(TEndPoint) for model operations (currently only loadModel, as an example).

  • Ensured the DataNode→AINode invocation flow mirrors the ConfigNode client style while maintaining compatibility with existing client pooling.

  • Updated Thrift imports to use org.apache.iotdb.ainode.rpc.thrift.* instead of org.apache.iotdb.confignode.rpc.thrift.*.
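
The borrow-and-return flow described above can be illustrated with a small, self-contained sketch. It assumes a pool keyed by endpoint whose clients return themselves to the pool on close(). EndPoint, Client, and ClientManager below are hypothetical stand-ins for TEndPoint, AINodeClient, and AINodeClientManager, and the address and port are placeholders; the real classes carry Thrift types and a fuller pool implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BorrowClientSketch {

  /** Hypothetical endpoint key, standing in for TEndPoint. */
  static final class EndPoint {
    final String ip;
    final int port;

    EndPoint(String ip, int port) {
      this.ip = ip;
      this.port = port;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof EndPoint && ((EndPoint) o).ip.equals(ip) && ((EndPoint) o).port == port;
    }

    @Override
    public int hashCode() {
      return Objects.hash(ip, port);
    }
  }

  /** Hypothetical pooled client; close() hands it back to the pool instead of destroying it. */
  static final class Client implements AutoCloseable {
    private final EndPoint endPoint;
    private final ClientManager manager;

    Client(EndPoint endPoint, ClientManager manager) {
      this.endPoint = endPoint;
      this.manager = manager;
    }

    String loadModel(String modelId) {
      // A real client would issue an RPC here; this sketch only echoes the request.
      return "loaded " + modelId + " on " + endPoint.ip + ":" + endPoint.port;
    }

    @Override
    public void close() {
      manager.giveBack(endPoint, this);
    }
  }

  /** Hypothetical manager holding idle clients per endpoint. */
  static final class ClientManager {
    private final Map<EndPoint, Deque<Client>> idle = new HashMap<>();
    int created = 0;

    synchronized Client borrowClient(EndPoint endPoint) {
      Client client = idle.computeIfAbsent(endPoint, k -> new ArrayDeque<>()).pollFirst();
      if (client == null) {
        created++;
        client = new Client(endPoint, this);
      }
      return client;
    }

    synchronized void giveBack(EndPoint endPoint, Client client) {
      idle.computeIfAbsent(endPoint, k -> new ArrayDeque<>()).addFirst(client);
    }
  }

  public static void main(String[] args) {
    ClientManager manager = new ClientManager();
    EndPoint aiNode = new EndPoint("127.0.0.1", 10810); // placeholder address
    for (String model : new String[] {"model_a", "model_b"}) {
      // try-with-resources guarantees the client goes back to the pool.
      try (Client client = manager.borrowClient(aiNode)) {
        System.out.println(client.loadModel(model));
      }
    }
    System.out.println("clients created: " + manager.created);
  }
}
```

Because the caller only needs an endpoint to borrow a client, the DataNode side does not have to ask ConfigNode to relay the request, which is the point of the change.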

AINodeClientManager

  • No functional changes; the existing pool management for TEndPoint-based clients is reused to stay consistent with ConfigNodeClientManager.

Impact

DataNode can now send AI-related requests (e.g., model load/unload, inference) directly to AINode without routing through ConfigNode. In this PR only loadModel uses the new path; the remaining APIs are listed under Next Steps.

Next Steps

Extend the same direct invocation pattern (AINodeClientManager.borrowClient()) to other AI APIs:
unloadModel, showModel, showLoadedModel, showAIDevices, createTraining, and getModelInfo.


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

See the Contents section above.

@CRZbulabula CRZbulabula left a comment

PTAL.

@CRZbulabula CRZbulabula left a comment

PTAL.

@CRZbulabula CRZbulabula left a comment

PTAL.

@CRZbulabula CRZbulabula left a comment

PTAL.

@CRZbulabula CRZbulabula left a comment

PTAL.

RkGrit commented Nov 17, 2025

LGTM~

@CRZbulabula CRZbulabula left a comment

LGTM!!!

@CRZbulabula CRZbulabula merged commit d49d7dd into apache:master Nov 17, 2025
28 checks passed
JackieTien97 pushed a commit that referenced this pull request Nov 26, 2025
alpass163 pushed a commit to alpass163/iotdb that referenced this pull request Nov 28, 2025
CRZbulabula pushed a commit that referenced this pull request Dec 16, 2025
CRZbulabula added a commit that referenced this pull request Dec 16, 2025
* [AINode] Refactor code base

* [AINode] Implement concurrent inference framework (#16311)

(cherry picked from commit 7b9ec7e)

* [AINode] Fix bugs for SHOW LOADED MODELS (#16410)

(cherry picked from commit 40b2b33)

* [AINode] Add a batcher for inference (#16411)

(cherry picked from commit 7734331)

* [AINode][Bug fix] Concurrent inference (#16518)

* trigger CI

* bug fix 4 show loaded models

(cherry picked from commit b4dde12)

* [AINode] Concurrent inference bug fix (#16595)

(cherry picked from commit 46a0c6a)

* [AINode] Adjust the maximum inference input length (#16640)

(cherry picked from commit 2c9064f)

* [AINode] Fix bug of sundial and forecast udf (#16768)

(cherry picked from commit 2b47be7)

* [AINode] Package AINode via PyInstaller (#16707)

(cherry picked from commit 49c625b)

* [AINode] Enable AINode start as background (-d) (#16762)

(cherry picked from commit 1ebb951)

* [AINode] Update AINodeClient for DataNode to borrow (#16647)

(cherry picked from commit d49d7dd)

* [AINode] Fix bug that AINode cannot compile in Windows (#16767)

(cherry picked from commit cd443ba)

* [AINode] Delete poetry.lock for easier maintain different operating systems (#16793)

(cherry picked from commit 50f92e4)

* [AINode] Fix cp errors

---------

Co-authored-by: Leo <[email protected]>
Co-authored-by: jtmer <[email protected]>
Co-authored-by: Zeyu Zhang <[email protected]>
CRZbulabula pushed a commit that referenced this pull request Dec 16, 2025