[AINode] Concurrent inference bug fix #16595

CRZbulabula · 2025-10-16T02:23:15Z

This PR improves the core codes of AINode. We should distinguish the usage of model_id and model_type, thus we pass model_info.

Copilot

Pull Request Overview

This PR fixes a concurrent inference bug in AINode by refactoring how model information is passed and used throughout the system. The key change is transitioning from passing model_id strings to passing complete ModelInfo objects, which properly distinguishes between model_id and model_type.

Key changes:

Removed model validation and information retrieval from the analysis phase, deferring it to runtime
Changed pool scheduler and inference pool to accept ModelInfo objects instead of model_id strings
Fixed model type checking to use model_type from ModelInfo instead of comparing model_id directly

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
ForecastTableFunction.java	Removed early model validation and replaced `maxInputLength` with `Integer.MAX_VALUE`
ModelInferenceDescriptor.java	Changed constructor to remove `ModelInformation` parameter, making it mutable
ModelFetcher.java	Simplified model fetching to only return AINode endpoint without model information
AnalyzeVisitor.java	Commented out window size validation checks
ModelInfo.java	Removed serialization of model information in response
ModelManager.java	Simplified `getModelInfo` to always return first registered AINode
GetModelInfoResp.java	Removed model information serialization fields
model_storage.py	Added `get_model_info` method to retrieve model information
model_manager.py	Added `get_model_info` method wrapper
inference_manager.py	Changed to use `model_type` from `ModelInfo` instead of comparing `model_id`
basic_pool_scheduler.py	Changed to accept and use `ModelInfo` objects instead of `model_id` strings
abstract_pool_scheduler.py	Updated interface to accept `ModelInfo` instead of `model_id`
pool_group.py	Added `get_running_pool_count` method to count running pools
pool_controller.py	Updated to pass `ModelInfo` objects and use `model_type` for comparisons
inference_request_pool.py	Changed to store and use `ModelInfo` instead of `model_id`, fixed output tensor dimension
AINodeInferenceSQLIT.java	Commented out test methods
AINodeConcurrentInferenceIT.java	Added test annotations and fixed SQL column name from `timestamp` to `time`

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

...java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/ForecastTableFunction.java

iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/ModelManager.java

iotdb-core/ainode/iotdb/ainode/core/inference/pool_controller.py

Copilot · 2025-10-16T10:24:46Z

iotdb-core/ainode/iotdb/ainode/core/inference/pool_scheduler/basic_pool_scheduler.py

+        if len(existing_model_infos) > 0:
+            allocation_result[model_info.model_id] = 0


The allocation result for the unloaded model is set to 0 after already being computed in the conditional expression. This overwrites the computed value and may cause incorrect behavior. Lines 156-157 should be removed as the logic at lines 147-155 already handles this case correctly.

Suggested change

if len(existing_model_infos) > 0:

allocation_result[model_info.model_id] = 0

Co-authored-by: Copilot <[email protected]>

codecov · 2025-10-16T14:19:41Z

Codecov Report

❌ Patch coverage is 8.33333% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.67%. Comparing base (2a12077) to head (1c09df3).
⚠️ Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
.../apache/iotdb/confignode/manager/ModelManager.java	0.00%	8 Missing ⚠️
...otdb/db/queryengine/plan/analyze/ModelFetcher.java	0.00%	2 Missing ⚠️
...plan/parameter/model/ModelInferenceDescriptor.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #16595      +/-   ##
============================================
+ Coverage     38.66%   38.67%   +0.01%     
  Complexity      207      207              
============================================
  Files          4937     4937              
  Lines        327011   326949      -62     
  Branches      41496    41495       -1     
============================================
+ Hits         126434   126448      +14     
+ Misses       200577   200501      -76

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sonarqubecloud · 2025-10-16T16:15:45Z

Quality Gate passed

Issues
9 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

(cherry picked from commit 46a0c6a)

* [AINode] Refactor code base * [AINode] Implement concurrent inference framework (#16311) (cherry picked from commit 7b9ec7e) * [AINode] Fix bugs for SHOW LOADED MODELS (#16410) (cherry picked from commit 40b2b33) * [AINode] Add a batcher for inference (#16411) (cherry picked from commit 7734331) * [AINode][Bug fix] Concurrent inference (#16518) * trigger CI * bug fix 4 show loaded models (cherry picked from commit b4dde12) * [AINode] Concurrent inference bug fix (#16595) (cherry picked from commit 46a0c6a) * [AINode] Adjust the maximum inference input length (#16640) (cherry picked from commit 2c9064f) * [AINode] Fix bug of sundial and forecast udf (#16768) (cherry picked from commit 2b47be7) * [AINode] Package AINode via PyInstaller (#16707) (cherry picked from commit 49c625b) * [AINode] Enable AINode start as background (-d) (#16762) (cherry picked from commit 1ebb951) * [AINode] Update AINodeClient for DataNode to borrow (#16647) (cherry picked from commit d49d7dd) * [AINode] Fix bug that AINode cannot compile in Windows (#16767) (cherry picked from commit cd443ba) * [AINode] Delete poetry.lock for easier maintain different operating systems (#16793) (cherry picked from commit 50f92e4) * [AINode] Fix cp errors --------- Co-authored-by: Leo <[email protected]> Co-authored-by: jtmer <[email protected]> Co-authored-by: Zeyu Zhang <[email protected]>

(cherry picked from commit 46a0c6a)

CRZbulabula added 2 commits October 16, 2025 13:54

Seems finish

2e03f8c

stop call inference CI

3b845e6

CRZbulabula force-pushed the ain-concurrent-inference-bug-fix branch from ec9a507 to 3b845e6 Compare October 16, 2025 10:22

CRZbulabula requested a review from Copilot October 16, 2025 10:23

Copilot AI reviewed Oct 16, 2025

View reviewed changes

CRZbulabula and others added 3 commits October 16, 2025 18:26

Update iotdb-core/ainode/iotdb/ainode/core/inference/pool_controller.py

6dcc1f0

Co-authored-by: Copilot <[email protected]>

Fix CI

46aff79

Fix IT

44f5987

CRZbulabula added 3 commits October 16, 2025 22:39

Update AINodeConcurrentInferenceIT.java

e5db092

Update AINodeConcurrentInferenceIT.java

cc4aef7

Update AINodeConcurrentInferenceIT.java

1c09df3

CRZbulabula merged commit 46a0c6a into master Oct 17, 2025
32 checks passed

CRZbulabula deleted the ain-concurrent-inference-bug-fix branch October 17, 2025 00:53

CRZbulabula added a commit that referenced this pull request Dec 16, 2025

[AINode] Concurrent inference bug fix (#16595)

fd34d93

(cherry picked from commit 46a0c6a)

CRZbulabula added a commit that referenced this pull request Dec 16, 2025

[AINode] Concurrent inference bug fix (#16595)

68e44cc

(cherry picked from commit 46a0c6a)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AINode] Concurrent inference bug fix #16595

[AINode] Concurrent inference bug fix #16595

Uh oh!

CRZbulabula commented Oct 16, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Oct 16, 2025

Uh oh!

CRZbulabula Oct 16, 2025

Uh oh!

codecov bot commented Oct 16, 2025 •

edited

Loading

Uh oh!

sonarqubecloud bot commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if len(existing_model_infos) > 0:
		allocation_result[model_info.model_id] = 0

[AINode] Concurrent inference bug fix #16595

[AINode] Concurrent inference bug fix #16595

Uh oh!

Conversation

CRZbulabula commented Oct 16, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

CRZbulabula Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sonarqubecloud bot commented Oct 16, 2025

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Oct 16, 2025 •

edited

Loading