Commit e49179c

[ML] Fix double-counting of inference memory in the assignment rebalancer (#133919)
The static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference subtracted load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, NodeLoad.getFreeMemoryExcludingPerNodeOverhead() already subtracts native inference memory as part of its getAssignedJobMemoryExcludingPerNodeOverhead() calculation, so native inference memory was counted twice. Removing the extra subtraction fixes the double-counting and lets us delete the private method getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() entirely.
1 parent 7d3903f commit e49179c
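
To make the double-counting described in the commit message concrete, here is a minimal sketch with made-up numbers. ToyNodeLoad is a hypothetical stand-in for NodeLoad: its fields, method names and arithmetic are simplified assumptions for illustration, not the real Elasticsearch implementation. The only property taken from the commit message is that the free-memory figure already has native inference memory subtracted.

```java
public class DoubleCountingSketch {

    // Hypothetical, simplified stand-in for NodeLoad (all values in bytes).
    // The real class has more fields and also handles per-node overhead.
    record ToyNodeLoad(long maxMlMemory, long assignedAnomalyDetectorMemory, long assignedNativeInferenceMemory) {

        // Assumption: assigned job memory includes native inference memory.
        long assignedJobMemoryExcludingPerNodeOverhead() {
            return assignedAnomalyDetectorMemory + assignedNativeInferenceMemory;
        }

        // Free memory already has native inference memory subtracted once.
        long freeMemoryExcludingPerNodeOverhead() {
            return Math.max(maxMlMemory - assignedJobMemoryExcludingPerNodeOverhead(), 0L);
        }
    }

    // Shape of the removed helper: subtracts native inference memory a second time.
    static long freeMemoryBeforeFix(ToyNodeLoad load) {
        return load.freeMemoryExcludingPerNodeOverhead() - load.assignedNativeInferenceMemory();
    }

    // Shape of the fixed call site: uses the free-memory figure directly.
    static long freeMemoryAfterFix(ToyNodeLoad load) {
        return load.freeMemoryExcludingPerNodeOverhead();
    }

    public static void main(String[] args) {
        ToyNodeLoad load = new ToyNodeLoad(1000L, 200L, 300L);
        System.out.println("before fix: " + freeMemoryBeforeFix(load)); // 1000 - 500 - 300 = 200
        System.out.println("after fix:  " + freeMemoryAfterFix(load));  // 1000 - 500 = 500
    }
}
```

In this toy setup the helper under-reports free memory by exactly the native inference memory already assigned to the node, which is the gap the commit closes.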

2 files changed, +6 -7 lines changed

docs/changelog/133919.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 133919
+summary: Fix double-counting of inference memory in the assignment rebalancer
+area: Machine Learning
+type: bug
+issues: []

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentRebalancer.java

Lines changed: 1 addition & 7 deletions
@@ -298,9 +298,7 @@ private Map<List<String>, List<AssignmentPlan.Node>> createNodesByZoneMap() {
                 nodes.add(
                     new AssignmentPlan.Node(
                         discoveryNode.getId(),
-                        // We subtract native inference memory as the planner expects available memory for
-                        // native inference including current assignments.
-                        getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference(load),
+                        load.getFreeMemoryExcludingPerNodeOverhead(),
                         MlProcessors.get(discoveryNode, allocatedProcessorsScale).roundUp()
                     )
                 );
@@ -317,10 +315,6 @@ private Map<List<String>, List<AssignmentPlan.Node>> createNodesByZoneMap() {
         }));
     }
 
-    private static long getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference(NodeLoad load) {
-        return load.getFreeMemoryExcludingPerNodeOverhead() - load.getAssignedNativeInferenceMemory();
-    }
-
     private TrainedModelAssignmentMetadata.Builder buildAssignmentsFromPlan(AssignmentPlan assignmentPlan) {
         TrainedModelAssignmentMetadata.Builder builder = TrainedModelAssignmentMetadata.Builder.empty();
         for (AssignmentPlan.Deployment deployment : assignmentPlan.deployments()) {
