
Commit 8ebc6ae

[ML] Fix double-counting of inference memory in the assignment rebalancer (#133919) (#134054)
The static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference was used to subtract load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, NodeLoad.getFreeMemoryExcludingPerNodeOverhead() already subtracts native inference memory, because that memory is included in getAssignedJobMemoryExcludingPerNodeOverhead(). Native inference memory was therefore subtracted twice, and the planner was given less free memory per node than was actually available. Passing load.getFreeMemoryExcludingPerNodeOverhead() directly fixes the double-counting and lets us remove the private method getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() entirely.
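To make the arithmetic concrete, here is a minimal Java sketch (Java 16+ for records). It assumes, following the description above, that free memory excluding per-node overhead is roughly max ML memory minus the per-node overhead minus the assigned job memory, and that the assigned job memory already includes native inference memory. SimpleNodeLoad, its fields and the byte values are hypothetical stand-ins, not the real NodeLoad API.

```java
public class DoubleCountingDemo {

    // Hypothetical stand-in for NodeLoad; names and formula are simplifying assumptions.
    record SimpleNodeLoad(
        long maxMlMemory,                   // bytes the node may use for ML (assumed)
        long perNodeOverhead,               // per-node native overhead in bytes (assumed)
        long assignedJobMemory,             // memory of other ML jobs in bytes (assumed)
        long assignedNativeInferenceMemory  // memory of assigned trained model deployments in bytes
    ) {
        // Assumption taken from the commit message: native inference memory is
        // already part of the assigned job memory.
        long assignedJobMemoryExcludingPerNodeOverhead() {
            return assignedJobMemory + assignedNativeInferenceMemory;
        }

        // So native inference memory is already subtracted once here.
        long freeMemoryExcludingPerNodeOverhead() {
            return maxMlMemory - perNodeOverhead - assignedJobMemoryExcludingPerNodeOverhead();
        }
    }

    // Mirrors the removed helper: subtracts native inference memory a second time.
    static long oldPlannerMemory(SimpleNodeLoad load) {
        return load.freeMemoryExcludingPerNodeOverhead() - load.assignedNativeInferenceMemory();
    }

    // Mirrors the fixed call site: pass the free memory through unchanged.
    static long newPlannerMemory(SimpleNodeLoad load) {
        return load.freeMemoryExcludingPerNodeOverhead();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Illustrative values only: 8192 MB max, 30 MB overhead, 1024 MB jobs, 2048 MB inference.
        SimpleNodeLoad load = new SimpleNodeLoad(8192 * mb, 30 * mb, 1024 * mb, 2048 * mb);
        System.out.println("old: " + oldPlannerMemory(load) / mb + " MB"); // old: 3042 MB
        System.out.println("new: " + newPlannerMemory(load) / mb + " MB"); // new: 5090 MB
    }
}
```

With these made-up numbers the old helper reports 3042 MB instead of 5090 MB; the gap is exactly the 2048 MB of native inference memory that was subtracted twice.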
Parent commit: 68db9d1

2 files changed, 6 additions and 7 deletions

docs/changelog/133919.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 133919
+summary: Fix double-counting of inference memory in the assignment rebalancer
+area: Machine Learning
+type: bug
+issues: []

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentRebalancer.java

Lines changed: 1 addition & 7 deletions
@@ -299,9 +299,7 @@ private Map<List<String>, List<AssignmentPlan.Node>> createNodesByZoneMap() {
                 nodes.add(
                     new AssignmentPlan.Node(
                         discoveryNode.getId(),
-                        // We subtract native inference memory as the planner expects available memory for
-                        // native inference including current assignments.
-                        getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference(load),
+                        load.getFreeMemoryExcludingPerNodeOverhead(),
                         MlProcessors.get(discoveryNode, allocatedProcessorsScale).roundUp()
                     )
                 );
@@ -318,10 +316,6 @@ private Map<List<String>, List<AssignmentPlan.Node>> createNodesByZoneMap() {
         }));
     }
 
-    private static long getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference(NodeLoad load) {
-        return load.getFreeMemoryExcludingPerNodeOverhead() - load.getAssignedNativeInferenceMemory();
-    }
-
     private TrainedModelAssignmentMetadata.Builder buildAssignmentsFromPlan(AssignmentPlan assignmentPlan) {
         TrainedModelAssignmentMetadata.Builder builder = TrainedModelAssignmentMetadata.Builder.empty();
         for (AssignmentPlan.Deployment deployment : assignmentPlan.models()) {
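The following minimal sketch illustrates why the extra subtraction matters for the memory value handed to the planner in createNodesByZoneMap(). PlannerNode is a hypothetical stand-in for AssignmentPlan.Node, and fits() is a deliberately simplified capacity rule for illustration only: the real planner also weighs allocations, processors and models that are already assigned. All byte values are made up.

```java
public class PlannerInputDemo {

    // Hypothetical stand-in for AssignmentPlan.Node: node id, memory, processors.
    record PlannerNode(String id, long availableMemoryBytes, int cores) {}

    // Simplified, hypothetical capacity rule; not the real planner logic.
    static boolean fits(PlannerNode node, long requiredBytes) {
        return requiredBytes <= node.availableMemoryBytes();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long freeExcludingOverhead = 5090 * mb;   // assumed to be already net of native inference memory
        long assignedNativeInference = 2048 * mb; // illustrative value

        // Before the fix: native inference memory is subtracted a second time.
        PlannerNode before = new PlannerNode("node-1", freeExcludingOverhead - assignedNativeInference, 4);
        // After the fix: the free memory is passed through unchanged.
        PlannerNode after = new PlannerNode("node-1", freeExcludingOverhead, 4);

        long modelBytes = 4000 * mb; // hypothetical new deployment
        System.out.println("before fix fits: " + fits(before, modelBytes)); // before fix fits: false
        System.out.println("after fix fits: " + fits(after, modelBytes));   // after fix fits: true
    }
}
```

Under these assumptions, a node could look too full for a deployment that it can actually host, which is the kind of under-allocation the fix avoids.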
