Commit 9065670

[ML] Fix double-counting of inference memory in the assignment rebalancer (elastic#133919)
The private static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() subtracted load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, NodeLoad.getFreeMemoryExcludingPerNodeOverhead() already subtracts native inference memory, because that memory is included in the getAssignedJobMemoryExcludingPerNodeOverhead() calculation. As a result, native inference memory was counted twice, and the planner was given less free memory than was actually available on nodes with native inference assignments. Removing the second subtraction fixes the double-counting and lets us delete getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() entirely.
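To make the arithmetic concrete, here is a minimal Java sketch under stated assumptions. SimplifiedNodeLoad, its fields, and its free-memory formula are hypothetical simplifications, not the real Elasticsearch NodeLoad class. The only property borrowed from the commit message is that assigned job memory already includes native inference memory, so subtracting getAssignedNativeInferenceMemory() a second time understates free memory by exactly that amount.

```java
// Hypothetical, simplified model of the memory accounting described above.
// Field names and the free-memory formula are assumptions for illustration only;
// the real NodeLoad class tracks more state.
public final class DoubleCountingDemo {

    // Simplified stand-in for NodeLoad (hypothetical). All values are in bytes.
    record SimplifiedNodeLoad(
        long maxMemory,
        long perNodeOverhead,
        long assignedOtherJobMemory,
        long assignedNativeInferenceMemory
    ) {
        // Per the commit message, job memory already includes native inference memory.
        long getAssignedJobMemoryExcludingPerNodeOverhead() {
            return assignedOtherJobMemory + assignedNativeInferenceMemory;
        }

        // Assumed formula: max memory minus per-node overhead minus assigned job memory.
        long getFreeMemoryExcludingPerNodeOverhead() {
            return maxMemory - perNodeOverhead - getAssignedJobMemoryExcludingPerNodeOverhead();
        }

        long getAssignedNativeInferenceMemory() {
            return assignedNativeInferenceMemory;
        }
    }

    // Old behaviour: native inference memory is subtracted a second time.
    static long oldPlannerMemory(SimplifiedNodeLoad load) {
        return load.getFreeMemoryExcludingPerNodeOverhead() - load.getAssignedNativeInferenceMemory();
    }

    // New behaviour: the free memory reported by the load is passed through unchanged.
    static long newPlannerMemory(SimplifiedNodeLoad load) {
        return load.getFreeMemoryExcludingPerNodeOverhead();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Made-up numbers: 4096 MB node, 30 MB overhead, 500 MB other jobs, 1000 MB native inference.
        SimplifiedNodeLoad load = new SimplifiedNodeLoad(4096 * mb, 30 * mb, 500 * mb, 1000 * mb);
        System.out.println("new: " + newPlannerMemory(load) / mb + " MB"); // prints "new: 2566 MB"
        System.out.println("old: " + oldPlannerMemory(load) / mb + " MB"); // prints "old: 1566 MB"
    }
}
```

With these made-up numbers the old helper reports 1566 MB where 2566 MB is actually free; the 1000 MB gap is exactly the assigned native inference memory.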
1 parent: d68976d

2 files changed (+6, -7 lines)

docs/changelog/133919.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 133919
+summary: Fix double-counting of inference memory in the assignment rebalancer
+area: Machine Learning
+type: bug
+issues: []

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentRebalancer.java

Lines changed: 1 addition & 7 deletions
@@ -307,9 +307,7 @@ private Map<List<String>, List<AssignmentPlan.Node>> createNodesByZoneMap() {
                 nodes.add(
                     new AssignmentPlan.Node(
                         discoveryNode.getId(),
-                        // We subtract native inference memory as the planner expects available memory for
-                        // native inference including current assignments.
-                        getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference(load),
+                        load.getFreeMemoryExcludingPerNodeOverhead(),
                         MlProcessors.get(discoveryNode, allocatedProcessorsScale).roundUp()
                     )
                 );
@@ -326,10 +324,6 @@ private Map<List<String>, List<AssignmentPlan.Node>> createNodesByZoneMap() {
         }));
     }
 
-    private static long getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference(NodeLoad load) {
-        return load.getFreeMemoryExcludingPerNodeOverhead() - load.getAssignedNativeInferenceMemory();
-    }
-
     private TrainedModelAssignmentMetadata.Builder buildAssignmentsFromPlan(AssignmentPlan assignmentPlan) {
         TrainedModelAssignmentMetadata.Builder builder = TrainedModelAssignmentMetadata.Builder.empty();
         for (AssignmentPlan.Deployment deployment : assignmentPlan.models()) {
