- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
[ML] Improve memory estimation methods accuracy in TrainedModelAssignmentRebalancer and related classes #133930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
7099062
              53a3262
              726bc98
              c2d6e10
              b7686c3
              05ab306
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| pr: 133930 | ||
| summary: Improve memory estimation methods accuracy in `TrainedModelAssignmentRebalancer` | ||
| and related classes | ||
| area: Machine Learning | ||
| type: bug | ||
| issues: [] | 
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -107,7 +107,7 @@ public long estimateMemoryUsageBytes(int allocations) { | |
| ); | ||
| } | ||
|  | ||
| long estimateAdditionalMemoryUsageBytes(int allocationsOld, int allocationsNew) { | ||
| public long estimateAdditionalMemoryUsageBytes(int allocationsOld, int allocationsNew) { | ||
| return StartTrainedModelDeploymentAction.estimateMemoryUsageBytes( | ||
| modelId, | ||
| memoryBytes, | ||
|  | @@ -308,7 +308,7 @@ private Quality computeQuality() { | |
| Node n = nodeAllocations.getKey(); | ||
| weighedAllocationsScore += (1 + 0.1 * (m.currentAllocationsByNodeId().containsKey(n.id()) ? 1 : 0)) * modelAssignments | ||
| .get(n); | ||
| memoryScore -= (nodeAllocations.getValue() > 0 ? m.memoryBytes() : 0); | ||
| memoryScore -= (nodeAllocations.getValue() > 0 ? m.estimateMemoryUsageBytes(nodeAllocations.getValue()) : 0); | ||
| There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 
 Because   | ||
| } | ||
| } | ||
| } | ||
|  | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assignmentsis aMap, right?So why not do
assignment.getOrDefault(node, 0)instead of streaming/filtering the key set?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assignmentsis a<Map<AssignmentPlan.Node, Integer>, while node is of typeDiscoveryNode. That's why I need to compare both id's.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks.
Just thinking out loud: shouldn't the return value of
assignmentPlan.assignmentsbe aMap<String, Integer>instead (the string being the node ID)? That sounds more useful. Is that a big refactoring?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssignmentPlan.assignments(deployment)is used in 10 places in the main code and in 100 places in the test code. We can check if we can refactor it, but it should be in a different PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I agree with that. Then please add a comment here about this Node vs DiscoveryNode and that it could benefit from refactoring (to key string node ID) and it lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #134030 so it won't get lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another refactor to consider is making the
explainAssignment()function part of theAssignmentPlanclass. The code here is trying to reverse engineer the planners decision making and it's easy to get out of sync.