Improve memory planning for submodule hierarchies. #11860

hsharma35 · 2025-06-23T21:21:13Z

Summary:
Improves memory planning across module hierarchies in apply_algo in memory_planning.py:

1. Plan memory bottom-to-top, starting with the leaf submodules and ending at the top-level graph module (root). This is now consistent with how delegates are compiled and memory planned. Future PRs/diffs will add support for planned buffers in delegates.
2. Allocate the maximum buffer size across all submodules as graph_module.meta['input_mem_buffer_sizes'], rather than their sum. This allows the space used by one submodule to be reclaimed for another submodule.
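
The bottom-to-top order in point 1 amounts to a post-order walk of the module hierarchy. The sketch below is illustrative only: `Module` and `plan_order` are hypothetical names and are not taken from memory_planning.py. It simply shows the order in which modules would be visited for the example hierarchy used in this description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """Hypothetical stand-in for a graph module and its submodules."""
    name: str
    children: List["Module"] = field(default_factory=list)


def plan_order(module: Module) -> List[str]:
    """Return module names in bottom-to-top (post-order) planning order."""
    order: List[str] = []
    for child in module.children:
        order.extend(plan_order(child))  # every submodule is planned first
    order.append(module.name)  # the parent is planned after all its children
    return order


root = Module("root", [
    Module("root.child0", [Module("root.child0.child0")]),
    Module("root.child1"),
])
print(plan_order(root))
# ['root.child0.child0', 'root.child0', 'root.child1', 'root']
```

The root comes last, so by the time it is planned the buffer sizes of all of its submodules are already known.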

Before this change, apply_algo in memory_planning.py would:

1. Plan memory top-to-bottom, starting with the top-level graph module (root).
2. Populate input_mem_buffer_sizes so that each new submodule allocates its memory after the maximum buffer size planned so far.

For example:

root [A bytes]
- root.child0 [B bytes]
   - root.child0.child0 [C bytes]
- root.child1 [D bytes]

Before this diff, planned memory looks like:

--- A + B + C + D ----------------
Space for root.child1
--- A + B + C --------------------
Space for root.child0.child0
--- A + B ------------------------
Space for root.child0
--- A ----------------------------
Space for root
--- 0 ----------------------------

Note that the tensors for child0 and child1 do not overlap, yet they are still placed in completely separate regions.
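
To make the old layout concrete, here is a minimal sketch that stacks each module's region after the previous one. The function name `stacked_layout` and the byte sizes (A=64, B=32, C=16, D=40) are assumptions chosen for illustration; they do not come from the PR.

```python
from typing import Dict, List, Tuple


def stacked_layout(sizes: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Assign each module its own [start, end) range, one after another."""
    layout: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, size in sizes:
        layout[name] = (offset, offset + size)
        offset += size  # the next module starts where this one ends
    return layout


# Hypothetical sizes in bytes, listed in top-to-bottom planning order.
sizes = [
    ("root", 64),                # A
    ("root.child0", 32),         # B
    ("root.child0.child0", 16),  # C
    ("root.child1", 40),         # D
]
layout = stacked_layout(sizes)
total = max(end for _, end in layout.values())
print(layout["root.child1"])  # (112, 152)
print(total)                  # 152, i.e. A + B + C + D
```

With these numbers root.child1 occupies bytes 112 to 152 even though nothing else needs that range while it runs.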

After this diff, planned memory looks like:

--- max(C + B, D) + A ----------
root
--- max(C + B, D) --------------
root.child0        |
--- C ------------ | root.child1
root.child0.child0 | 
--- 0 --------------------------
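
The layout in the diagram can be reproduced with a short recursive sketch: sibling submodules all start at offset 0, and each parent starts above its tallest child. `Module`, `plan`, and the byte sizes (A=64, B=32, C=16, D=40) are hypothetical and chosen for illustration; this is not the code in memory_planning.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Module:
    """Hypothetical module with the bytes needed by its own tensors."""
    name: str
    size: int
    children: List["Module"] = field(default_factory=list)


def plan(module: Module, layout: Dict[str, Tuple[int, int]]) -> int:
    """Record [start, end) for each module and return the end of this module."""
    # Children are planned first and share the space starting at offset 0,
    # so the parent begins at the maximum (not the sum) of their extents.
    base = max((plan(child, layout) for child in module.children), default=0)
    layout[module.name] = (base, base + module.size)
    return base + module.size


root = Module("root", 64, [                                   # A
    Module("root.child0", 32, [                               # B
        Module("root.child0.child0", 16),                     # C
    ]),
    Module("root.child1", 40),                                # D
])
layout: Dict[str, Tuple[int, int]] = {}
peak = plan(root, layout)
print(layout["root.child1"])  # (0, 40)
print(peak)                   # 112, i.e. max(C + B, D) + A
```

For these sizes the peak drops from A + B + C + D = 152 bytes to max(C + B, D) + A = 112 bytes, because root.child1 reuses the range occupied by root.child0 and its child.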

Note:
We can update the memory planning algorithm so that nodes with submodules (while/map/cond, or even delegates) are planned using graph_module.meta['non_const_buffer_size'], which would reduce space even further. The implementation for this is not part of this PR/diff. It would allow us to reuse the space for root.child0.child0 in root.child0, and the space for root.child0/root.child1 in root.

Differential Revision: D76940237

pytorch-bot · 2025-06-23T21:21:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11860

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit b3a3332 with merge base 7e28a04:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-models-linux (mobilebert, portable, linux.2xlarge) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-models-linux (mobilebert, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-06-23T21:21:50Z

This pull request was exported from Phabricator. Differential Revision: D76940237


facebook-github-bot · 2025-06-24T01:25:23Z

This pull request was exported from Phabricator. Differential Revision: D76940237


facebook-github-bot · 2025-06-27T22:15:25Z

This pull request was exported from Phabricator. Differential Revision: D76940237


JacobSzwejbka

The overall changes seem fine, but you will need to fix the failing tests that assert on specific memory planning outcomes.

It would also be useful to have some rough stats, across a few models, on any improvements we see.


facebook-github-bot · 2025-07-01T03:50:43Z

This pull request was exported from Phabricator. Differential Revision: D76940237


facebook-github-bot · 2025-07-01T18:28:38Z

This pull request was exported from Phabricator. Differential Revision: D76940237


facebook-github-bot · 2025-07-01T19:53:05Z

This pull request was exported from Phabricator. Differential Revision: D76940237

Differential Revision: D76940237 Pull Request resolved: pytorch#11860

hsharma35 requested review from JacobSzwejbka and larryliu0820 as code owners June 23, 2025 21:21

facebook-github-bot added the CLA Signed label Jun 23, 2025

facebook-github-bot added the fb-exported label Jun 23, 2025

hsharma35 force-pushed the export-D76940237 branch from 98bbc9b to 9da6f8c Compare June 24, 2025 01:25

hsharma35 force-pushed the export-D76940237 branch from 9da6f8c to f632fa6 Compare June 27, 2025 22:06

hsharma35 requested a review from lucylq as a code owner June 27, 2025 22:06

hsharma35 requested a review from swolchok as a code owner June 27, 2025 22:06

hsharma35 force-pushed the export-D76940237 branch from f632fa6 to 446c731 Compare June 27, 2025 22:15

hsharma35 added the release notes: none label Jun 30, 2025

JacobSzwejbka approved these changes Jun 30, 2025


JacobSzwejbka added the release notes: exir label and removed the release notes: none label Jun 30, 2025

hsharma35 force-pushed the export-D76940237 branch from 446c731 to 6633d0c Compare July 1, 2025 03:50

hsharma35 force-pushed the export-D76940237 branch from 6633d0c to b842262 Compare July 1, 2025 18:24

hsharma35 force-pushed the export-D76940237 branch from b842262 to 5c7cd46 Compare July 1, 2025 18:28

hsharma35 force-pushed the export-D76940237 branch from 5c7cd46 to b3a3332 Compare July 1, 2025 19:52

facebook-github-bot merged commit cb3b99a into pytorch:main Jul 1, 2025
94 of 98 checks passed

BujSet pushed a commit to BujSet/executorch that referenced this pull request Jul 2, 2025

Improve memory planning for submodule hierarchies.

c569bdb

Differential Revision: D76940237 Pull Request resolved: pytorch#11860

Tanish2101 pushed a commit to Tanish2101/executorch that referenced this pull request Jul 9, 2025

Improve memory planning for submodule hierarchies.

0b9e476

Differential Revision: D76940237 Pull Request resolved: pytorch#11860
