Conversation

Contributor

@lucylq lucylq commented Sep 25, 2025

Summary:
This diff backs FreeableBuffer with an int64_t instead of a void*.

Use case

The PTE file lives on a 32-bit system, where void* is 4 bytes.
The PTD file lives on a 36-bit system, which requires an int64_t to address it.

We want to fetch addresses from the PTD file and pass them to the accelerator. ExecuTorch APIs return a FreeableBuffer when fetching data (data_loader.h, named_data_map.h). FreeableBuffer is currently backed by a void*, which could truncate a 36-bit address on a 32-bit system.

Note that we still want the existing void* behavior for loading segments etc., and only want int64_t behavior when fetching weights from the named_data_map.

Potential concerns

  • Increased memory usage: an additional 4 bytes per FreeableBuffer, plus some overhead for the std::variant template. For the PTE file this is on the order of the number of segments, which is usually small.
  • Increased runtime latency: calls to the existing void* API now perform truncation checks.
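The dual-accessor design and the truncation check can be sketched as follows. This is a minimal illustration, not the actual ExecuTorch FreeableBuffer API: the class name, accessors, and the bare std::variant layout here are all simplified assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Illustrative sketch: a buffer that can hold either a real pointer
// (existing use cases) or a raw 64-bit address (PTD weights).
class FreeableBufferSketch {
 public:
  explicit FreeableBufferSketch(const void* ptr) : data_(ptr) {}
  explicit FreeableBufferSketch(uint64_t addr) : data_(addr) {}

  // Existing-style accessor: must verify that the stored value fits
  // in a pointer on this platform before handing it back.
  const void* data() const {
    if (const auto* p = std::get_if<const void*>(&data_)) {
      return *p;
    }
    uint64_t addr = std::get<uint64_t>(data_);
    // On a 32-bit system, a 36-bit address cannot be represented as void*.
    assert(addr <= UINTPTR_MAX && "address would truncate on this platform");
    return reinterpret_cast<const void*>(static_cast<uintptr_t>(addr));
  }

  // New accessor: always safe, returns the full 64-bit value.
  uint64_t address() const {
    if (const auto* p = std::get_if<const void*>(&data_)) {
      return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(*p));
    }
    return std::get<uint64_t>(data_);
  }

 private:
  std::variant<const void*, uint64_t> data_;
};
```

The check in data() is the source of the latency concern above: every call through the legacy accessor now inspects the variant before returning.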

Alternatives

Why choose this solution? It seems to be the least intrusive of the options considered.

  1. Compiler macro to switch the backing value of FreeableBuffer to int64_t for specific builds. However, both void* and int64_t are required: void* for the existing use cases (fetching delegate blobs). Both APIs must exist.
  2. Template FreeableBuffer on void* and int64_t. Messy, as the template is contagious; it would need to be applied to data_loader.h, named_data_map.h, and potentially program/method, bloating the core runtime with templating.
  3. Store int64_t addresses in FreeableBuffer and ask the user to parse the address to load the data. This avoids changing FreeableBuffer, but is not semantically correct: the API would not return a buffer of data, but an address that the user must parse and then load data from.
  4. Add a specific API to named_data_map.h that returns an int64_t buffer. Not a good use of API surface.

https://docs.google.com/document/d/11dMXh1N66rfY-8aO3N-ra2dDPx0RGCoc1rlCSN80Zf8/edit?tab=t.0

Differential Revision: D83007972


pytorch-bot bot commented Sep 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14570

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 14be5a9 with merge base 0e74a17:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 25, 2025
@facebook-github-bot
Contributor

@lucylq has exported this pull request. If you are a Meta employee, you can view the originating diff in D83007972.


This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Comment on lines 177 to 178
FreeFn free_fn_;
FreeUInt64Fn free_uint64_fn_;
Contributor


it looks like at most one free function can be set, and it needs to correspond to the contents of data_, right? if so, I would reorganize things like so:

struct PointerData {
  FreeFn free_fn_;
  const void* data_;
};

struct Int64Data {
  FreeUInt64Fn free_fn_;
  uint64_t data_;
};

std::variant<PointerData, Int64Data> data_;

this way we only increase the size of FreeableBuffer by 8 bytes instead of 16.
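The suggested layout can be exercised end to end. This is a compilable sketch under simplified assumptions: the free-function signatures here take only the payload (the real FreeableBuffer free functions take additional parameters), and BufferStorage is an illustrative wrapper name, not the actual class.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Simplified free-function signatures, for illustration only.
using FreeFn = void (*)(const void*);
using FreeUInt64Fn = void (*)(uint64_t);

struct PointerData {
  FreeFn free_fn_;
  const void* data_;
};

struct Int64Data {
  FreeUInt64Fn free_fn_;
  uint64_t data_;
};

// Each variant alternative carries exactly the free function that matches
// its payload, so the pair can never get out of sync, and the variant
// grows the class by one word instead of two.
struct BufferStorage {
  std::variant<PointerData, Int64Data> data_;

  void Free() {
    // Dispatch to whichever free function matches the active alternative.
    std::visit(
        [](auto& d) {
          if (d.free_fn_ != nullptr) {
            d.free_fn_(d.data_);
          }
        },
        data_);
  }
};
```

The variant's discriminant selects both the payload and its matching free function, which is what makes this cheaper than storing two independent free-function members.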

Contributor Author

@lucylq lucylq Sep 26, 2025


Thank you! This is really helpful, I'll try it.

lucylq added a commit to lucylq/executorch-1 that referenced this pull request Sep 26, 2025
lucylq added a commit to lucylq/executorch-1 that referenced this pull request Oct 1, 2025
Contributor

@swolchok swolchok left a comment


feel free to ping me if I don't re-review within a day next time

Reviewed By: swolchok

Differential Revision: D83007972

meta-codesync bot commented Oct 7, 2025

@lucylq has exported this pull request. If you are a Meta employee, you can view the originating Diff in D83007972.

@meta-codesync meta-codesync bot merged commit f32e9fc into pytorch:main Oct 8, 2025
290 of 307 checks passed
@zingo
Collaborator

zingo commented Oct 9, 2025

This seems to have broken the Cortex-M size test :(

@lucylq lucylq mentioned this pull request Oct 9, 2025
lucylq added a commit that referenced this pull request Oct 9, 2025
regressed 70 bytes after
#14570

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants