
Conversation

@SS-JIA commented Oct 17, 2024

Stack from ghstack (oldest at bottom):

Context

As title, revamp the prepacking API:

  • Make the naming clearer; e.g. `prepack_if_tensor_ref` becomes `prepack_standard`, to disambiguate which packing logic will be used.
  • Instead of passing through the `v` argument by default when it is already a `Tensor`, this behavior must now be toggled via the `passthrough` argument (see the sketch after this list). The goal is to encourage developers to be explicit about what types they expect the operator arguments to be.
  • Consolidate the API surface and reduce the number of overloads.
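
To make the new contract concrete, here is a minimal, self-contained sketch of the intended behavior. It uses toy stand-in types and a simplified signature rather than the actual ExecuTorch Vulkan classes (`ComputeGraph`, `TensorRef`, etc.), so treat it as an illustration of the `passthrough` toggle, not the real API:

```cpp
#include <stdexcept>
#include <variant>

// Toy stand-ins for the graph's value kinds; the real TensorRef/Tensor types
// in the Vulkan delegate are more involved.
struct TensorRef {};  // serialized tensor data shipped with the model
struct Tensor {};     // a regular (device) tensor
using Value = std::variant<TensorRef, Tensor>;

// Sketch of the new contract: a TensorRef is packed with the "standard"
// algorithm, while an already-materialized Tensor is only passed through
// when the caller explicitly opts in via `passthrough`.
Tensor prepack_standard(const Value& v, bool passthrough = false) {
  if (std::holds_alternative<Tensor>(v)) {
    if (!passthrough) {
      throw std::invalid_argument("expected serialized tensor data (TensorRef)");
    }
    return std::get<Tensor>(v);  // explicit, opt-in pass-through
  }
  // ... the standard staging-to-tensor packing of the TensorRef goes here ...
  return Tensor{};
}
```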

Beyond the API changes, I have also removed a number of unnecessary calls to `prepack_if_tensor_ref` throughout the operator implementations. The most common case was calling it on an input tensor, which is not necessary.

The "big picture" for prepacking

`TensorRef` objects and prepacking are used whenever we are dealing with a Tensor whose data is serialized with the model. These "serialized tensors" all belong to one of two categories:

  • Weights/biases: trained weights and biases that act as the state of, e.g., a Convolutional or Linear layer. These tensors are used only within the `nn.Module` that they belong to.
  • Persistent tensors: tensors whose data happens to be invariant to the inputs, so their data can be serialized with the model itself. They are treated as regular tensors and may be used by several operators throughout the model. One example is `freqs_sin` and `freqs_cos` in Llama models, which are used to calculate rotary positional encodings.

For weights and biases, the way the serialized data should be packed may depend on the operator they are used in. Persistent tensors, however, must be packed with the "standard" staging-to-tensor algorithm, since they behave the same as regular tensors.

It is well known which operators expect weight tensors. Persistent tensors, however, are tricky because they can be used as an argument to any operator. This would mean that every operator needs to account for the possibility that one of its inputs is a serialized tensor. This is undesirable because it adds another layer of indirection when processing operator inputs, on top of the fact that every argument is already a reference to a `Value` object in the graph, which is itself a wrapper. It also complicates things for the operator developer. Another downside is that persistent tensors would be packed multiple times, once by each operator that uses them.

To address this, I plan to handle persistent tensors at export time by inserting a `prepack()` operator for them, which will cause operators that use the serialized tensor to see a `Tensor` object instead of a `TensorRef` object. As a result (see the sketch below), the only operators that should expect to prepack an argument are those that take a weight argument, and persistent tensors are no longer packed multiple times.
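
Below is a hedged, self-contained sketch of why this helps, again using toy types rather than the real delegate API: before the change, every operator has to guard each input against the possibility of a serialized tensor (and re-packs it); after the export-time `prepack()` insertion, a non-weight input can be assumed to already be a Tensor.

```cpp
#include <variant>

// Toy model for illustration only; not the actual ExecuTorch Vulkan types.
struct TensorRef {};
struct Tensor {};
using Value = std::variant<TensorRef, Tensor>;

// Stand-in for the standard staging-to-tensor packing routine.
Tensor prepack_standard(const Value& /*v*/) { return Tensor{}; }

// Before: a persistent tensor could reach any operator as a TensorRef, so
// every operator guarded every input, and each consuming operator re-packed it.
Tensor resolve_input_old(const Value& v) {
  if (std::holds_alternative<TensorRef>(v)) {
    return prepack_standard(v);  // packed once per consuming operator
  }
  return std::get<Tensor>(v);
}

// After: an export-time prepack() node materializes the persistent tensor
// once, so operators without weight arguments simply see a Tensor.
Tensor resolve_input_new(const Value& v) {
  return std::get<Tensor>(v);  // guaranteed to already be a Tensor
}
```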

Differential Revision: [D64550560](https://our.internmc.facebook.com/intern/diff/D64550560/)

pytorch-bot bot commented Oct 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6331

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c7ce9f0 with merge base 7493aae:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Oct 17, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D64550560

SS-JIA added a commit that referenced this pull request Oct 17, 2024

@facebook-github-bot

This pull request has been merged in 75c17c0.

@SS-JIA deleted the gh/SS-JIA/117/head branch January 24, 2025 19:45