Improve information about group offloading and layerwise casting #11101

a-r-r-o-w · 2025-03-18T06:42:32Z

No description provided.

docs/source/en/optimization/memory.md

HuggingFaceDocBuilderDev · 2025-03-18T06:52:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

DN6 · 2025-03-18T07:19:11Z

docs/source/en/optimization/memory.md


+<Tip>
+
+- Layerwise casting may not work with all models out-of-the-box. Sometimes, the forward implementations of the model contain weight-dependent typecasting of inputs. Such implementations are not supported due to the currently simplistic implementation of layerwise casting, which assumes that the forward pass is independent of the weight precision and that the input dtypes are always in `compute_dtype`. An example of an incompatible implementation can be found [here](https://github.com/huggingface/transformers/blob/7f5077e53682ca855afc826162b204ebf809f1f9/src/transformers/models/t5/modeling_t5.py#L294-L299).


Should also mention that it can be disabled on modules with the skip patterns.
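To make the incompatibility concrete, here is a toy, pure-Python simulation. Dtypes are plain strings, and `Linear`, `T5StyleFF`, `toy_enable_casting` and `run` are hypothetical names, not the diffusers API. The sketch assumes weights rest in a storage dtype and are upcast to `compute_dtype` only while the layer's own forward runs. A parent forward that casts its input to the child's weight dtype, as in the linked T5 code, therefore reads the storage dtype and hands the child a low-precision input. Skipping the child through a regex skip pattern, as the comment suggests documenting, avoids the problem. In diffusers this appears to be exposed through the `skip_modules_pattern` argument of `enable_layerwise_casting`; check the current docs for the exact signature.

```python
import re
from dataclasses import dataclass, field

# Toy simulation: dtypes are plain strings, not torch dtypes.
STORAGE_DTYPE = "float8_e4m3fn"
COMPUTE_DTYPE = "bfloat16"


@dataclass
class Linear:
    """Hypothetical toy layer that only tracks dtypes."""
    weight_dtype: str = COMPUTE_DTYPE
    casting_enabled: bool = False

    def __call__(self, x_dtype: str) -> str:
        # Hook-like behaviour: upcast weights to the compute dtype for this call only.
        if self.casting_enabled:
            self.weight_dtype = COMPUTE_DTYPE
        try:
            if x_dtype != self.weight_dtype:
                raise TypeError(f"input {x_dtype} != weight {self.weight_dtype}")
            return x_dtype
        finally:
            # Weights go back to the storage dtype after the forward pass.
            if self.casting_enabled:
                self.weight_dtype = STORAGE_DTYPE


@dataclass
class T5StyleFF:
    """Hypothetical parent whose forward casts inputs to a child's weight dtype."""
    wo: Linear = field(default_factory=Linear)

    def __call__(self, x_dtype: str) -> str:
        # Weight-dependent typecasting, read outside wo's own hook:
        # under layerwise casting this sees the storage dtype.
        if x_dtype != self.wo.weight_dtype:
            x_dtype = self.wo.weight_dtype
        return self.wo(x_dtype)


def toy_enable_casting(layers: dict, skip_patterns=()) -> list:
    """Hypothetical helper: enable casting on layers whose names match no skip pattern."""
    enabled = []
    for name, layer in layers.items():
        if any(re.search(pattern, name) for pattern in skip_patterns):
            continue  # skipped layers keep their weights in the compute dtype
        layer.weight_dtype = STORAGE_DTYPE
        layer.casting_enabled = True
        enabled.append(name)
    return enabled


def run(skip_patterns=()) -> str:
    block = T5StyleFF()
    toy_enable_casting({"block.wo": block.wo}, skip_patterns)
    try:
        return block(COMPUTE_DTYPE)
    except TypeError as err:
        return f"incompatible: {err}"


print(run())             # casting applied to wo: dtype mismatch
print(run((r"\.wo$",)))  # wo skipped: runs in bfloat16
```

In real code the mismatch might surface as a dtype error or as silent precision loss, depending on the ops involved; the toy version simply raises.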

DN6 · 2025-03-18T07:19:19Z

docs/source/en/optimization/memory.md


+<Tip>
+
+- Group offloading may not work with all models out-of-the-box. If the forward implementations of the model contain weight-dependent device-casting of inputs, it may clash with the offloading mechanism's handling of device-casting.


Should also mention that it can be disabled on modules with the skip patterns.

a-r-r-o-w · Mar 18, 2025

We don't support skipping in group offloading. Will mention it for layerwise casting, though.
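The device-casting clash can be sketched the same way. This is a toy simulation under our own assumptions: devices are plain strings, and `Linear`, `DeviceCastingBlock` and `run` are hypothetical names. Each layer's weights are onloaded right before it runs and offloaded right after, roughly like offloading at the granularity of individual leaf modules; it is not a description of the diffusers hook code. A parent forward that moves its input to the child's weight device sees the offload device, so the input lands on the CPU while the weights are onloaded to the accelerator. Since group offloading offers no skip patterns, such a module cannot simply be excluded.

```python
from dataclasses import dataclass, field

# Toy simulation: devices are plain strings, not torch devices.
ONLOAD_DEVICE = "cuda"
OFFLOAD_DEVICE = "cpu"


@dataclass
class Linear:
    """Hypothetical toy layer that only tracks devices."""
    weight_device: str = ONLOAD_DEVICE
    offloaded: bool = False

    def __call__(self, x_device: str) -> str:
        # Hook-like behaviour: onload weights just before this layer runs ...
        if self.offloaded:
            self.weight_device = ONLOAD_DEVICE
        try:
            if x_device != self.weight_device:
                raise RuntimeError(f"input on {x_device}, weight on {self.weight_device}")
            return x_device
        finally:
            # ... and offload them again once it is done.
            if self.offloaded:
                self.weight_device = OFFLOAD_DEVICE


@dataclass
class DeviceCastingBlock:
    """Hypothetical parent whose forward moves inputs to a child's weight device."""
    proj: Linear = field(default_factory=Linear)

    def __call__(self, x_device: str) -> str:
        # Weight-dependent device-casting, read before proj's hook has onloaded it.
        x_device = self.proj.weight_device
        return self.proj(x_device)


def run(offload: bool) -> str:
    block = DeviceCastingBlock()
    if offload:
        block.proj.offloaded = True
        block.proj.weight_device = OFFLOAD_DEVICE
    try:
        return block(ONLOAD_DEVICE)
    except RuntimeError as err:
        return f"clash: {err}"


print(run(offload=False))  # no offloading: input and weights both on cuda
print(run(offload=True))   # offloaded: input moved to cpu, weights onloaded to cuda
```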

Co-authored-by: Dhruv Nair <[email protected]>

a-r-r-o-w · 2025-03-24T17:55:53Z

Failing test looks unrelated

update

97d8399

a-r-r-o-w requested a review from DN6 March 18, 2025 06:42

a-r-r-o-w commented Mar 18, 2025

docs/source/en/optimization/memory.md (outdated)

Update docs/source/en/optimization/memory.md

a09ddbd

DN6 reviewed Mar 18, 2025

docs/source/en/optimization/memory.md (outdated)

DN6 reviewed Mar 18, 2025

docs/source/en/optimization/memory.md (outdated)

DN6 reviewed Mar 18, 2025

a-r-r-o-w and others added 2 commits March 18, 2025 14:33

Apply suggestions from code review

6e9b6c7

Co-authored-by: Dhruv Nair <[email protected]>

apply review suggestions

8850e70

a-r-r-o-w requested a review from DN6 March 18, 2025 09:11

DN6 and others added 2 commits March 24, 2025 22:21

Merge branch 'main' into improve-info-layerwise-and-group

e487c1f

update

6133622

DN6 approved these changes Mar 24, 2025

a-r-r-o-w merged commit 1ddf3f3 into main Mar 24, 2025
14 of 15 checks passed

a-r-r-o-w deleted the improve-info-layerwise-and-group branch March 24, 2025 17:56

4 participants

