
Conversation

@yao-matrix
Contributor

Similar to PR #39646, this fixes the same issue, found while enabling LLaMA LoRA fine-tuning across multiple cards.

@SunMarc, please help review, thanks very much.

Signed-off-by: Yao, Matrix <[email protected]>
Member

@SunMarc SunMarc left a comment

nice

@SunMarc SunMarc enabled auto-merge (squash) August 6, 2025 13:32
@SunMarc
Member

SunMarc commented Aug 6, 2025

@bot /style

@github-actions
Contributor

github-actions bot commented Aug 6, 2025

Style fix runs successfully without any file modified.

@SunMarc
Member

SunMarc commented Aug 6, 2025

You need to propagate the changes to the other model files as well using make fix-copies. By the way, since we are touching the Llama model, can you share more details about the issue you had?

@yao-matrix
Contributor Author

yao-matrix commented Aug 6, 2025

> You need to propagate the changes to the other model files as well using make fix-copies. By the way, since we are touching the Llama model, can you share more details about the issue you had?

@SunMarc when I enabled the multi-adapter LoRA inference from huggingface/peft#2711 on a 2-card environment, the issue showed up. It happens because this line https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L66 is not a Module, so it is not hooked; it is the same issue as we discussed before.
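
For context, here is a sketch of the LlamaRMSNorm forward from the linked file (paraphrased from the transformers source; the exact line number may have drifted), with a comment marking where the bare tensor op sits outside accelerate's module-level hooks:

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    """RMSNorm as in modeling_llama.py (paraphrased for illustration)."""

    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # The elementwise multiply below is a plain tensor op, not a submodule,
        # so accelerate's device-placement hooks do not intercept it; if
        # hidden_states and self.weight end up on different cards, it fails here.
        return self.weight * hidden_states.to(input_dtype)
```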

Signed-off-by: Yao, Matrix <[email protected]>
auto-merge was automatically disabled August 6, 2025 15:49

Head branch was pushed to by a user without write access

@SunMarc
Member

SunMarc commented Aug 7, 2025

> https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L66 is not a Module, so it is not hooked; it is the same issue as we discussed before.

It is an nn.Module, no? class LlamaRMSNorm(nn.Module):. Can you share the traceback?

@yao-matrix
Contributor Author

>> https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L66 is not a Module, so it is not hooked; it is the same issue as we discussed before.

> It is an nn.Module, no? class LlamaRMSNorm(nn.Module):. Can you share the traceback?

I mean that in return self.weight * hidden_states.to(input_dtype), the * itself is not an nn.Module, so accelerate cannot hook it to place the operands on the same device automatically. I'm trying to reproduce it and will get the log.
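
As an illustration only, and not necessarily the exact change in this PR, one way to avoid this kind of cross-device mismatch is to move the activation onto the weight's device explicitly before the bare tensor multiply, so the op no longer relies on a module-level hook:

```python
# Illustrative sketch only; the actual fix in this PR may differ.
def forward(self, hidden_states):
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
    # Move the normalized activation onto the weight's device before the
    # elementwise multiply, since this bare tensor op is not covered by
    # accelerate's module-level device-placement hooks.
    return self.weight * hidden_states.to(dtype=input_dtype, device=self.weight.device)
```

With that, the elementwise product is computed on the weight's card regardless of which device the previous layer left hidden_states on.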

@yao-matrix
Contributor Author

Seems I lost the environment to reproduce it. Let's put this on hold here; once I have reproduced it, I'll post the log.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: arcee, aria, bitnet, cohere, cohere2, deepseek_v2, deepseek_v3, diffllama, doge, dots1, ernie4_5, gemma, gemma2, glm, glm4, glm4_moe

@yao-matrix yao-matrix closed this Oct 23, 2025
@yao-matrix yao-matrix deleted the llama-fix branch October 29, 2025 22:29