
Conversation

MengAiDev
Contributor

  • Update the _initialize_missing_keys method to handle both missing and mismatched keys
  • Ensure proper initialization of weights when loading pretrained models

Fixes #40001
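The distinction the PR description draws can be sketched as follows. This is an illustrative, self-contained example, not the actual `_initialize_missing_keys` implementation in transformers; the function name `split_keys` and the use of plain shape tuples are assumptions made to keep the sketch runnable without torch:

```python
# Hypothetical sketch: "missing" keys are absent from the checkpoint entirely,
# while "mismatched" keys exist in both state dicts but with different shapes.
def split_keys(model_state: dict, checkpoint_state: dict):
    # Keys the model expects but the checkpoint does not provide
    missing = [k for k in model_state if k not in checkpoint_state]
    # Keys present in both, but whose tensor shapes disagree
    mismatched = [
        k for k in model_state
        if k in checkpoint_state and model_state[k] != checkpoint_state[k]
    ]
    return missing, mismatched

# Shapes stand in for tensors so the sketch runs without torch.
model = {"embed.weight": (10, 8), "lm_head.bias": (5,), "adapter.weight": (3, 3)}
ckpt = {"embed.weight": (10, 8), "lm_head.bias": (7,)}
missing, mismatched = split_keys(model, ckpt)
print(missing)     # ['adapter.weight']
print(mismatched)  # ['lm_head.bias']
```

Both groups would then need their weights (re-)initialized rather than loaded from the checkpoint.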

@ArthurZucker

@MengAiDev MengAiDev mentioned this pull request Aug 12, 2025
Collaborator

@ArthurZucker ArthurZucker left a comment


Yep, a good fix, but let's isolate it.

Comment on lines 3730 to 3740
```python
# Handle potential NaN values in accumulated_log_probs
probs = nn.functional.softmax(accumulated_log_probs, dim=-1)
# Replace NaN values with uniform distribution
if torch.isnan(probs).any():
    # Create a mask for NaN positions
    nan_mask = torch.isnan(probs)
    # Replace NaN with a small uniform probability
    probs = torch.where(nan_mask, torch.ones_like(probs) / probs.shape[-1], probs)
    # Renormalize to ensure probabilities sum to 1
    probs = probs / probs.sum(dim=-1, keepdim=True)
topk_indices = torch.multinomial(probs, num_samples=beams_to_keep)
```
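The NaN-fallback pattern above can be exercised in isolation. This is a minimal standalone sketch, not the transformers beam-search code; the helper name `safe_multinomial` is an assumption:

```python
import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int) -> torch.Tensor:
    """Sample from `probs`, replacing NaN entries with uniform mass first."""
    if torch.isnan(probs).any():
        nan_mask = torch.isnan(probs)
        # Give NaN slots a small uniform probability instead of propagating NaN
        probs = torch.where(nan_mask, torch.ones_like(probs) / probs.shape[-1], probs)
        # Renormalize so each row sums to 1 again
        probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=num_samples)

# A row containing NaN would make torch.multinomial raise; the fallback avoids that.
probs = torch.tensor([[0.5, float("nan"), 0.5]])
idx = safe_multinomial(probs, num_samples=2)
print(idx.shape)  # torch.Size([1, 2])
```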
Collaborator


I like that, but it needs a separate issue!

Contributor Author


I see, but I did this because I saw the CircleCI run fail and was trying to fix it.

Collaborator


I get you, but the failing test is probably unrelated to your changes, no?

@MengAiDev MengAiDev requested a review from ArthurZucker August 13, 2025 02:46
@ArthurZucker
Collaborator

cc @Cyrilvallez as I don't have time to check!

@Cyrilvallez
Member

Hey! The argument given is all keys present in the checkpoints - we then take the set inverse of that to init the other keys. So everything is initialized correctly 🙂🤗 So I believe this PR is wrong! Let me know if you have any repro that could make you think otherwise!
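The key-resolution logic Cyrilvallez describes can be sketched like this. This is a hedged illustration of the described behavior, not the actual transformers internals; the names `model_keys`, `checkpoint_keys`, and `keys_to_initialize` are assumptions:

```python
# The loader is handed the set of keys present in the checkpoint; the keys to
# initialize are the complement of that set within the model's own keys.
def keys_to_initialize(model_keys: set, checkpoint_keys: set) -> set:
    # Everything the model defines that the checkpoint does not supply
    return model_keys - checkpoint_keys

model_keys = {"embed.weight", "lm_head.weight", "new_adapter.weight"}
checkpoint_keys = {"embed.weight", "lm_head.weight"}
print(keys_to_initialize(model_keys, checkpoint_keys))  # {'new_adapter.weight'}
```

Under that reading, passing the checkpoint keys (rather than the missing keys) to the initialization path is intentional, which is why the reviewer considers the PR's change unnecessary.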

Successfully merging this pull request may close these issues.

Possible wrong init call