[wwb] Add video generation to wwb #3134
Conversation
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
    self.metrics = metrics
    self.crop_prompt = crop_prompts
    self.num_samples = num_samples
    self.num_inference_steps = num_inference_steps or self.DEF_NUM_INF_STEPS
Copilot AI (Jan 21, 2026)
The constant DEF_NUM_INF_STEPS is set to 25 on line 20, but DEF_NUM_INFERENCE_STEP in text2image_evaluator.py is set to 4. This inconsistency may confuse users about appropriate defaults for different generation types. Consider documenting why video generation requires more inference steps than image generation.
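One way to address this review comment is to document the rationale next to each constant. A minimal sketch, assuming the class skeleton and the "few-step image pipeline vs. multi-step video pipeline" rationale (the constant names and values come from the reviewed files; the classes and reasoning comments here are hypothetical, not confirmed by the PR):

```python
class Text2ImageEvaluator:
    # Image pipelines in this benchmark assume few-step generation,
    # so a small default suffices (hypothetical rationale).
    DEF_NUM_INFERENCE_STEP = 4


class VideoEvaluator:
    # Video generation is assumed to need more denoising steps to keep
    # frames temporally consistent, hence the larger default
    # (hypothetical rationale; the value 25 is from the reviewed file).
    DEF_NUM_INF_STEPS = 25

    def __init__(self, num_inference_steps=None):
        # Fall back to the documented default when the caller passes None.
        self.num_inference_steps = num_inference_steps or self.DEF_NUM_INF_STEPS
```

With such comments in place, a reader comparing the two evaluators can see the differing defaults are intentional rather than a copy-paste slip.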
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
    # according to post processing of outputs in get_video_features of LlavaNextVideoModel
    # https://github.com/huggingface/transformers/blob/v4.53.2/src/transformers/models/llava_next_video/modular_llava_next_video.py#L387
    outputs = outputs.hidden_states[layer_idx][:, 1:]
    return outputs.mean(dim=2)
Copilot AI (Jan 22, 2026)
The dimension parameter in mean(dim=2) should use a named constant or include a comment explaining which dimension is being averaged (e.g., hidden_dim). This improves code readability and maintainability.
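A sketch of the suggested fix, using a named constant for the averaged dimension. The tensor layout here (batch, frames, tokens, hidden) is an assumption for illustration, as is the constant name; NumPy stands in for the actual framework so the example is self-contained:

```python
import numpy as np

# Hypothetical name for the dimension holding per-frame patch tokens;
# the real layout depends on the model's hidden_states shape.
FRAME_TOKEN_DIM = 2

# Assumed shape: (batch, frames, tokens, hidden) after dropping the first token.
hidden = np.random.rand(1, 8, 16, 32)

# Averaging over the named dimension makes the intent explicit,
# instead of a bare `mean(dim=2)` literal.
pooled = hidden.mean(axis=FRAME_TOKEN_DIM)
print(pooled.shape)  # one pooled vector per frame: (1, 8, 32)
```

Even if a module-level constant is judged too heavy, a one-line comment naming the dimension at the call site would satisfy the same readability goal.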
Wovchena left a comment
#3134 (review) is still applicable
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
af7838e
Description
How to run:
CVS-176896
Fixes #(issue)
Checklist: