Description
Hi,
Thank you for sharing this project! I have a few questions about the implementation details and would appreciate your clarification:
- In the code, I noticed that `get_enhance_weight()` seems to act like the “temperature” parameter mentioned in the paper, but I couldn’t find any reference in the paper to multiplying `mean_scores.mean()` by `num_frames`: `enhance_scores = mean_scores.mean() * (num_frames + get_enhance_weight())`
  - Why is `mean_scores.mean()` used here? What exactly is being averaged, and what does it represent?
  - How does this averaging influence the final enhancement process?
  - What is the rationale behind multiplying by `num_frames`, and how does this relate to the method described in the paper?
- I also noticed that `enhance_scores` is multiplied by the attention output, and it is mentioned that this improves temporal consistency. Could you explain the theoretical or intuitive reasoning behind this? Specifically, why does scaling the attention output with `enhance_scores` contribute to better temporal consistency?
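To make the questions concrete, here is a minimal sketch of how I currently read that line. It uses plain Python lists instead of tensors. The function name `enhance_scores_sketch`, the argument `enhance_weight` (standing in for whatever `get_enhance_weight()` returns), and all input values are my own assumptions for illustration, not code or numbers from the repository.

```python
from statistics import fmean


def enhance_scores_sketch(mean_scores, num_frames, enhance_weight):
    """Evaluate mean(mean_scores) * (num_frames + enhance_weight).

    mean_scores:    hypothetical list of per-position attention averages
    num_frames:     number of video frames
    enhance_weight: assumed stand-in for get_enhance_weight()
    """
    return fmean(mean_scores) * (num_frames + enhance_weight)


# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
mean_scores = [0.25, 0.5, 0.75]
num_frames = 4
enhance_weight = 1.0

score = enhance_scores_sketch(mean_scores, num_frames, enhance_weight)

# As I understand it, the result is one scalar, so multiplying the
# attention output by it rescales every element by the same factor.
attn_output = [1.0, -2.0, 3.0]  # hypothetical attention output values
scaled_output = [value * score for value in attn_output]

print(score)
print(scaled_output)
```

If this reading is right, the product is a single scalar applied uniformly to the attention output, which is why I am unsure how it affects temporal consistency specifically.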
Any insights you can share would be greatly appreciated. Thank you in advance for your time!