This is a very insightful piece of work! It's impressive to see robust motion segmentation and quality enhancement for dynamic point clouds achieved in a training-free manner.
I have a question regarding the implementation details. I noticed that when the cross-attention maps for a single image are aggregated (in the _get_attn_k() function in model.py), the reduction is taken along the query dimension rather than the key dimension.
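To make sure I'm reading the code correctly, here is a minimal sketch of the two reductions I have in mind (the tensor shape convention and the use of a mean reduction are my own assumptions for illustration, not taken from the repo):

```python
import torch

# Assumed shape: attn is [heads, num_queries, num_keys],
# e.g. image patches (queries) attending to key tokens.
attn = torch.rand(8, 1024, 77)

# What I see in _get_attn_k(): reducing over the query dimension,
# which gives one aggregated score per key token.
per_key = attn.mean(dim=1)      # -> [heads, num_keys]

# The alternative I expected: reducing over the key dimension,
# which gives one aggregated score per query (per image patch).
per_query = attn.mean(dim=-1)   # -> [heads, num_queries]
```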
Is there a specific reason or intuition behind this design choice? I would really appreciate it if you could share your insights. Thanks!