Bug about weight sharing in AutoFormer

https://github.com/microsoft/Cream/blob/4a13c4091e78f9abd2160e7e01c02e48c1cf8fb9/AutoFormer/model/module/qkv_super.py#L72-L77
I think there is something wrong with the way weight sharing is done in the linked lines. I believe the slicing should take the first `sample_out_dim // 3` rows from each third of the weight instead:
```python
    N = weight.size(0) // 3
    sample_weight = torch.cat([sample_weight[i*N:i*N+sample_out_dim//3, :] for i in range(3)], dim=0)
```

To make this more intuitive, I drew a schematic diagram showing how 4-head and 5-head self-attention (SA) share rows of `Linear.weight`.

![Snipaste_2024-03-28_22-05-19](https://github.com/microsoft/Cream/assets/26325745/69d111a3-c799-4499-b598-5239aeb5d18d)
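The same comparison can be sketched on row indices alone, without any tensors. This is only an illustration under assumptions: the helper names `rows_current` and `rows_proposed` are hypothetical (they are not part of the repository), the sizes are toy values, and the lists stand in for the rows of the weight matrix.

```python
def rows_current(total_out_dim, sample_out_dim):
    """Row indices picked by the linked code: sample_weight[i:sample_out_dim:3] for i in 0..2."""
    rows = list(range(total_out_dim))  # stand-in for the rows of the weight
    return [r for i in range(3) for r in rows[i:sample_out_dim:3]]


def rows_proposed(total_out_dim, sample_out_dim):
    """Row indices picked by the proposed fix: a prefix of each third of the weight."""
    rows = list(range(total_out_dim))
    n = total_out_dim // 3   # size of each third of the full weight
    k = sample_out_dim // 3  # rows sampled from each third
    return [r for i in range(3) for r in rows[i * n:i * n + k]]


if __name__ == "__main__":
    # Toy sizes (assumed for illustration): 12 output rows in total, 6 sampled.
    print(rows_current(12, 6))
    print(rows_proposed(12, 6))
```

With these toy sizes, the current slicing returns rows `[0, 3, 1, 4, 2, 5]` (an interleaved pick from the first 6 rows), while the proposed slicing returns rows `[0, 1, 4, 5, 8, 9]` (the first 2 rows of each third). Which of the two matches the intended layout depends on how the full weight is assumed to be organized.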

Maybe I have misunderstood the implementation here. Could you help check it?

For reference, the linked lines currently read:

    def sample_weight(weight, sample_in_dim, sample_out_dim):

        sample_weight = weight[:, :sample_in_dim]
        sample_weight = torch.cat([sample_weight[i:sample_out_dim:3, :] for i in range(3)], dim =0)

        return sample_weight

Bug about weight sharing in AutoFormer #232

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Bug about weight sharing in AutoFormer #232

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions