
How to implement RegMean for a GPT-like model?  #4

@A11en0

Description

Is your feature request related to a problem? Please describe.
The Hugging Face GPT implementation differs from T5 and RoBERTa: it computes the query, key, and value projections with a single fused (parallel) layer, as shown below:

    def __init__(self, ...):
        ...
        # Query, key, and value are produced by a single fused projection.
        self.c_attn = Conv1D(n_state * 3, nx)
        ...

    def forward(self, x, ...):
        x = self.c_attn(x)
        # Split the fused projection back into query / key / value.
        query, key, value = x.split(self.split_size, dim=2)
        query = self.split_heads(query)
        key = self.split_heads(key, k=True)
        value = self.split_heads(value)
        ...
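
For context on the dimension mismatch described below: the fused layer is a transformers Conv1D, which stores its weight as [in_features, out_features], i.e. already transposed relative to nn.Linear. A quick shape check (hidden size 1024 as in the report; the import path is transformers.pytorch_utils in recent versions, transformers.modeling_utils in older ones):

    import torch
    from torch import nn
    from transformers.pytorch_utils import Conv1D  # older versions: transformers.modeling_utils

    hidden = 1024
    linear = nn.Linear(hidden, 3 * hidden)  # weight shape: [out, in] = [3072, 1024]
    fused = Conv1D(3 * hidden, hidden)      # weight shape: [in, out] = [1024, 3072]

    print(linear.weight.shape)  # torch.Size([3072, 1024])
    print(fused.weight.shape)   # torch.Size([1024, 3072])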

In this implementation, RegMean runs into a problem in the regmean_merge() function: at line 163, gram_m_ws.append(torch.matmul(param_grams, param.transpose(0, 1))), the matrix dimensions do not match, because param_grams is [1024, 1024] while param is [1024, 1024*3].
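
Not the repository's fix, just a minimal sketch of one possible workaround: because Conv1D already stores the weight as [in, out], the [1024, 1024] gram matrix can be multiplied with it directly, so the transpose before the matmul (and the transpose back at the end) only applies to nn.Linear weights. The function and argument names below are hypothetical and only mirror the structure of regmean_merge:

    import torch

    def regmean_merge_fused(params, grams, is_conv1d):
        """Hypothetical RegMean merge that handles GPT-2 style Conv1D weights.

        params: per-model weights, [out, in] for nn.Linear or [in, out] for Conv1D.
        grams:  per-model input gram matrices, each of shape [in, in].
        """
        gram_m_ws = []
        for param, gram in zip(params, grams):
            # Conv1D weights are already [in, out]; nn.Linear weights need a transpose.
            w = param if is_conv1d else param.transpose(0, 1)
            gram_m_ws.append(torch.matmul(gram, w))           # [in, out]
        sum_gram = sum(grams)                                 # [in, in]
        sum_gram_m_w = sum(gram_m_ws)                         # [in, out]
        # Solve (sum_i G_i) W* = sum_i (G_i W_i) for the merged weight W*.
        merged = torch.linalg.solve(sum_gram, sum_gram_m_w)   # [in, out]
        # Return the result in the layer's original storage convention.
        return merged if is_conv1d else merged.transpose(0, 1)

Because the gram matrix depends only on the layer input, the same [1024, 1024] gram applies to all three blocks of the fused output, so the [1024, 1024*3] weight would not need to be split into separate query/key/value parts before merging.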
