
How to implement RegMean for a GPT-like model?  #4

@A11en0

Description

Is your feature request related to a problem? Please describe.
The Hugging Face GPT implementation differs from T5 and RoBERTa: it computes the query, key, and value projections with a single fused (parallel) layer, as shown below:

    def __init__(self, ...):
        ...
        # Query, key, and value are produced by a single fused projection.
        self.c_attn = Conv1D(n_state * 3, nx)
        ...

    def forward(self, x, ...):
        x = self.c_attn(x)
        # Split the fused projection back into query / key / value.
        query, key, value = x.split(self.split_size, dim=2)
        query = self.split_heads(query)
        key = self.split_heads(key, k=True)
        value = self.split_heads(value)
        ...
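
For context on the dimension mismatch described below: the fused layer is a transformers Conv1D, which stores its weight as [in_features, out_features], i.e. already transposed relative to nn.Linear. A quick shape check (hidden size 1024 as in the report; the import path is transformers.pytorch_utils in recent versions, transformers.modeling_utils in older ones):

    import torch
    from torch import nn
    from transformers.pytorch_utils import Conv1D  # older versions: transformers.modeling_utils

    hidden = 1024
    linear = nn.Linear(hidden, 3 * hidden)  # weight shape: [out, in] = [3072, 1024]
    fused = Conv1D(3 * hidden, hidden)      # weight shape: [in, out] = [1024, 3072]

    print(linear.weight.shape)  # torch.Size([3072, 1024])
    print(fused.weight.shape)   # torch.Size([1024, 3072])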

In this implementation, RegMean runs into a problem in the regmean_merge() function: at line 163, gram_m_ws.append(torch.matmul(param_grams, param.transpose(0, 1))), the matrix dimensions do not match, because param_grams is [1024, 1024] while param is [1024, 1024*3].
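
Not the repository's fix, just a minimal sketch of one possible workaround: because Conv1D already stores the weight as [in, out], the [1024, 1024] gram matrix can be multiplied with it directly, so the transpose before the matmul (and the transpose back at the end) only applies to nn.Linear weights. The function and argument names below are hypothetical and only mirror the structure of regmean_merge:

    import torch

    def regmean_merge_fused(params, grams, is_conv1d):
        """Hypothetical RegMean merge that handles GPT-2 style Conv1D weights.

        params: per-model weights, [out, in] for nn.Linear or [in, out] for Conv1D.
        grams:  per-model input gram matrices, each of shape [in, in].
        """
        gram_m_ws = []
        for param, gram in zip(params, grams):
            # Conv1D weights are already [in, out]; nn.Linear weights need a transpose.
            w = param if is_conv1d else param.transpose(0, 1)
            gram_m_ws.append(torch.matmul(gram, w))           # [in, out]
        sum_gram = sum(grams)                                 # [in, in]
        sum_gram_m_w = sum(gram_m_ws)                         # [in, out]
        # Solve (sum_i G_i) W* = sum_i (G_i W_i) for the merged weight W*.
        merged = torch.linalg.solve(sum_gram, sum_gram_m_w)   # [in, out]
        # Return the result in the layer's original storage convention.
        return merged if is_conv1d else merged.transpose(0, 1)

Because the gram matrix depends only on the layer input, the same [1024, 1024] gram applies to all three blocks of the fused output, so the [1024, 1024*3] weight would not need to be split into separate query/key/value parts before merging.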
