It seems that the attention weights are computed differently from the original paper, which uses softmax(v · tanh(W[s, h])): here softmax is applied to the W[s, h] projection (in place of tanh), and relu is applied afterwards (in place of the final softmax over timesteps). Can you give some reasons or a reference for this?
```python
def forward(self, hidden, encoder_outputs):
    timestep = encoder_outputs.size(0)
    h = hidden.repeat(timestep, 1, 1).transpose(0, 1)           # [B, T, H]
    encoder_outputs = encoder_outputs.transpose(0, 1)           # [B, T, H]
    attn_energies = self.score(h, encoder_outputs)              # [B, T]
    return F.relu(attn_energies).unsqueeze(1)                   # [B, 1, T]

def score(self, hidden, encoder_outputs):
    # [B, T, 2H] -> [B, T, H]
    energy = F.softmax(self.attn(torch.cat([hidden, encoder_outputs], 2)), dim=2)
    energy = energy.transpose(1, 2)                             # [B, H, T]
    v = self.v.repeat(encoder_outputs.size(0), 1).unsqueeze(1)  # [B, 1, H]
    energy = torch.bmm(v, energy)                               # [B, 1, T]
    return energy.squeeze(1)                                    # [B, T]
```
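For comparison, here is a minimal self-contained sketch of what I understand the paper's formulation to be (Bahdanau-style additive attention), assuming `self.attn` is a `Linear(2H, H)` layer and `self.v` is a parameter vector of size H as in the code above; tanh replaces the inner softmax, and a softmax over the time dimension replaces the relu. This is only to illustrate the difference I mean, not a claim about how the repo should be implemented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive attention as in the original paper: softmax(v . tanh(W[s, h]))."""
    def __init__(self, hidden_size):
        super().__init__()
        self.attn = nn.Linear(hidden_size * 2, hidden_size)
        self.v = nn.Parameter(torch.rand(hidden_size))

    def forward(self, hidden, encoder_outputs):
        # hidden: [1, B, H] (decoder state), encoder_outputs: [T, B, H]
        timestep = encoder_outputs.size(0)
        h = hidden.repeat(timestep, 1, 1).transpose(0, 1)           # [B, T, H]
        encoder_outputs = encoder_outputs.transpose(0, 1)           # [B, T, H]
        attn_energies = self.score(h, encoder_outputs)              # [B, T]
        # normalize over the time dimension, as in the paper
        return F.softmax(attn_energies, dim=1).unsqueeze(1)         # [B, 1, T]

    def score(self, hidden, encoder_outputs):
        # tanh(W[s, h]): [B, T, 2H] -> [B, T, H]
        energy = torch.tanh(self.attn(torch.cat([hidden, encoder_outputs], 2)))
        energy = energy.transpose(1, 2)                             # [B, H, T]
        v = self.v.repeat(encoder_outputs.size(0), 1).unsqueeze(1)  # [B, 1, H]
        energy = torch.bmm(v, energy)                               # [B, 1, T]
        return energy.squeeze(1)                                    # [B, T]
```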