Recurrent Network

class MambaModel(nn.Module):
    config = {}

    def __init__(self, positional_embedding):
        super().__init__()
        mamba_config = {
            "d_model": self.config["d_model"],
            "d_state": self.config["d_state"],
            "d_conv": self.config["d_conv"],
            "expand": self.config["expand"],
        }
        self.mamba_forward = nn.Sequential(*[Mamba(**mamba_config) for _ in range(self.config["num_layers"])])
        pe = positional_embedding[None, :, :]
        if self.config.get("trainable_pe"):
            self.pe = nn.Parameter(pe)
        else:  # fixed positional embedding
            self.register_buffer("pe", pe)

    def forward(self, output_shape, condition=None):
        assert len(condition.shape) == 3
        x = self.mamba_forward(self.pe.repeat(output_shape[0], 1, 1) + condition)
        return x


I noticed that in the actual implementation, the input to the Mamba stack consists only of the positional embedding and the condition input: forward passes self.pe.repeat(output_shape[0], 1, 1) + condition to self.mamba_forward and nothing else. However, the paper states: "After obtaining the parameter tokens K[i], permutation states S, and position embeddings e[i], we feed them into a recurrent network f(·) that learns tokenwise representations while capturing cross-token dependencies." The parameter tokens K[i] and permutation states S do not appear explicitly in this forward pass, so the code seems inconsistent with the paper's description of the recurrent network's input.
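To make the observation concrete, here is a minimal sketch of the arithmetic that builds the recurrent network's input in forward. It is not the repository's code: NumPy arrays stand in for torch tensors, np.tile stands in for Tensor.repeat, the Mamba stack itself is omitted, and the helper name recurrent_input and all sizes are hypothetical.

```python
import numpy as np


def recurrent_input(pe, condition, batch_size):
    """Rebuild the argument that forward() hands to self.mamba_forward.

    Hypothetical helper for illustration; NumPy replaces torch here.
    """
    # Same check as in forward(): condition is (batch, tokens, d_model)
    assert condition.ndim == 3
    # pe is (1, tokens, d_model); tile it along the batch axis,
    # mirroring self.pe.repeat(output_shape[0], 1, 1)
    pe_batched = np.tile(pe, (batch_size, 1, 1))
    # The only two terms that reach the recurrent network
    return pe_batched + condition


# Hypothetical sizes, chosen only for illustration
batch, tokens, d_model = 2, 4, 8
pe = np.random.randn(tokens, d_model)[None, :, :]
condition = np.random.randn(batch, tokens, d_model)

x_in = recurrent_input(pe, condition, batch)
print(x_in.shape)  # (2, 4, 8)
```

In this sketch the input is fully determined by pe and condition. Whether condition already carries the parameter tokens and permutation states from an earlier stage is not visible from the snippet above, which is the point I would like clarified.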

Recurrent Network #7