
Large gap in reproducing ListOps results with S4-model #166

@QuietNoviceCoder


Hi, thanks for releasing the code and paper!

I am a newbie. I read your paper and experiments carefully and tried to follow the reported hyperparameters. I am currently reproducing the S4 results on the LRA ListOps task. I implemented the S4 network myself, where each S4 block consists of:

S4 layer → activation → linear layer → activation

I stacked 6 such blocks, followed by average pooling and then a classification head. However, after training, the training accuracy reaches about 59%, while the test accuracy is only 51%, showing clear overfitting. Additionally, when I raise the dropout rate to 0.2–0.3, the loss tends to explode (becomes NaN). Using tanh as the activation function also seems to perform slightly better than other choices I tried.
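For reference, here is a minimal numpy sketch of the block layout I described (S4 layer → activation → linear → activation, 6 blocks, average pooling, classification head). The `seq_mix` function is only a stand-in for the S4 layer, and all sizes (`L`, `D`, `C`) are toy values, not the hyperparameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, C = 32, 16, 10          # toy seq length, model width, ListOps classes

def seq_mix(x, kernel):
    """Stand-in for the S4 layer: a per-channel causal convolution.
    The real S4 layer derives this kernel from its state-space parameters."""
    out = np.zeros_like(x)
    for t in range(len(x)):
        out[t] = (kernel[:t + 1][::-1] * x[:t + 1]).sum(axis=0)
    return out

def block(x, kernel, W):
    h = np.tanh(seq_mix(x, kernel))   # S4 layer -> activation
    return np.tanh(h @ W)             # linear layer -> activation

# Random parameters for 6 blocks plus a classification head.
params = [(0.1 * rng.standard_normal((L, D)),
           0.1 * rng.standard_normal((D, D))) for _ in range(6)]
W_head = 0.1 * rng.standard_normal((D, C))

x = rng.standard_normal((L, D))       # embedded input sequence
for kernel, W in params:
    x = block(x, kernel, W)
logits = x.mean(axis=0) @ W_head      # average pool over time, then classify
print(logits.shape)                   # (10,)
```

If this skeleton matches your intended architecture, please let me know which pieces (normalization, residual connections, dropout placement) differ in your official implementation.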

Could you please share more details about the experimental setup that are important for reproducing the reported results?

Thanks a lot!
