The autoencoder performs very poorly #30

@nononeundertsand

Hi Authors,

Thank you very much for your outstanding work. I recently came across your research and was immediately intrigued. I tried to train a parameter autoencoder on my own parameter dataset, using the OneDimCNN from model/pdiff. I built a small CNN with 6,656 parameters and, by training it from different random seeds, collected 200 sets of trained parameters, each reaching approximately 89% accuracy on the test set. While training the autoencoder, however, I ran into some problems.
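For context, a parameter dataset like this is usually a matrix with one row per trained model and one column per parameter. The sketch below assumes NumPy arrays and uses hypothetical layer names and shapes (not the 6,656-parameter CNN described above). The point it illustrates is that every checkpoint must be flattened in the same fixed key order, so that column j always refers to the same weight.

```python
import numpy as np

# Hypothetical layer shapes for a small CNN (NOT the architecture from this issue).
SHAPES = {
    "conv1.weight": (8, 1, 3, 3),
    "conv1.bias": (8,),
    "fc.weight": (10, 8),
    "fc.bias": (10,),
}


def flatten_params(params: dict) -> np.ndarray:
    """Concatenate all parameter arrays into one 1-D vector, in SHAPES key order."""
    return np.concatenate([params[name].ravel() for name in SHAPES])


def build_dataset(checkpoints: list) -> np.ndarray:
    """Stack one flattened vector per trained model -> shape (num_models, num_params)."""
    return np.stack([flatten_params(p) for p in checkpoints])


# Synthetic stand-in for 200 checkpoints trained from different seeds.
rng = np.random.default_rng(0)
checkpoints = [
    {name: rng.normal(0.0, 1e-2, size=shape) for name, shape in SHAPES.items()}
    for _ in range(200)
]
data = build_dataset(checkpoints)
print(data.shape)  # (200, 170)
```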

Specifically, the autoencoder did not converge well. I normalized the raw parameters and added noise to them, yet the MSE decreased very slowly during training. After trying various learning rates and numbers of training steps, I settled on a learning rate of 2e-3. Even after 25,000 steps, the MSE plateaued at around 0.019, which does not seem good enough to me. For reference, the raw model weights are on the order of 1e-2 to 1e-3.
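Whether an MSE of 0.019 is good depends on the scale it is measured at. If it were measured on raw weights of magnitude ~1e-2 (squared ~1e-4), it would be roughly 190 times the squared weight magnitude. If it is measured after normalizing each parameter to unit variance, it corresponds to about 2% of the variance. One way to make the number interpretable is to report MSE relative to the per-parameter variance across models. The sketch below is a minimal illustration, assuming per-parameter (column-wise) normalization and Gaussian input noise; the function names and the noise scale are hypothetical, not taken from the p-diff code.

```python
import numpy as np


def fit_normalizer(data: np.ndarray, eps: float = 1e-8):
    """Per-parameter mean and std, computed across models (axis 0)."""
    return data.mean(axis=0), data.std(axis=0) + eps


def normalize(data, mean, std):
    return (data - mean) / std


def denormalize(z, mean, std):
    return z * std + mean


def add_noise(z, scale, rng):
    """Gaussian noise augmentation on normalized inputs; `scale` is a hypothetical choice."""
    return z + rng.normal(0.0, scale, size=z.shape)


def relative_mse(pred, target):
    """MSE divided by the average per-parameter variance across models.

    A value of 1.0 means the reconstruction is no better than always
    predicting the per-parameter mean.
    """
    return float(np.mean((pred - target) ** 2) / np.mean(target.var(axis=0)))


rng = np.random.default_rng(0)
raw = rng.normal(0.0, 1e-2, size=(200, 170))  # synthetic stand-in for 200 flattened models
mean, std = fit_normalizer(raw)
z = normalize(raw, mean, std)
z_noisy = add_noise(z, scale=1e-3, rng=rng)   # what the autoencoder would see as input
restored = denormalize(z, mean, std)          # matches `raw` up to float error

# Baseline: predicting the per-parameter mean for every model scores 1.0.
baseline = relative_mse(np.broadcast_to(mean, raw.shape), raw)
print(round(baseline, 6))
```

An aggregate MSE can look small while still corrupting the weights that matter most, so a relative metric like this is best paired with a direct check of the reconstructed model's accuracy.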

Furthermore, I used the trained VAE to encode and decode a randomly chosen set of CNN parameters from the saved dataset. When I evaluated the reconstructed parameters on the test set, accuracy dropped to approximately 10%, compared with roughly 89% for the original parameters. The test loss of the reconstructed parameters was also much higher than that of the originals.
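On a 10-class task, about 10% accuracy is chance level, so it can help to first rule out pipeline bugs that are independent of the autoencoder's quality. Possible culprits include a different flattening order at evaluation time, a skipped denormalization step, or (if the model has BatchNorm layers) running statistics that are stored as buffers rather than parameters. The sketch below uses hypothetical layer shapes and a synthetic dataset. It checks that "normalize, then denormalize and unflatten" reproduces the original weights exactly before any autoencoder is involved.

```python
import numpy as np

# Hypothetical layer shapes; must match the order used when the dataset was flattened.
SHAPES = {
    "conv1.weight": (8, 1, 3, 3),
    "conv1.bias": (8,),
    "fc.weight": (10, 8),
    "fc.bias": (10,),
}


def flatten_params(params: dict) -> np.ndarray:
    return np.concatenate([params[name].ravel() for name in SHAPES])


def unflatten_params(vec: np.ndarray) -> dict:
    """Split a 1-D vector back into named arrays, in the same order as flatten_params."""
    expected = sum(int(np.prod(shape)) for shape in SHAPES.values())
    if vec.size != expected:
        raise ValueError(f"expected {expected} values, got {vec.size}")
    params, offset = {}, 0
    for name, shape in SHAPES.items():
        n = int(np.prod(shape))
        params[name] = vec[offset:offset + n].reshape(shape)
        offset += n
    return params


def decode_to_params(z_decoded, mean, std):
    """Undo per-parameter normalization before reshaping; skipping this step
    leaves weights at unit scale instead of their original ~1e-2 scale."""
    return unflatten_params(z_decoded * std + mean)


# Round-trip check without the autoencoder: should reproduce the original weights.
rng = np.random.default_rng(0)
dataset = rng.normal(0.0, 1e-2, size=(200, 170))  # synthetic stand-in
mean, std = dataset.mean(axis=0), dataset.std(axis=0) + 1e-8
original = unflatten_params(dataset[0])
recovered = decode_to_params((dataset[0] - mean) / std, mean, std)
max_err = max(np.abs(original[k] - recovered[k]).max() for k in SHAPES)
print(max_err)
```

In PyTorch, the recovered arrays could then be converted with `torch.from_numpy` and loaded through `model.load_state_dict(...)`. The key names here are placeholders for whatever `state_dict()` actually contains. If this round trip already loses accuracy, the problem lies in the pipeline rather than the autoencoder.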

I also tried different numbers of training steps and learning rates, and modified the convolution kernel sizes, channel counts, and model_dim of the autoencoder. Despite testing many combinations, I could not get satisfactory results: the reconstructed parameters consistently gave an evaluation accuracy of around 10%.

Have you encountered similar behavior in your experiments, and if so, could you recommend any solutions? I am also curious whether generating only some of the model's parameters, such as those of the fully connected layers, might be easier than generating all of them.
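Mechanically, generating only a subset of the parameters means training the autoencoder on selected columns of the flattened dataset. At evaluation time, the generated values are spliced back into a fixed base model. The sketch below is illustrative only: the layer names, shapes, and the choice of the final fully connected layer are hypothetical. It assumes that the subset checkpoints were all collected against the same frozen remaining layers, for example by continuing to train only the selected layers of one base model. Fully connected weights taken from independently seeded models are generally not interchangeable, because hidden units can be permuted between runs.

```python
import numpy as np

# Hypothetical layer shapes, in the same order used to flatten each checkpoint.
SHAPES = {
    "conv1.weight": (8, 1, 3, 3),
    "conv1.bias": (8,),
    "fc.weight": (10, 8),
    "fc.bias": (10,),
}
# Hypothetical choice: generate only the final fully connected layer.
SUBSET = ("fc.weight", "fc.bias")


def subset_index(shapes: dict, subset: tuple) -> np.ndarray:
    """Positions in the flattened vector that belong to the selected layers."""
    idx, offset = [], 0
    for name, shape in shapes.items():
        n = int(np.prod(shape))
        if name in subset:
            idx.append(np.arange(offset, offset + n))
        offset += n
    return np.concatenate(idx)


def splice(base_vec: np.ndarray, generated: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Write generated values into a copy of a fixed base model's flattened parameters."""
    out = base_vec.copy()
    out[idx] = generated
    return out


rng = np.random.default_rng(0)
dataset = rng.normal(0.0, 1e-2, size=(200, 170))  # stand-in for flattened checkpoints
idx = subset_index(SHAPES, SUBSET)
train_data = dataset[:, idx]  # what the autoencoder would be trained on
print(train_data.shape)       # (200, 90)

# Placeholder for autoencoder output: the base model's subset plus small noise.
generated = dataset[0][idx] + rng.normal(0.0, 1e-4, size=idx.size)
candidate = splice(dataset[0], generated, idx)
```

This shrinks the autoencoder's input from 170 to 90 values in this toy setup. The other layers stay exactly as in the base model.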

I sincerely appreciate any advice or insights you could share. Thank you for your time and consideration.

Best regards