Hi Authors,
Thank you very much for your outstanding work. I recently came across your research and was immediately intrigued. I attempted to train a parameter autoencoder on my own set of parameters, using the OneDimCNN from model/pdiff. I built a small CNN with 6,656 parameters and, by initializing it with different random seeds, collected 200 sets of trained parameters, each reaching approximately 89% accuracy on the test set. However, I ran into some issues while training the autoencoder.
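A minimal sketch of how such a parameter dataset can be assembled: each trained model's weights are flattened into one vector, and the vectors are stacked into a `(num_models, num_params)` array. The helper names `flatten_params`/`unflatten_params` and the layer shapes are hypothetical (they are not the 6,656-parameter model or the repository's own code), and random normals stand in for trained weights.

```python
import numpy as np


def flatten_params(state):
    """Concatenate named parameter arrays into one 1-D vector.

    Returns the vector and the (name, shape) layout needed to undo the flattening.
    """
    layout = [(name, arr.shape) for name, arr in state.items()]
    vec = np.concatenate([arr.ravel() for arr in state.values()])
    return vec, layout


def unflatten_params(vec, layout):
    """Split a 1-D vector back into named arrays using a stored layout."""
    state, offset = {}, 0
    for name, shape in layout:
        size = int(np.prod(shape))
        state[name] = vec[offset:offset + size].reshape(shape)
        offset += size
    if offset != vec.size:
        raise ValueError(f"vector has {vec.size} entries, layout expects {offset}")
    return state


# Hypothetical layer shapes for a tiny CNN (NOT the 6,656-parameter model above).
SHAPES = {"conv.weight": (4, 1, 3, 3), "conv.bias": (4,), "fc.weight": (10, 16), "fc.bias": (10,)}

# One flattened vector per seed, stacked into a (num_models, num_params) dataset.
# Random normals stand in for actually trained weights.
vectors = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    state = {name: rng.normal(0.0, 0.05, shape) for name, shape in SHAPES.items()}
    vec, layout = flatten_params(state)
    vectors.append(vec)
dataset = np.stack(vectors)
```

In PyTorch, `torch.nn.utils.parameters_to_vector` and `torch.nn.utils.vector_to_parameters` perform the same flatten/unflatten round trip over a module's parameters; the important point either way is that the layer order used for flattening is the one used when loading the vector back.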
Specifically, the autoencoder failed to converge. I normalized the raw data and added noise to it, yet the MSE decreased very slowly during training. After experimenting with various learning rates and numbers of training steps, I settled on a learning rate of 2e-3. Even after 25,000 steps, the MSE plateaued at around 0.019, which does not seem ideal to me. For reference, the raw model weights have magnitudes on the order of 1e-2 to 1e-3.
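Whether an MSE of 0.019 is good depends on the scale it is measured in. If it is measured on raw weights, its square root (about 0.14) is more than ten times the typical 1e-2 weight magnitude. If it is measured after normalization, it should instead be compared with the variance of the normalized targets. The sketch below assumes per-coordinate standardization across checkpoints plus Gaussian input noise (the repository may normalize differently), and defines a hypothetical `relative_mse` diagnostic where 1.0 means "no better than predicting the mean checkpoint".

```python
import numpy as np


def fit_normalizer(data, eps=1e-8):
    """Per-coordinate mean and std across the stack of checkpoints (axis 0)."""
    return data.mean(axis=0), data.std(axis=0) + eps


def normalize(data, mean, std):
    return (data - mean) / std


def denormalize(z, mean, std):
    return z * std + mean


def add_input_noise(z, scale, rng):
    """Gaussian noise augmentation applied to the encoder input only."""
    return z + rng.normal(0.0, scale, size=z.shape)


def relative_mse(recon, target):
    """MSE divided by the mean per-coordinate variance of the targets.

    1.0 means no better than always predicting the dataset mean; 0.0 is perfect.
    """
    mse = np.mean((recon - target) ** 2)
    return mse / np.mean(target.var(axis=0))


rng = np.random.default_rng(0)
# Stand-in data: 200 checkpoints x 500 parameters with weight-sized values.
data = rng.normal(0.0, 0.02, size=(200, 500))
mean, std = fit_normalizer(data)
z = normalize(data, mean, std)
noisy = add_input_noise(z, 0.001, rng)

# Baseline: a "reconstruction" that always outputs the mean checkpoint scores ~1.0.
baseline = relative_mse(np.broadcast_to(data.mean(axis=0), data.shape), data)
```

Comparing the trained autoencoder's `relative_mse` against this baseline, in the same units as training, makes it clearer whether the model is learning per-checkpoint structure or only the average.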
Furthermore, I took the trained VAE, randomly selected one set of CNN parameters from the saved dataset, passed it through the encoder and decoder, and evaluated the reconstructed parameters on the test set. The accuracy plummeted to approximately 10%, a stark contrast to the original ~89%, and the test loss of the reconstructed model became very large.
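One way to localize such a collapse is to compare the original and reconstructed vectors layer by layer, after making sure the decoder output has been mapped back to raw scale (the inverse of whatever normalization was applied). A relative error near or above 1 on any single layer will typically be enough to destroy accuracy. The function names and layout below are hypothetical, and the "reconstruction" is simulated noise rather than real decoder output.

```python
import numpy as np


def split_by_layer(vec, layout):
    """Split a flat parameter vector into named arrays given (name, shape) pairs."""
    out, offset = {}, 0
    for name, shape in layout:
        size = int(np.prod(shape))
        out[name] = vec[offset:offset + size].reshape(shape)
        offset += size
    return out


def per_layer_relative_error(orig_vec, recon_vec, layout):
    """Relative L2 error ||recon - orig|| / ||orig|| for each layer."""
    orig = split_by_layer(orig_vec, layout)
    recon = split_by_layer(recon_vec, layout)
    return {
        name: float(np.linalg.norm(recon[name] - orig[name]) / (np.linalg.norm(orig[name]) + 1e-12))
        for name in orig
    }


# Hypothetical layout and stand-in weights.
LAYOUT = [("conv.weight", (4, 1, 3, 3)), ("fc.weight", (10, 16)), ("fc.bias", (10,))]
rng = np.random.default_rng(0)
orig_vec = rng.normal(0.0, 0.05, 36 + 160 + 10)

# Simulated decoder output, already mapped back to raw scale (denormalized),
# with an error as large as the weights themselves.
recon_vec = orig_vec + rng.normal(0.0, 0.05, orig_vec.shape)
errors = per_layer_relative_error(orig_vec, recon_vec, LAYOUT)
```

If every layer shows a small relative error but accuracy still collapses, the mismatch is more likely in the denormalization or in the order used to load the vector back into the model than in the autoencoder itself.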
I experimented with different numbers of training steps and learning rates, and also varied the convolution kernel sizes, channel counts, and model_dim of the autoencoder. Across all the combinations I tested, the reconstructed parameters consistently yielded an evaluation accuracy of around 10%.
I would like to ask whether you encountered similar behavior in your experiments and, if so, whether you could recommend any solutions. I am also curious whether generating only part of the model's parameters, such as those of the fully connected layers, might be easier than generating all of them.
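Partial generation can be set up with a boolean mask over the flat vector: the autoencoder is trained only on the selected layers, and its output is written back into a frozen base vector before evaluation. This sketch assumes the checkpoints share one frozen backbone; if the 200 models were trained end to end from different seeds, their fully connected layers are tied to different convolutional features, so combining a generated head with an arbitrary backbone may not work. The helper names and layer layout are hypothetical.

```python
import numpy as np


def layer_mask(layout, selected):
    """Boolean mask over the flat vector that is True for the selected layers."""
    sizes = [int(np.prod(shape)) for _, shape in layout]
    mask = np.zeros(sum(sizes), dtype=bool)
    offset = 0
    for (name, _), size in zip(layout, sizes):
        if name in selected:
            mask[offset:offset + size] = True
        offset += size
    return mask


def merge_generated(base_vec, generated_sub, mask):
    """Write a generated sub-vector into a copy of a frozen base vector."""
    out = base_vec.copy()
    out[mask] = generated_sub
    return out


# Hypothetical layout: generate only the fully connected layer, freeze the conv layer.
LAYOUT = [("conv.weight", (4, 1, 3, 3)), ("conv.bias", (4,)), ("fc.weight", (10, 16)), ("fc.bias", (10,))]
mask = layer_mask(LAYOUT, {"fc.weight", "fc.bias"})

rng = np.random.default_rng(0)
backbone = rng.normal(0.0, 0.05, size=mask.size)  # shared, frozen weights
checkpoints = np.tile(backbone, (200, 1))
# Only the selected layers differ between checkpoints (stand-in for trained heads).
checkpoints[:, mask] = rng.normal(0.0, 0.05, size=(200, int(mask.sum())))

train_targets = checkpoints[:, mask]  # (200, 170): what the autoencoder would learn
full = merge_generated(backbone, train_targets[1], mask)
```

The target dimension drops from 210 to 170 here; for a real CNN the reduction can be much larger, which is the main reason a subset may be easier to model.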
I sincerely appreciate any advice or insights you could share. Thank you for your time and consideration.
Best regards