160 try nlp optima from 2025 03 30 study with adamw #161
Merged
david-thrower merged 38 commits into main from 160-try-nlp-optima-from-2025-03-30-study-with-adamw on Apr 12, 2025
Conversation
Temporarily disable time-consuming workflows. Comment out the BERT-based text classification workflow, possibly permanently, as it is obsolete.
Add branch to workflow.
Added a baseline fine-tuning of the full GPT-2 to compare against the Cerebros text classifier.
Forgot to add dropout.
Amendments to Cerebros model.
Reduce seq length to accelerate job completion.
Up timeout to 300 min.
Correct history indexing error.
Temporary test to fast forward to cerebros model.
Comment out an artifact of the GPT test so this can lint and run.
Fix errors from trying to work too fast ...
Re-corrected the BinaryAccuracy metric to fix an AI-introduced error.
Correct metric to rank by (binary accuracy) ...
Uncomment out GPT test ...
Upped number of trials to 5.
Make seq len 750, fix typo.
Try 1024 seq len.
Added branch to the workflow...
Added a positional embedding and a LayerNorm to the text embedding.
Missed position embedding in copy and paste ...
Synchronize embedding dim across embeddings.
Corrected import of PositionEmbedding.
Remove layernorm, concat instead of add.
Try addition to merge embeddings without LayerNorm (a sketch of this embedding arrangement follows the commit list).
Restore the optimal run with the position embedding. Reduce max levels to fit the optimal run and reduce overhead. Test this to see if it works; if successful, add back the commented-out comparison and open the PR. Then open an issue to optimize the params around this new model. We may need to run this on Katib to optimize the hyperparameters, as the model is fundamentally different from the original and can probably be improved considerably.
Hard set levels to the known optimum.
Corrected hard set on levels to correct optima.
Restore the best model yet.
Add back the CICD test for image CLS. Prepare for PR.
Comment out workflows that we don't need in dev. Permanently delete disused workflows.
Made AdamW the default optimizer. We need to parameterize this and add an optional hyperparameter for the weight_decay (a compile-time sketch follows the commit list).
Test with default params with AdamW.
Combined best hyperparams from the hyperparameter optimization study with AdamW optimizer.
Add branch to workflow to make it start.
Add back all of the workflows that will be used.
Added back the GPT baseline model for comparison.
Optimize the NLP workflow for time's sake.
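For reference, a minimal sketch of the embedding arrangement the commits above converge on: a learned position embedding with the same output dimensionality as the token embedding, merged by addition without a LayerNorm. This is an illustration assuming Keras and KerasNLP; the sequence length, vocabulary size, and embedding dimension below are placeholders, not the repository's actual values.

```python
import keras_nlp
from tensorflow import keras

SEQ_LEN = 1024      # per the "Try 1024 seq len" commit; otherwise a placeholder
VOCAB_SIZE = 50257  # assumed: the GPT-2 tokenizer vocabulary
EMBED_DIM = 32      # placeholder; the study searched over embedding dims

token_ids = keras.Input(shape=(SEQ_LEN,), dtype="int32")

# Token embedding for the input IDs.
token_emb = keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(token_ids)

# Learned position embedding; its output dimensionality matches the
# token embedding's, so the two stay synchronized and can be merged.
pos_emb = keras_nlp.layers.PositionEmbedding(sequence_length=SEQ_LEN)(token_emb)

# Merge by addition, with no LayerNorm, per the commits above.
text_embedding = keras.layers.Add()([token_emb, pos_emb])
```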
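Likewise, a hedged sketch of what making AdamW the default optimizer looks like at compile time, assuming tf.keras. The learning rate and weight_decay values are illustrative, not the study's optima, and the model here is a toy stand-in for the Cerebros text classifier.

```python
from tensorflow import keras

# Toy stand-in model; the real model is the Cerebros text classifier.
model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# AdamW applies decoupled weight decay. Both values below are
# assumptions; weight_decay is the knob we still need to parameterize.
model.compile(
    optimizer=keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=0.004),
    loss="binary_crossentropy",
    metrics=[keras.metrics.BinaryAccuracy()],  # the metric the trials rank by
)
```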
Added optima from the hyperparameter optimization study on Mar 30 and replaced the default optimizer with AdamW.
Key changes (Phishing Detection NLP proof of concept):
Key changes (Global):
Results (Phishing Detection NLP proof of concept):
Conclusions:
Next Steps:
Run a second hyperparameter optimization study using multivariate TPE, which may find a better optimum (a sketch follows this list).
Optimize the weight decay for AdamW.
Explore a larger embedding output dimensionality search space in the follow-up hyperparameter optimization study. We may be able to afford to go up to 50 or 100+. We are at 30%-40% memory pressure and completing epochs in under 2 minutes, so this can probably be expanded considerably before we hit the trade-off between time, memory, and CPU requirements and the accuracy contribution of higher-dimension embeddings.
Add the AdamW weight_decay or the optimizer itself to the Cerebros init args (a hypothetical sketch follows this list).
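A sketch of what the follow-up study could look like, assuming Optuna (whose TPESampler supports a multivariate mode). The objective body, the search ranges, and the train_and_evaluate helper are all hypothetical placeholders, not the Mar 30 study's actual setup.

```python
import optuna

def objective(trial):
    # Hypothetical search space covering the two knobs named above.
    weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-1, log=True)
    embedding_dim = trial.suggest_int("embedding_dim", 10, 100)
    # train_and_evaluate is a placeholder for a function that builds the
    # model with these params and returns validation binary accuracy.
    return train_and_evaluate(weight_decay=weight_decay,
                              embedding_dim=embedding_dim)

# multivariate=True lets TPE model the joint distribution of the
# parameters rather than treating each one independently.
sampler = optuna.samplers.TPESampler(multivariate=True, seed=42)
study = optuna.create_study(direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=50)
```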
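And one hypothetical shape the init-arg change could take; the class name and signature below are stand-ins for illustration, not the actual Cerebros API.

```python
from tensorflow import keras

class TextClassifierSearch:
    """Hypothetical stand-in for the Cerebros search class."""

    def __init__(self, optimizer=None, weight_decay=0.004, **kwargs):
        # Default to AdamW, but let callers pass any Keras optimizer,
        # or override just the weight_decay, at construction time.
        self.optimizer = optimizer or keras.optimizers.AdamW(
            weight_decay=weight_decay
        )
```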