Generalized ia3 #81

Open
IanMagnusson wants to merge 18 commits into main from generalized-ia3

IanMagnusson commented Sep 14, 2022

What's Here

Moves a more generalized IA3 adaptor implementation to Tango (PR pending) and adds an example script showing how to use it in Catwalk.

Results on piqa

The results are hardly impressive, but the IA3 implementation reduces validation loss and recovers much of the accuracy of the fully tuned equivalent on every architecture for which a default configuration is provided. The gpt-j-6b full tune cannot run on a single GPU, while IA3 training fits because its far smaller set of trainable parameters requires far fewer optimizer states.
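To make the memory argument concrete, here is a rough back-of-the-envelope sketch rather than a measurement from this PR. It assumes fp32 everywhere, an Adam-style optimizer with two states per trainable parameter, a round 6B total parameter count, and GPT-J-like dimensions (28 layers, hidden size 4096, feed-forward size 16384). It also assumes IA3 adds one scaling vector each for keys, values, and the feed-forward activation in every layer; the generalized implementation in this PR may target different modules. Activations are ignored, and `TrainingFootprint` and `ia3_vector_params` are hypothetical helper names.

```python
from dataclasses import dataclass

BYTES_FP32 = 4  # assumption: everything is stored in fp32


@dataclass
class TrainingFootprint:
    """Rough memory estimate that ignores activations and buffers."""

    trainable_params: int
    frozen_params: int

    def bytes_needed(self) -> int:
        # Every parameter needs its weight in memory.
        weights = (self.trainable_params + self.frozen_params) * BYTES_FP32
        # Only trainable parameters need gradients.
        grads = self.trainable_params * BYTES_FP32
        # Adam-style optimizers keep two moment estimates per trainable parameter.
        optimizer_states = 2 * self.trainable_params * BYTES_FP32
        return weights + grads + optimizer_states


def ia3_vector_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Assumed IA3 layout: key, value, and feed-forward scaling vectors per layer."""
    return n_layers * (2 * d_model + d_ff)


# Assumed, rounded model dimensions (not taken from this PR).
TOTAL_PARAMS = 6_000_000_000
ia3_params = ia3_vector_params(n_layers=28, d_model=4096, d_ff=16384)

full_tune = TrainingFootprint(trainable_params=TOTAL_PARAMS, frozen_params=0)
ia3_tune = TrainingFootprint(trainable_params=ia3_params, frozen_params=TOTAL_PARAMS)

print(f"IA3 trainable parameters: {ia3_params:,}")
print(f"Full tune: {full_tune.bytes_needed() / 1e9:.2f} GB")
print(f"IA3 tune:  {ia3_tune.bytes_needed() / 1e9:.2f} GB")
```

Under these assumptions the full tune needs roughly four times the memory of the frozen weights alone, while the IA3 run adds only a few megabytes on top of them.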

[Image: Screen Shot 2022-09-13 at 6 57 54 PM]

IanMagnusson marked this pull request as ready for review September 16, 2022 22:28
IanMagnusson requested a review from dirkgr September 16, 2022 22:28

2 participants