Hi @victolee0

Thanks for the comment. Yes, in theory a hidden size of 4096 is desirable, but I kept it at 768 to keep the computational requirements manageable. In addition, my experiments with larger hidden sizes (e.g. 1536) did not show a significant improvement, so 768 seemed the best choice.
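
To give a rough feel for the compute trade-off, here is a minimal sketch that counts the parameters of a single square dense layer at each hidden size. It assumes the hidden size sets both the input and output width of that layer; the thread does not describe the actual architecture, and `dense_params` is a hypothetical helper used only for illustration, so treat the numbers as a scaling argument and not as measurements.

```python
def dense_params(hidden_size: int) -> int:
    """Weights plus biases of one hidden_size x hidden_size dense layer.

    Assumption: a plain square dense layer; the real model may differ.
    """
    return hidden_size * hidden_size + hidden_size


baseline = dense_params(768)

# Compare the hidden sizes mentioned in the discussion against 768.
for h in (768, 1536, 4096):
    params = dense_params(h)
    print(f"hidden={h:5d}  params={params:>12,d}  ratio={params / baseline:.1f}x")
```

Because the weight matrix grows with the square of the hidden size, doubling the width roughly quadruples the per-layer parameter count under this assumption, which is consistent with the cost concern described above.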

Best

Answer selected by victolee0