Yes, Evo2_7B saw all-uppercased input during the midtraining phase (the long-context extension, which is the final 0.3T), while the 7B base and 1B base models did not, since they are below 3T. Note that all models have weight tying between the embedding and the final projection layer.
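In case it helps to picture the weight-tying remark, here is a minimal pure-Python sketch. It is not the Evo 2 implementation: the `TiedEmbedding` class, the toy vocabulary, and the dimension are all hypothetical and only illustrate the general idea that a single table serves as both the input embedding and the output projection.

```python
import random


class TiedEmbedding:
    """Toy illustration of weight tying (hypothetical, not Evo 2 code)."""

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {tok: i for i, tok in enumerate(self.vocab)}
        # One shared table of shape (vocab_size, dim).
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                       for _ in self.vocab]

    def embed(self, token):
        # Input side: look up the row for this token.
        return self.weight[self.index[token]]

    def project(self, hidden):
        # Output side: logits are dot products against the same rows.
        return [sum(w * h for w, h in zip(row, hidden))
                for row in self.weight]


# Assumed toy vocabulary containing both cases.
model = TiedEmbedding("ACGTacgt", dim=4)

seq = "acGTta".upper()          # all-uppercased input, as in the answer above
hidden = model.embed(seq[0])    # embedding row for "A"
logits = model.project(hidden)  # one logit per vocabulary entry
```

Because `embed` and `project` read the same `weight` table, any change to a token's row affects both how that token is read in and how it is scored on the way out.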

Answer selected by eyes-robson