Case Sensitivity of Models? #166
Replies: 1 comment 2 replies
Hi @eyes-robson, we ran all evals in the paper with fully uppercased input to Evo 2. Details on the treatment of casing are in the Methods section of the paper.
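To match that eval setup, one option is to uppercase sequences before they reach the model. Here is a minimal sketch in plain Python; `uppercase_sequence` and `has_lowercase` are hypothetical helper names for illustration and are not part of the Evo 2 codebase:

```python
def uppercase_sequence(seq: str) -> str:
    """Return the sequence with every character uppercased."""
    return seq.upper()


def has_lowercase(seq: str) -> bool:
    """Report whether any character in the sequence is lowercase."""
    return any(ch.islower() for ch in seq)


# Example input mixing upper- and lowercase bases (e.g. soft-masked regions).
raw = "ACGTacgtNNnnACGT"
prepared = uppercase_sequence(raw)

# Quick check before handing `prepared` to whatever scoring/generation call you use.
if has_lowercase(prepared):
    raise ValueError("sequence still contains lowercase characters")
```

Normalizing case up front keeps your inputs consistent with how the paper's evals were run, whichever checkpoint you use.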
Yes, Evo2_7B saw all-uppercased sequences during the midtraining phase (the long-context extension, which is the final 0.3T tokens), while the 7B base and 1B base models did not, since they are below 3T tokens. Note that all models have weight tying between the embedding and the final projection layer.
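For anyone unfamiliar with weight tying, here is a toy sketch of the idea in plain Python: one matrix serves both as the input embedding table and as the output projection. The vocabulary, dimensions, and values below are made up for illustration and do not reflect Evo 2's actual tokenizer or weights:

```python
# Toy vocabulary and a single shared weight matrix (one row per token).
# All values here are illustrative assumptions, not Evo 2 parameters.
vocab = ["A", "C", "G", "T"]
shared_weights = [
    [1.0, 0.0],   # A
    [0.0, 1.0],   # C
    [-1.0, 0.0],  # G
    [0.0, -1.0],  # T
]


def embed(token_id: int) -> list[float]:
    """Input side: look up the token's row in the shared matrix."""
    return shared_weights[token_id]


def project(hidden: list[float]) -> list[float]:
    """Output side: dot the hidden state with every row of the same matrix."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in shared_weights]


hidden = embed(vocab.index("A"))
logits = project(hidden)
best_token = vocab[logits.index(max(logits))]
```

Because both sides read the same rows, any update to a token's row changes both how that token is embedded and how it is scored at the output.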