Hello, thanks for your interest!

Yeah, most of these NCBI genomes were constructed from short reads. Regions annotated as centromeres were excluded from the training data. We have not assessed the model on gapless assemblies, but I expect that Evo 2 should transfer well. Finetuning on additional data should also be a reasonable approach.

Note that our context extension uses a mixture of genomes and short, genic-focused data to maintain performance at both long and short context. This can be important, and details of the data composition weights are in our preprint methods and supplement.
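
For anyone planning a finetuning run, here is a minimal sketch of what sampling from a weighted data mixture can look like. It is not the Evo 2 training code: the source names, the 50/50 weights, and the `sample_sources` helper are all hypothetical placeholders, and the actual composition weights should be taken from the preprint methods and supplement.

```python
import random
from collections import Counter

# Hypothetical mixture weights for illustration only.
# The real composition weights are given in the preprint, not here.
MIXTURE_WEIGHTS = {
    "long_genome_windows": 0.5,   # long-context examples drawn from whole genomes
    "short_genic_windows": 0.5,   # short, genic-focused examples
}


def sample_sources(weights, n, seed=0):
    """Draw n data-source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


# Count how often each source is picked across 10,000 draws.
counts = Counter(sample_sources(MIXTURE_WEIGHTS, 10_000))
print(counts)
```

Each training example would then be read from whichever source was drawn, so both long- and short-context data keep appearing throughout the run.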

Answer selected by LauferVA

This comment has been minimized.

Comment options

You must be logged in to vote
0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
2 participants