@jlamypoirier
Collaborator

✨ Description

@jlamypoirier jlamypoirier changed the base branch from main to jlp/vision_dataset November 7, 2025 01:45
@RaymondLi0
Contributor

I compared training runs with identical configs (up to refactoring, and assuming that default values have not changed, as far as I can tell) between jlp/vision_model and hybrid_dev, on a small training set.
The loss curves look close enough, although the one from hybrid_dev seems slightly lower:

Screenshot 2025-11-19 at 6 37 36 PM

Another question: should we enforce some parts of the vision-encoder config, such as causal: false, cross_document_attention: False, and rotary.type: default_2d?
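
To make the question concrete, here is a minimal sketch of what such enforcement could look like. It is not the project's actual config API: the function names, the plain-dict config, and the choice to skip unset keys are all assumptions made for illustration. Only the three key/value pairs come from the comment above.

```python
# Hypothetical sketch: names and structure are illustrative only,
# not taken from the project's real config classes.

# Values the vision encoder would be required to use (from the comment above).
REQUIRED_VISION_ENCODER_VALUES = {
    "causal": False,
    "cross_document_attention": False,
    "rotary.type": "default_2d",
}

_MISSING = object()  # sentinel for keys that are not set at all


def get_nested(config: dict, dotted_key: str):
    """Look up a dotted key such as 'rotary.type' in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def find_violations(config: dict) -> list[str]:
    """Return one message per explicitly set value that differs from the required one."""
    violations = []
    for key, expected in REQUIRED_VISION_ENCODER_VALUES.items():
        actual = get_nested(config, key)
        if actual is _MISSING:
            # Assumption: an unset key falls back to a suitable default.
            continue
        if actual != expected:
            violations.append(f"{key}: expected {expected!r}, got {actual!r}")
    return violations


def validate_vision_encoder(config: dict) -> None:
    """Raise ValueError if the config conflicts with the required values."""
    violations = find_violations(config)
    if violations:
        raise ValueError("Invalid vision-encoder config: " + "; ".join(violations))
```

An alternative to raising would be to silently override these fields, but an explicit error makes a misconfigured run fail early instead of training with unexpected attention or rotary settings.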

3 participants