Vision model #384

jlamypoirier · 2025-11-07T01:45:25Z

✨ Description

RaymondLi0 · 2025-11-19T23:46:05Z

I ran training with identical configs on jlp/vision_model and hybrid_dev, using a small training set. As far as I can tell the configs match, up to refactoring and assuming that default values have not changed.
The loss curves look close enough, although the one for hybrid_dev seems slightly lower.

Another question: should we enforce some parts of the vision-encoder config, such as causal: false, cross_document_attention: False, and rotary.type: default_2d?
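To make the question concrete, here is a minimal sketch of what enforcing those values could look like. It works on a plain nested dict and is not Fast-LLM's actual config API: the names `REQUIRED_VISION_ENCODER_VALUES`, `get_nested` and `find_violations` are hypothetical, and the assumption that these three keys sit at the top level of the vision-encoder config is mine.

```python
# Hypothetical sketch: plain-dict validation, not Fast-LLM's actual config API.
from typing import Any

# Values the comment above proposes to enforce for the vision encoder.
# Assumption: these keys sit at the top level of the encoder config.
REQUIRED_VISION_ENCODER_VALUES: dict[str, Any] = {
    "causal": False,
    "cross_document_attention": False,
    "rotary.type": "default_2d",
}


def get_nested(config: dict[str, Any], dotted_key: str) -> Any:
    """Look up a dotted key such as "rotary.type" in a nested dict."""
    value: Any = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def find_violations(config: dict[str, Any]) -> list[str]:
    """Return one message per required value that is missing or different."""
    violations = []
    for key, expected in REQUIRED_VISION_ENCODER_VALUES.items():
        try:
            actual = get_nested(config, key)
        except (KeyError, TypeError):
            # Key absent, or an intermediate value is not a mapping.
            violations.append(f"{key}: missing, expected {expected!r}")
            continue
        if actual != expected:
            violations.append(f"{key}: got {actual!r}, expected {expected!r}")
    return violations
```

Whether a violation should raise an error or only log a warning is the open design choice. Raising prevents a silently misconfigured encoder, while warning leaves room for experiments with non-default settings.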

jlamypoirier added 2 commits November 6, 2025 16:29

Vision model

f5398b3

misc

10938de

jlamypoirier changed the base branch from main to jlp/vision_dataset November 7, 2025 01:45

jlamypoirier added 16 commits November 6, 2025 20:48

cleanup

e4f3f02

Merge branch 'jlp/vision_dataset' into jlp/vision_model

74dcd98

stuff

9d2fe10

Merge branch 'jlp/vision_dataset' into jlp/vision_model

56bddd5

rotary

358abe1

fix

5aed2a7

Merge branch 'jlp/vision_dataset' into jlp/vision_model

2858b8b

fixes

57d8c1f

Fix and test backup attention

f9da7b3

fixes

81c29d0

fix

a61121a

fix

c2786a2

fix

b0fbaf5

fix

571e527

fix

e11ee91

fix

67d4a7c

jlamypoirier added 2 commits November 20, 2025 09:39

fix

13701b8

Merge branch 'jlp/vision_dataset' into jlp/vision_model

b007b31
