【consult🙋♂️】Some questions about the model & data #131
I know you recommend a CNN backbone in some suggestions #112 (comment), but the ViT authors suggest that the original ViT is better than the hybrid except at small model sizes, so I am curious about this choice.
Have you experimented with any other position embedding strategies? If I haven't misunderstood your explanation in 【consult🙋♂️】some confusion about CustomVisionTransformer.forward_features #130 (comment), the position embedding here differs from the 1D position embedding that the ViT paper suggests.
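To make clear what I mean by "different from ViT's 1D embedding", here is a minimal sketch of the two schemes as I understand them. All names and shapes below are my own illustration, not the repo's actual code:

```python
import torch
import torch.nn as nn

dim = 256  # embedding width (assumed for illustration)

# Standard ViT: one learned 1D table, a single row per flattened patch
# position, plus one row for the CLS token.
num_patches = 196
pos_emb_1d = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

# My reading of CustomVisionTransformer (#130): keep a table sized for the
# *maximal* patch grid, then for each input select only the rows that its
# h x w patch grid actually covers, so position depends on row AND column.
max_h, max_w = 16, 32  # maximal patch-grid size (assumed)
pos_grid = nn.Parameter(torch.zeros(1, max_h * max_w, dim))

def select_pos_emb(h: int, w: int) -> torch.Tensor:
    """Pick the embeddings for an h x w patch grid out of the full table."""
    idx = torch.arange(max_h * max_w).reshape(max_h, max_w)[:h, :w].reshape(-1)
    return pos_grid[:, idx]  # shape (1, h*w, dim)
```

Under this scheme, two patches in the same column of different-sized images get consistent embeddings, which a flat 1D table would not guarantee. Please correct me if that is not what the code does.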
How much data did you use to train the released model? You mentioned Wikipedia, arXiv, and the im2latex-100k dataset, and you provide a link to pre-processed data. But the number of LaTeX lines below does not equal the number of pictures (counted with the quick script after the list), so I guess your training set may be larger than what was released?
- CROHME.zip: 10822 handwritten images (CROHME) but 10846 lines of LaTeX formulas.
- formula.zip: 234884 lines of LaTeX formulas — 158480 for train, 6765 for val, and 30637 for test.
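This is roughly how I compared the counts; the paths and file names are placeholders for wherever the archives are extracted, not the actual names in the release:

```python
from pathlib import Path

# Hypothetical paths -- adjust to where CROHME.zip was extracted.
img_dir = Path("CROHME/images")
label_file = Path("CROHME/labels.txt")

# One image per sample vs. one LaTeX formula per non-empty line.
n_images = sum(1 for p in img_dir.iterdir() if p.suffix in {".png", ".jpg"})
n_lines = sum(1 for line in label_file.open(encoding="utf-8") if line.strip())

print(n_images, n_lines)  # I get 10822 images vs 10846 lines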
I have found some other datasets, such as I2L-140K, Im2latex-90k, and the Marmot Dataset — have you used any of them?