Hi, thanks for the great work on this project!
What's the difference between DETRIS-B and DETRIS-B (Default Setting) in Table 2 of the paper?
Is it due to the different number of adapter layers, or something else?
From the code, I see that both the Dense Aligner and the Text Adapter are applied at layers [1, 3, 5, 7, 9, 11].
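
To make the question concrete, here is a rough sketch of how I read the configuration. Only the two index lists come from the code; the names `dense_aligner_layers`, `text_adapter_layers`, and `layers_without_adapter` are my own placeholders, and the 12-layer depth with 0-based indexing is an assumption on my part.

```python
# Hypothetical names; only the index lists are taken from the repository code.
dense_aligner_layers = [1, 3, 5, 7, 9, 11]
text_adapter_layers = [1, 3, 5, 7, 9, 11]

NUM_LAYERS = 12  # assumption: a 12-layer backbone with 0-based layer indices


def layers_without_adapter(adapter_layers, num_layers=NUM_LAYERS):
    """Return the layer indices that do not receive an adapter."""
    return [i for i in range(num_layers) if i not in adapter_layers]


# Both modules appear to share the same placement.
same_placement = dense_aligner_layers == text_adapter_layers

# Under the assumptions above, these layers would have no adapter attached.
skipped = layers_without_adapter(dense_aligner_layers)
```

If the two rows in Table 2 use different values for these lists, that alone would answer my question.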