1 file changed, 3 insertions(+), 3 deletions(-)
@@ -64,18 +64,18 @@ To use VILA-HD models, please refer to [VILA-HD repo](https://github.com/NVlabs/
 
 ## Performance
 
-### Performance of PS3 models
+### Comparing to other high-res encoding approaches such as AnyRes and S<sup>2</sup>
 
 See Table 1 in the paper for full results.
 
 | Vision Model | Pre-Trained Weights | Max Resolution | # High-Res Token | TextVQA | ChartQA | DocVQA | InfoVQA | OCRBench | V* Bench | RealWorldQA | Avg |
 | ---------------------| -------------------------------------------------------------------------| ----------------| ------------------| ---------| ---------| --------| ---------| ----------| ---------| -------------| ------|
 | SigLIP | | 378 | 0 | 62.3 | 56.6 | 51.9 | 30.7 | 387 | 51.8 | 57.1 | 49.9 |
 | SigLIP + AnyRes | | 1512 | 3136 | 67.4 | 58.4 | 67.9 | 34.1 | 468 | 60.2 | 59.0 | 56.3 |
-| SigLIP + S2 | | 1512 | 2916 | 66.1 | 71.0 | 78.3 | 41.1 | 526 | 55.2 | 61.0 | 60.8 |
+| SigLIP + S<sup>2</sup> | | 1512 | 2916 | 66.1 | 71.0 | 78.3 | 41.1 | 526 | 55.2 | 61.0 | 60.8 |
 | **PS3-1.5K-SigLIP** | [nvidia/PS3-1.5K-SigLIP](https://huggingface.co/nvidia/PS3-1.5K-SigLIP) | 1512 | 3645 | 69.3 | 71.1 | 79.4 | 41.3 | 534 | 64.0 | 63.8 | 63.2 |
 | SigLIP + AnyRes | | 3780 | 19600 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
-| SigLIP + S2 | | 3780 | 18225 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
+| SigLIP + S<sup>2</sup> | | 3780 | 18225 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
 | **PS3-4K-SigLIP** | [nvidia/PS3-4K-SigLIP](https://huggingface.co/nvidia/PS3-4K-SigLIP) | 3780 | 3840 | 69.8 | 70.9 | 79.1 | 40.5 | 543 | 67.8 | 64.7 | 63.9 |
 
 
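The Avg column in the table appears to be the unweighted mean of the seven benchmark scores, with OCRBench (reported on a 0-1000 scale) divided by 10 first. This is an inference that matches every non-OOM row, not a formula stated in the README, and the helper name `average` is hypothetical. A minimal Python sketch that recomputes the column under that assumption:

```python
# Recompute the "Avg" column of the PS3 results table.
# Assumption (inferred from the numbers, not stated in the README): Avg is the
# plain mean of the seven benchmarks, with OCRBench rescaled from 0-1000 to 0-100.

BENCHMARKS = ["TextVQA", "ChartQA", "DocVQA", "InfoVQA", "OCRBench", "V* Bench", "RealWorldQA"]

# (per-benchmark scores in table order, reported Avg); OOM rows are omitted.
ROWS = {
    "SigLIP (378)": ([62.3, 56.6, 51.9, 30.7, 387, 51.8, 57.1], 49.9),
    "SigLIP + AnyRes (1512)": ([67.4, 58.4, 67.9, 34.1, 468, 60.2, 59.0], 56.3),
    "SigLIP + S2 (1512)": ([66.1, 71.0, 78.3, 41.1, 526, 55.2, 61.0], 60.8),
    "PS3-1.5K-SigLIP (1512)": ([69.3, 71.1, 79.4, 41.3, 534, 64.0, 63.8], 63.2),
    "PS3-4K-SigLIP (3780)": ([69.8, 70.9, 79.1, 40.5, 543, 67.8, 64.7], 63.9),
}


def average(scores: list[float]) -> float:
    """Hypothetical helper: mean over all benchmarks after putting OCRBench on a 0-100 scale."""
    rescaled = [s / 10 if name == "OCRBench" else s for name, s in zip(BENCHMARKS, scores)]
    return sum(rescaled) / len(rescaled)


if __name__ == "__main__":
    for name, (scores, reported) in ROWS.items():
        computed = average(scores)
        print(f"{name:<24} computed={computed:.1f} reported={reported}")
```

For every row listed, the computed value rounds to the reported Avg, which is why the rescaling assumption seems plausible; it would need revisiting if the table's averaging scheme were documented differently.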