Commit 20ace1d

fix Mordal arxiv bib format
1 parent 46ef15c

File tree: 1 file changed, +13 -13

source/_data/SymbioticLab.bib

Lines changed: 13 additions & 13 deletions
@@ -1957,21 +1957,21 @@ @Article{mercury:arxiv24
 }
 }

-
 @Article{mordal:arxiv25,
-author        = {Shiqi He and Insu Jang and Mosharaf Chowdhury},
-title         = {{Mordal}: Automated Pretrained Model Selection for Vision Language Models},
-year          = {2025},
-month         = {Feb},
-volume        = {abs/2502.00241},
-archiveprefix = {arXiv},
-eprint        = {2502.00241},
-url           = {https://arxiv.org/abs/2502.00241},
+author = {Shiqi He and Insu Jang and Mosharaf Chowdhury},
+title = {{Mordal}: Automated Pretrained Model Selection for Vision Language Models},
+year = {2025},
+month = {Feb},
+volume = {abs/2502.00241},
+archivePrefix = {arXiv},
+eprint = {2502.00241},
+url = {https://arxiv.org/abs/2502.00241},
 publist_confkey = {arXiv:2502.00241},
-publist_link    = {paper || https://arxiv.org/abs/2502.00241},
-publist_topic   = {Systems + AI},
+publist_link = {paper || https://arxiv.org/abs/2502.00241},
+publist_topic = {Systems + AI},
 publist_abstract = {
-Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models.
-We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×-11.6× lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
+Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models.
+
+We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×-11.6× lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
 }
 }
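
For reference, a sketch of the complete Mordal entry as it should read after this commit, assembled from the context and added lines of the diff above. The two-space field indentation is an assumption (the diff view does not preserve leading whitespace), and the publist_abstract text, which appears unchanged apart from the blank line added between its two paragraphs, is abbreviated here:

@Article{mordal:arxiv25,
  author = {Shiqi He and Insu Jang and Mosharaf Chowdhury},
  title = {{Mordal}: Automated Pretrained Model Selection for Vision Language Models},
  year = {2025},
  month = {Feb},
  volume = {abs/2502.00241},
  archivePrefix = {arXiv},
  eprint = {2502.00241},
  url = {https://arxiv.org/abs/2502.00241},
  publist_confkey = {arXiv:2502.00241},
  publist_link = {paper || https://arxiv.org/abs/2502.00241},
  publist_topic = {Systems + AI},
  publist_abstract = {...}
}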
