Commit e1edbbf

add Mordal arXiv (#296)
* add Mordal arXiv
* fix Mordal arxiv
* fix Mordal arxiv bib format
1 parent e1d240f commit e1edbbf

File tree

1 file changed: +20 -1 lines changed

source/_data/SymbioticLab.bib

Lines changed: 20 additions & 1 deletion
@@ -1968,4 +1968,23 @@ @InProceedings{autoiac:neurips24
 publist_link = {code || https://github.com/autoiac-project/iac-eval},
 publist_abstract = {
 Infrastructure-as-Code (IaC), an important component of cloud computing, allows the definition of cloud infrastructure in high-level programs. However, developing IaC programs is challenging, complicated by factors that include the burgeoning complexity of the cloud ecosystem (e.g., diversity of cloud services and workloads), and the relative scarcity of IaC-specific code examples and public repositories. While large language models (LLMs) have shown promise in general code generation and could potentially aid in IaC development, no benchmarks currently exist for evaluating their ability to generate IaC code. We present IaC-Eval, a first step in this research direction. IaC-Eval's dataset includes 458 human-curated scenarios covering a wide range of popular AWS services, at varying difficulty levels. Each scenario mainly comprises a natural language IaC problem description and an infrastructure intent specification. The former is fed as user input to the LLM, while the latter is a general notion used to verify if the generated IaC program conforms to the user's intent; by making explicit the problem's requirements that can encompass various cloud services, resources and internal infrastructure details. Our in-depth evaluation shows that contemporary LLMs perform poorly on IaC-Eval, with the top-performing model, GPT-4, obtaining a pass@1 accuracy of 19.36%. In contrast, it scores 86.6% on EvalPlus, a popular Python code generation benchmark, highlighting a need for advancements in this domain. We open-source the IaC-Eval dataset and evaluation framework at https://github.com/autoiac-project/iac-eval to enable future research on LLM-based IaC code generation.}
-}
+}
+
+@Article{mordal:arxiv25,
+  author = {Shiqi He and Insu Jang and Mosharaf Chowdhury},
+  title = {{Mordal}: Automated Pretrained Model Selection for Vision Language Models},
+  year = {2025},
+  month = {Feb},
+  volume = {abs/2502.00241},
+  archivePrefix = {arXiv},
+  eprint = {2502.00241},
+  url = {https://arxiv.org/abs/2502.00241},
+  publist_confkey = {arXiv:2502.00241},
+  publist_link = {paper || https://arxiv.org/abs/2502.00241},
+  publist_topic = {Systems + AI},
+  publist_abstract = {
+  Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models.
+
+  We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×-11.6× lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
+  }
+}
