Abstract = {Skyrocketing data volumes, growing hardware capabilities, and the revolution in machine learning (ML) theory have collectively driven the latest leap forward in ML. Despite our hope to realize the next leap with new hardware and a broader range of data, ML development is reaching scaling limits in both realms. First, the exponential surge in ML workload volume and complexity far outstrips hardware improvements, leading to resource demands that surpass the sustainable growth of hardware capacity. Second, the mounting volume of edge data, increasing awareness of user privacy, and tightening government regulations render conventional ML practices, which centralize all data in the cloud, increasingly unsustainable due to escalating costs and scrutiny.
Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads where multiple ML models train on user data. Once spent, the DP budget is consumed forever, so it is crucial to allocate it as efficiently as possible to train as many models as possible. This paper presents a privacy-budget scheduler that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, so practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPack, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload derived from the Alibaba ML cluster trace. We show that DPack (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks than a state-of-the-art privacy scheduling algorithm that focuses on fairness (1.3–1.7X in Alibaba, 1.0–2.6X in microbenchmarks), but (3) sacrifices some fairness for efficiency. Using DPack, DP ML operators should therefore be able to train more models on the same amount of user data while offering the same privacy guarantee to their users. }
}
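The privacy-knapsack formulation in the DPack abstract above treats each data block's DP budget as a depletable resource. As a purely illustrative aid (this is not DPack's algorithm, whose details are in the paper), here is a minimal greedy sketch for a single block, with made-up task names, epsilon demands, and utilities:

```python
# Illustrative only: a toy greedy pass over ONE data block's DP budget.
# Task names, demands (epsilon), and utilities are made up; DPack's actual
# algorithm packs across many blocks (a multidimensional knapsack).

def greedy_schedule(tasks, block_budget):
    """tasks: list of (name, epsilon_demand, utility) tuples."""
    remaining = block_budget
    scheduled = []
    # Prefer tasks that deliver the most utility per unit of budget consumed.
    for name, demand, utility in sorted(tasks, key=lambda t: t[2] / t[1], reverse=True):
        if demand <= remaining:
            scheduled.append(name)
            remaining -= demand  # DP budget, once spent, is gone forever
    return scheduled, remaining

if __name__ == "__main__":
    tasks = [("modelA", 0.5, 3.0), ("modelB", 0.2, 2.0), ("modelC", 0.4, 1.0)]
    print(greedy_schedule(tasks, block_budget=1.0))
```

A real privacy scheduler must pack tasks across many blocks at once, which is the multidimensional (NP-hard) problem the abstract refers to; a per-block greedy pass like this only conveys the basic intuition of spending budget where it buys the most.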
@InProceedings{openinfra:hotinfra24,
author = {Jiaheng Lu and Yunming Xiao and Shmeelok Chakraborty and Silvery Fu and Yoon Sung Ji and Ang Chen and Mosharaf Chowdhury and Nalini Rao and Sylvia Ratnasamy and Xinyu Wang},
title = {{OpenInfra}: A Co-simulation Framework for the Infrastructure Nexus},
Critical infrastructures like datacenters, power grids, and water systems are interdependent, forming complex "infrastructure nexuses" that require co-optimization for efficiency, resilience, and sustainability. We present OpenInfra, a co-simulation framework that models these interdependencies by integrating domain-specific simulators for datacenters, power grids, and cooling systems, focusing on stitching them together for end-to-end experimentation. OpenInfra enables seamless integration of diverse simulators and flexible configuration of infrastructure interactions. Our evaluation demonstrates its ability to simulate large-scale infrastructure dynamics, including 7,392 servers over 100+ hours.
}
}
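To make the "stitching simulators together" idea in the OpenInfra abstract concrete, here is a toy co-simulation loop. The simulator classes, their interfaces, and all numbers are hypothetical stand-ins, not OpenInfra's actual API:

```python
# Illustrative only: a toy co-simulation loop that exchanges state between
# hypothetical datacenter, cooling, and power-grid simulators each tick.

class DatacenterSim:
    def step(self, minutes):
        # Pretend each of 7,392 servers draws a fixed 300 W; return IT load (kW).
        return 7392 * 0.3

class CoolingSim:
    def step(self, it_load_kw):
        # Return additional cooling power as a fraction of IT load.
        return it_load_kw * 0.2

class PowerGridSim:
    def step(self, total_load_kw):
        # Return a made-up carbon intensity (gCO2/kWh) for the requested load.
        return 400.0 if total_load_kw < 2500 else 450.0

def cosimulate(hours):
    dc, cooling, grid = DatacenterSim(), CoolingSim(), PowerGridSim()
    for t in range(hours):
        it_kw = dc.step(60)
        cool_kw = cooling.step(it_kw)
        intensity = grid.step(it_kw + cool_kw)
        yield t, it_kw + cool_kw, intensity

if __name__ == "__main__":
    for t, kw, gco2 in cosimulate(3):
        print(f"hour {t}: {kw:.0f} kW at {gco2} gCO2/kWh")
```

The only point of the sketch is the control flow: each simulated interval, one domain's outputs (IT load) become another domain's inputs (cooling, grid), which is the interdependency the framework is built to capture.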
@InProceedings{infa-finops:bigdata24,
author = {Atam Prakash Agrawal and Anant Mittal and Shivangi Srivastava and Michael Brevard and Valentin Moskovich and Mosharaf Chowdhury},
title = {{INFA-FinOps} for Cloud Data Integration},
Over the past decade, businesses have migrated to the cloud for its simplicity, elasticity, and resilience. Cloud ecosystems offer a variety of computing and storage options, enabling customers to choose configurations that maximize productivity. However, determining the right configuration to minimize cost while maximizing performance is challenging, as workloads vary and cloud offerings constantly evolve. Overwhelmed by this choice overload, many businesses end up making suboptimal choices that lead to inflated cloud spending and/or poor performance.
In this paper, we describe INFA-FinOps, an automated system that helps Informatica customers strike a balance between cost efficiency and meeting SLAs for Informatica Advanced Data Integration (aka CDI-E) workloads. We first describe common workload patterns observed in CDI-E customers and show how INFA-FinOps selects optimal cloud resources and configurations for each workload, adjusting them as workloads and cloud ecosystems change. It also makes recommendations for actions that require user review or input. Finally, we present performance benchmarks on various enterprise use cases and conclude with lessons learned and potential future enhancements.
}
}
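As a rough illustration of the cost-vs-SLA trade-off the INFA-FinOps abstract describes (not its actual selection logic; the configuration names, prices, and runtime estimates below are invented):

```python
# Illustrative only: choose the cheapest configuration whose estimated
# runtime still meets the SLA. All values below are made up.

def pick_config(configs, sla_minutes):
    feasible = [c for c in configs if c["est_runtime_min"] <= sla_minutes]
    if not feasible:
        return None  # nothing meets the SLA; escalate for user review
    return min(feasible, key=lambda c: c["est_runtime_min"] / 60 * c["price_per_hour"])

if __name__ == "__main__":
    configs = [
        {"name": "small", "price_per_hour": 0.5, "est_runtime_min": 95},
        {"name": "medium", "price_per_hour": 1.0, "est_runtime_min": 50},
        {"name": "large", "price_per_hour": 2.2, "est_runtime_min": 30},
    ]
    print(pick_config(configs, sla_minutes=60))  # picks the "medium" config
```

The interesting parts of the real system, per the abstract, are keeping such estimates accurate as workloads and cloud offerings change, and deciding when a choice needs user review instead of being applied automatically.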
@Article{mercury:arxiv24,
author = {Jiaheng Lu and Yiwen Zhang and Hasan Al Maruf and Minseo Park and Yunxuan Tang and Fan Lai and Mosharaf Chowdhury},
title = {{Mercury}: {QoS-Aware} Tiered Memory System},
Memory tiering has received wide adoption in recent years as an effective solution to address the increasing memory demands of memory-intensive workloads. However, existing tiered memory systems often fail to meet service-level objectives (SLOs) when multiple applications share the system because they lack Quality-of-Service (QoS) support. Consequently, applications suffer severe performance drops due to local memory contention and memory bandwidth interference.
In this paper, we present Mercury, a QoS-aware tiered memory system that ensures predictable performance for coexisting memory-intensive applications with different SLOs. Mercury enables per-tier page reclamation for application-level resource management and uses a proactive admission control algorithm to satisfy SLOs via per-tier memory capacity allocation and intra- and inter-tier bandwidth interference mitigation. It reacts to dynamic requirement changes via real-time adaptation. Extensive evaluations show that Mercury improves application performance by up to 53.4% and 20.3% compared to TPP and Colloid, respectively.
}
}
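The Mercury abstract above mentions proactive admission control over per-tier capacity and bandwidth. The toy check below only illustrates that idea; the tier names, headroom model, and interfaces are hypothetical, not Mercury's implementation:

```python
# Illustrative only: admit an application only if every memory tier still has
# enough capacity and bandwidth headroom for it. All numbers are made up.

def can_admit(app, tiers):
    """app["demand"]: per-tier {"gb": capacity need, "bw": bandwidth need}."""
    for tier, need in app["demand"].items():
        t = tiers[tier]
        if need["gb"] > t["free_gb"] or need["bw"] > t["free_bw"]:
            return False  # admitting would eat into others' SLO headroom
    return True

def admit(app, tiers):
    if not can_admit(app, tiers):
        return False
    for tier, need in app["demand"].items():
        tiers[tier]["free_gb"] -= need["gb"]
        tiers[tier]["free_bw"] -= need["bw"]
    return True

if __name__ == "__main__":
    tiers = {"local": {"free_gb": 64, "free_bw": 80}, "cxl": {"free_gb": 256, "free_bw": 30}}
    app = {"demand": {"local": {"gb": 32, "bw": 40}, "cxl": {"gb": 64, "bw": 10}}}
    print(admit(app, tiers), tiers)
```

The paper's contribution lies in the parts this sketch omits: per-tier page reclamation, mitigating intra- and inter-tier bandwidth interference, and adapting allocations as requirements change at runtime.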
@InProceedings{autoiac:neurips24,
author = {Patrick TJ Kon and Jiachen Liu and Yiming Qiu and Weijun Fan and Ting He and Lei Lin and Haoran Zhang and Owen M. Park and George Sajan Elengikal and Yuxin Kang and Ang Chen and Mosharaf Chowdhury and Myungjin Lee and Xinyu Wang},
title = {{IaC-Eval}: A code generation benchmark for Infrastructure-as-Code programs},
Infrastructure-as-Code (IaC), an important component of cloud computing, allows the definition of cloud infrastructure in high-level programs. However, developing IaC programs is challenging, complicated by factors that include the burgeoning complexity of the cloud ecosystem (e.g., the diversity of cloud services and workloads) and the relative scarcity of IaC-specific code examples and public repositories. While large language models (LLMs) have shown promise in general code generation and could potentially aid in IaC development, no benchmarks currently exist for evaluating their ability to generate IaC code. We present IaC-Eval, a first step in this research direction. IaC-Eval's dataset includes 458 human-curated scenarios covering a wide range of popular AWS services at varying difficulty levels. Each scenario mainly comprises a natural language IaC problem description and an infrastructure intent specification. The former is fed as user input to the LLM, while the latter is used to verify whether the generated IaC program conforms to the user's intent by making explicit the problem's requirements, which can encompass various cloud services, resources, and internal infrastructure details. Our in-depth evaluation shows that contemporary LLMs perform poorly on IaC-Eval, with the top-performing model, GPT-4, obtaining a pass@1 accuracy of 19.36%. In contrast, it scores 86.6% on EvalPlus, a popular Python code generation benchmark, highlighting the need for advancements in this domain. We open-source the IaC-Eval dataset and evaluation framework at https://github.com/autoiac-project/iac-eval to enable future research on LLM-based IaC code generation.}
}
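For readers unfamiliar with the pass@1 number in the IaC-Eval abstract: pass@k is conventionally estimated with the unbiased estimator popularized by the Codex benchmark work, shown below. Whether IaC-Eval samples more than one completion per scenario is not stated here, so take this only as the standard definition of the metric:

```latex
% pass@k over a benchmark: for each problem, generate n samples, count the
% c that pass the checks, and average the unbiased estimator across problems.
\[
  \text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right],
  \qquad
  \text{pass@}1 \;=\; \mathbb{E}_{\text{problems}}\!\left[\frac{c}{n}\right].
\]
```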
@Article{mordal:arxiv25,
author = {Shiqi He and Insu Jang and Mosharaf Chowdhury},
title = {{Mordal}: Automated Pretrained Model Selection for Vision Language Models},
Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest-growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities on different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models.
We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×–11.6× fewer GPU hours than grid search. During our evaluation, we also discovered new VLMs that outperform their state-of-the-art counterparts.
}
}
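The Mordal abstract attributes its speedup to (a) pruning the candidate set and (b) cutting per-candidate evaluation time. A heavily simplified sketch of that two-step structure, with invented scoring functions and thresholds (not Mordal's method), is:

```python
# Illustrative only: prune candidate VLMs with a cheap proxy score, then
# evaluate survivors with early stopping. Scores and thresholds are invented.

def search(candidates, cheap_score, train_eval_step, budget_steps, patience=3):
    # Step 1: keep only candidates that look promising under a cheap proxy.
    survivors = [c for c in candidates if cheap_score(c) > 0.5]

    best, best_score = None, float("-inf")
    for cand in survivors:
        score, stale = float("-inf"), 0
        # Step 2: partial training/evaluation with early stopping.
        for step in range(budget_steps):
            new_score = train_eval_step(cand, step)
            if new_score > score:
                score, stale = new_score, 0
            else:
                stale += 1
                if stale >= patience:
                    break  # stop spending GPU hours on a stagnating candidate
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

if __name__ == "__main__":
    cands = ["vit+llama", "clip+mistral", "siglip+phi"]
    print(search(
        cands,
        cheap_score=lambda c: 0.6 if ("clip" in c or "siglip" in c) else 0.4,
        train_eval_step=lambda c, s: 0.5 + 0.01 * s - (0.1 if "clip" in c else 0.0),
        budget_steps=20,
    ))
```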
@Article{curie:arxiv25,
author = {Patrick Tser Jern Kon and Jiachen Liu and Qiuyi Ding and Yiming Qiu and Zhenning Yang and Yibo Huang and Jayanth Srinivasa and Myungjin Lee and Mosharaf Chowdhury and Ang Chen},
title = {{Curie}: Toward Rigorous and Automated Scientific Experimentation with AI Agents},
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
}
}
@Article{cornstarch:arxiv25,
author = {Insu Jang and Runyu Lu and Nikhil Bansal and Ang Chen and Mosharaf Chowdhury},
title = {{Cornstarch}: Distributed Multimodal Training Must Be Multimodality-Aware}
Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio. However, this inherent heterogeneity in MLLM model structure and data types renders makeshift extensions of existing LLM training frameworks unsuitable for efficient MLLM training.
In this paper, we present Cornstarch, the first general-purpose distributed MLLM training framework. Cornstarch facilitates modular MLLM construction, enables composable parallelization of constituent models, and introduces MLLM-specific optimizations to pipeline and context parallelism for efficient distributed MLLM training. Our evaluation shows that Cornstarch outperforms state-of-the-art solutions by up to 1.57x in terms of training throughput.
source/_front.md: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ This boils down to three operating regimes: single-microsecond latency *within a

-[**Join SymbioticLab**](https://forms.gle/L3Syau9dBzi8eLxQ7) to work on first-of-its-kind projects made possible by the [SymbioticLab cluster](/cluster/)!
+<!--[**Join SymbioticLab**](https://forms.gle/L3Syau9dBzi8eLxQ7) to work on first-of-its-kind projects made possible by the [SymbioticLab cluster](/cluster/)!-->

 [**Learn about openings in ongoing projects**](https://docs.google.com/document/d/1mVPqfnqLz_CXVz8XyDLcB1BIlUyezY2A-t5mV33JZHw/edit?usp=sharing) if you're already in Michigan.