To address these challenges, we introduce Memstrata, a lightweight multi-tenant memory allocator. Memstrata employs page coloring to eliminate inter-VM contention. It improves performance for VMs with access patterns that are sensitive to hardware tiering by allocating them more local DRAM using an online slowdown estimator. In multi-VM experiments on prototype hardware, Memstrata identifies performance outliers and reduces their degradation from above 30% to below 6%, providing consistent performance across a wide range of workloads.
}
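
For intuition, here is a minimal sketch of the allocation loop the abstract above describes: periodically grant more local DRAM to the VMs whose estimated slowdown from hardware tiering is highest. Everything here is illustrative and assumes a hypothetical per-VM slowdown estimator and hypervisor hook (estimate_slowdown, add_local_dram_pages); it is not Memstrata's actual implementation.

    # Illustrative only: rebalance a spare pool of local-DRAM pages toward the
    # VMs that an online estimator flags as most slowed down by tiering.
    def rebalance_local_dram(vms, estimate_slowdown, spare_local_pages, batch=512):
        # Rank VMs by estimated fractional slowdown, worst first.
        ranked = sorted(vms, key=estimate_slowdown, reverse=True)
        for vm in ranked:
            if spare_local_pages <= 0:
                break
            if estimate_slowdown(vm) < 0.06:  # roughly the <6% degradation target
                continue                       # already within budget, skip
            grant = min(batch, spare_local_pages)
            vm.add_local_dram_pages(grant)     # hypothetical hypervisor interface
            spare_local_pages -= grant
        return spare_local_pages
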
@article{fed-ensemble:ieee-tase,
  author = {Naichen Shi and Fan Lai and Raed Al Kontar and Mosharaf Chowdhury},
  title = {Fed-ensemble: Ensemble Models in Federated Learning for Improved Generalization and Uncertainty Quantification},
  journal = {IEEE Transactions on Automation Science and Engineering},
  abstract = {The increase in the computational power of edge devices has opened up the possibility of processing some of the data at the edge and distributing model learning. This paradigm is often called federated learning (FL), where edge devices exploit their local computational resources to train models collaboratively. Though FL has seen recent success, it is unclear how to characterize uncertainties in FL predictions. In this paper, we propose Fed-ensemble: a simple approach that brings model ensembling to FL. Instead of aggregating local models to update a single global model, Fed-ensemble uses random permutations to update a group of $K$ models and then obtains predictions through model averaging. Fed-ensemble can be readily utilized within established FL methods and does not impose a computational overhead compared with single-model methods. Empirical results show that our model has superior performance over several FL algorithms on a wide range of data sets and excels in heterogeneous settings often encountered in FL applications. Also, by carefully choosing client-dependent weights in the inference stage, Fed-ensemble becomes personalized and yields even better performance. Theoretically, we show that predictions on new data from all $K$ models belong to the same predictive posterior distribution under a neural tangent kernel regime. This result, in turn, sheds light on the generalization advantages of model averaging and justifies the uncertainty quantification capability. We also illustrate that Fed-ensemble has an elegant Bayesian interpretation. Note to Practitioners: This paper provides an algorithm that extracts a set of $K$ solutions without imposing any additional communication overhead in FL. Given multiple solutions, Fed-ensemble can be exploited to personalize inference as well as quantify uncertainty. Such capabilities may be beneficial within multiple practical systems that require uncertainty-aware decision-making. Further, Fed-ensemble may be useful for model validation and hypothesis testing.}
}
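
For intuition, a minimal sketch of the ensembling idea in the abstract above: each round a random permutation spreads clients across the $K$ models, and inference averages the models' predictions. The helpers local_update, aggregate, and predict are hypothetical placeholders, not the paper's API.

    import random

    def fed_ensemble_round(models, clients, local_update, aggregate):
        # One communication round: permute client indices, then stratify them over the K models.
        K = len(models)
        perm = random.sample(range(len(clients)), len(clients))
        for k in range(K):
            group = [clients[i] for i in perm[k::K]]           # clients assigned to model k this round
            updates = [local_update(models[k], c) for c in group]
            models[k] = aggregate(updates)                     # e.g., FedAvg over this group's updates
        return models

    def ensemble_predict(models, x):
        # Model averaging over the K ensemble members at inference time.
        preds = [m.predict(x) for m in models]
        return sum(preds) / len(preds)
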
@article{llm-survey:tmlr,
  author = {Zhongwei Wan and Xin Wang and Che Liu and Samiul Alam and Yu Zheng and Jiachen Liu and Zhongnan Qu and Shen Yan and Yi Zhu and Quanlu Zhang and Mosharaf Chowdhury and Mi Zhang},
  title = {Efficient Large Language Models: A Survey},
  journal = {Transactions on Machine Learning Research},
  abstract = {Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey. We will actively maintain the repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient LLMs research and inspire them to contribute to this important and exciting field.}
}
@article{pyxis:tpds,
  author = {Sheng Qi and Chao Jin and Mosharaf Chowdhury and Zhenming Liu and Xuanzhe Liu and Xin Jin},
  title = {Pyxis: Scheduling Mixed Tasks in Disaggregated Datacenters},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  abstract = {Disaggregating compute from storage is an emerging trend in cloud computing. Effectively utilizing resources in both the compute and storage pools is the key to high performance. The state-of-the-art scheduler provides optimal scheduling decisions for workloads with homogeneous tasks. However, cloud applications often generate a mix of tasks with diverse compute and IO characteristics, resulting in sub-optimal performance for existing solutions. We present Pyxis, a system that provides optimal scheduling decisions for mixed workloads in disaggregated datacenters with theoretical guarantees. Pyxis is capable of maximizing overall throughput while meeting latency SLOs. Pyxis decouples the scheduling of different tasks. Our insight is that the optimal solution has an “all-or-nothing” structure that can be captured by a single turning point in the spectrum of tasks. Based on task characteristics, the turning point partitions the tasks either all to storage nodes or all to compute nodes (none to storage nodes). We theoretically prove that the optimal solution has such a structure, and design an online algorithm with sub-second convergence. We implement a prototype of Pyxis. Experiments on CloudLab with various synthetic and application workloads show that Pyxis improves the throughput by 3–21× over the state-of-the-art solution.}
}
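
For intuition, a minimal sketch of the “all-or-nothing” turning-point structure the abstract describes: order tasks along a spectrum (here an illustrative IO-to-compute ratio) and send everything past the turning point to storage nodes and the rest to compute nodes. The task fields and the ratio are hypothetical; the paper's contribution is finding the turning point online under latency SLOs, which is not shown here.

    def partition_by_turning_point(tasks, turning_point):
        # Order tasks from most IO-heavy to most compute-heavy (illustrative spectrum).
        ordered = sorted(tasks, key=lambda t: t.io_cost / t.compute_cost, reverse=True)
        to_storage = ordered[:turning_point]   # run these next to the data
        to_compute = ordered[turning_point:]   # pull data over the network for these
        return to_storage, to_compute
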