The energy consumption of computing systems is increasing with the rising popularity of Big Data and AI.
While the hardware community has invested considerable effort in energy optimizations, we observe that similar efforts on the software side are significantly lacking.
[Our initiative](https://ml.energy) to understand and optimize the energy consumption of modern AI workloads is exposing new ways to reason about energy consumption from the software side.
Major projects include [Zeus](https://ml.energy/zeus), the first GPU energy-vs-training performance tradeoff optimizer for DNN training.
Modern datacenters often overprovision application memory to avoid performance cliffs, leading to 50% underutilization on average.
Our research addresses this fundamental problem via practical memory disaggregation, whereby an application can use both local and remote memory over high-speed networks and, more recently, emerging CXL technology.
We are building disaggregated systems that can ensure latencies in the hundreds of nanoseconds.
We are generally interested in disaggregating all resources for fully utilized datacenters.
Major projects include [Infiniswap](https://infiniswap.github.io/), the first practical memory disaggregation software, and [TPP](https://arxiv.org/abs/2206.02878).
Collecting voluminous remote data in a central location not only presents a bandwidth and storage problem but is also increasingly likely to violate privacy regulations such as the General Data Protection Regulation (GDPR).
In these settings, data systems must minimize communication instead.
We are developing systems, algorithms, and benchmarks to analyze data distributed across multiple cloud datacenters and end-user devices to enable geo-distributed/federated learning and analytics.
Major projects include [FedScale](https://fedscale.ai/), the largest benchmark and a scalable and extensible platform for federated learning.
We also work on network resource management schemes to isolate Big Data and AI systems at the edge and inside the datacenter network.
Our recent focus has primarily been on emerging networking technologies such as low-latency RDMA-enabled networks, programmable switches, and SmartNICs.
We are also interested in improving existing networking infrastructure, for example by providing better QoS for low-latency RPCs in datacenters.
Major projects include [Aequitas](https://github.com/SymbioticLab/Aequitas) and [Justitia](https://github.com/SymbioticLab/Justitia).
## [Big Data Systems](/publications/#/topic:Big%20Data%20Systems)
In the recent past, we worked on designing and improving big data systems via new algorithms for resource scheduling, caching data in memory, and dynamic query planning to improve resource efficiency, application performance, and fairness.