This repository was archived by the owner on Dec 1, 2024. It is now read-only.
See [examples](#examples) for the applications you can run _on a single commodity GPU_ with FlexGen, such as benchmarking and data wrangling.
❌ **Limitation**. As an offloading-based system running on weak GPUs, FlexGen also has its limitations.
FlexGen can be significantly slower than systems with enough powerful GPUs to hold the whole model, especially for small batch sizes.
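The small-batch slowdown follows from how offloading works: every decoding step must stream weights over PCIe, a fixed cost that only large batches can amortize. A back-of-envelope sketch of this effect, using assumed (not measured) bandwidth and compute numbers:

```python
# Hedged back-of-envelope model of offloading cost. The constants below are
# illustrative assumptions, not FlexGen measurements.
WEIGHTS_GB = 60.0          # e.g. a ~30B-parameter model in fp16 (~2 bytes/param)
PCIE_GBPS = 16.0           # assumed effective CPU-to-GPU transfer bandwidth
COMPUTE_S_PER_SEQ = 0.001  # assumed per-sequence compute time per token

def tokens_per_second(batch: int) -> float:
    # Per decoding step: one full weight transfer (fixed cost) plus
    # per-sequence compute that scales with the batch size.
    step_s = WEIGHTS_GB / PCIE_GBPS + batch * COMPUTE_S_PER_SEQ
    return batch / step_s

for batch in (1, 32, 1024):
    print(batch, round(tokens_per_second(batch), 2))
```

With a batch of 1, nearly all of each step is spent on the weight transfer, so throughput is a tiny fraction of what a large batch achieves under the same model.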
```
pip install -e .
```
## Examples
### HELM Benchmark
FlexGen can be integrated into [HELM](https://crfm.stanford.edu/helm), a language model benchmark framework, as its execution backend.
You can use the commands below to run a Massive Multitask Language Understanding (MMLU) [scenario](https://crfm.stanford.edu/helm/latest/?group=mmlu) with a single T4 (16GB) GPU and 200GB of DRAM.
Note that only a subset of HELM scenarios is tested.
### Data Wrangling
You can run the examples in this paper, ['Can Foundation Models Wrangle Your Data?'](https://arxiv.org/abs/2205.09911), by following the instructions [here](flexgen/apps/data_wrangle).
## Performance Benchmark
### Generation Throughput (token/s)
The corresponding effective batch sizes are in parentheses. Please see [here](benchmark/batch_size_table.md) for more details.
| System | OPT-6.7B | OPT-30B | OPT-175B |
| ------ | -------- | ------- | -------- |
| Hugging Face Accelerate | 25.12 (2 on GPU) | 0.62 (8 on CPU) | 0.01 (2 on disk) |
| DeepSpeed ZeRO-Inference | 9.28 (16 on CPU) | 0.60 (4 on CPU) | 0.01 (1 on disk) |
| Petals\* | - | - | 0.05 |
| FlexGen | 25.26 (2 on GPU) | 7.32 (144 on CPU) | 0.69 (256 on disk) |
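The throughput numbers aggregate generated tokens across the whole effective batch. A minimal sketch of this relationship, where the token count and wall time are illustrative assumptions chosen to reproduce the FlexGen OPT-175B row:

```python
# Hedged sketch: how "generation throughput (token/s)" relates to the
# effective batch size shown in parentheses in the table above.
# Assumed definition (not taken from the FlexGen source): throughput counts
# all generated tokens across the batch per second of wall-clock time.

def generation_throughput(effective_batch_size: int,
                          tokens_per_sequence: int,
                          wall_time_s: float) -> float:
    """Tokens generated per second, aggregated over the batch."""
    return effective_batch_size * tokens_per_sequence / wall_time_s

# Example: 256 sequences with 32 new tokens each, finishing in an assumed
# ~11874 s of wall time, gives roughly 0.69 token/s.
print(round(generation_throughput(256, 32, 11874), 2))  # → 0.69
```

This is why a system can post high token/s despite long per-request latency: the large effective batch, not per-sequence speed, drives the aggregate number.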