This repository was archived by the owner on Dec 1, 2024. It is now read-only.

Commit 4193ec6

Update links in the README (#94)

1 parent f3ec2a7 commit 4193ec6

File tree

1 file changed: +5 -7 lines

README.md

Lines changed: 5 additions & 7 deletions
````diff
@@ -21,9 +21,8 @@ makes it easier to take advantage of low-cost commodity GPUs.
 The goal of FlexGen is to create a high-throughput system to enable new and exciting applications of
 foundation models to throughput-oriented tasks on low-cost hardware, such as a single commodity GPU
 instead of expensive systems.
-Here are some examples of high-throughput workloads that we can run _on a single commodity GPU_ with FlexGen:
-* *Benchmarking*: Running a subset of [HELM](https://crfm.stanford.edu/helm/latest/) benchmark.
-* *Data wrangling*: Running [data wrangling](https://arxiv.org/abs/2205.09911).
+
+See [examples](#examples) for what we can run _on a single commodity GPU_ with FlexGen, such as benchmarking and data wrangling.
 
 **Limitation**. As an offloading-based system running on weak GPUs, FlexGen also has its limitations.
 FlexGen can be significantly slower than the case when you have enough powerful GPUs to hold the whole model, especially for small-batch cases.
@@ -54,24 +53,23 @@ pip install -e .
 ```
 
 ## Examples
-
 ### HELM Benchmark
 FlexGen can be integrated into [HELM](https://crfm.stanford.edu/helm), a language model benchmark framework, as its execution backend.
 You can use the commands below to run a Massive Multitask Language Understanding (MMLU) [scenario](https://crfm.stanford.edu/helm/latest/?group=mmlu) with a single T4 (16GB) GPU and 200GB of DRAM.
 ```
 python3 -m flexgen.apps.helm_run --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100
 ```
+Note that only a subset of HELM scenarios is tested.
 
 ### Data Wrangling
-
-See [example](flexgen/apps/data_wrangle)
+You can run the examples from the paper ['Can Foundation Models Wrangle Your Data?'](https://arxiv.org/abs/2205.09911) by following the instructions [here](flexgen/apps/data_wrangle).
 
 ## Performance Benchmark
 ### Generation Throughput (token/s)
 The corresponding effective batch sizes are in parentheses. Please see [here](benchmark/batch_size_table.md) for more details.
 | System | OPT-6.7B | OPT-30B | OPT-175B |
 | ------ | -------- | ------- | -------- |
-| Hugging Face Accelerate | 25.12 (2 on GPU) | 0.62 (8 on CPU) | 0.01 (2 on disk) |
+| Hugging Face Accelerate | 25.12 (2 on GPU) | 0.62 (8 on CPU) | 0.01 (2 on disk) |
 | DeepSpeed ZeRO-Inference | 9.28 (16 on CPU) | 0.60 (4 on CPU) | 0.01 (1 on disk) |
 | Petals\* | - | - | 0.05 |
 | FlexGen | 25.26 (2 on GPU) | 7.32 (144 on CPU) | 0.69 (256 on disk) |
````
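
The `--percent` values in the `helm_run` command above control how model state is split across GPU, CPU, and disk. Below is a minimal annotated restatement of that command; the flag meanings are assumptions based on FlexGen's six-value `--percent` convention (percentages of weights, attention cache, and activations kept on GPU and CPU, with the remainder offloaded to disk).

```
# Annotated restatement of the helm_run command from the diff above.
# Assumed flag meanings (FlexGen's six-value --percent convention):
#   --percent 20 80 0 100 0 100  -> 20% of weights on GPU, 80% on CPU;
#                                   attention (KV) cache and activations on CPU;
#                                   anything not on GPU or CPU goes to disk.
#   --gpu-batch-size 48 --num-gpu-batches 3 -> effective batch size 48 * 3 = 144.
python3 -m flexgen.apps.helm_run \
  --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical \
  --pad-to-seq-len 512 \
  --model facebook/opt-30b \
  --percent 20 80 0 100 0 100 \
  --gpu-batch-size 48 \
  --num-gpu-batches 3 \
  --max-eval-instance 100
```

Under this reading, most of the OPT-30B weights and the whole KV cache sit in CPU memory, which is why the setup pairs a 16GB T4 with 200GB of DRAM.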
