This repository was archived by the owner on Dec 1, 2024. It is now read-only.

Commit f3ec2a7

Update README.md (#88)
1 parent 20a9566 commit f3ec2a7

1 file changed, +5 -5 lines changed

README.md

Lines changed: 5 additions & 5 deletions
@@ -71,11 +71,11 @@ See [example](flexgen/apps/data_wrangle)
 The corresponding effective batch sizes are in parentheses. Please see [here](benchmark/batch_size_table.md) for more details.
 | System | OPT-6.7B | OPT-30B | OPT-175B |
 | ------ | -------- | ------- | -------- |
-| Hugging Face Accelerate | 25.12 (2 on gpu) | 0.62 (8 on cpu ) | 0.01 (2 on disk) |
-| DeepSpeed ZeRO-Inference | 9.28 (16 on cpu) | 0.60 (4 on cpu) | 0.01 (1 on disk) |
-| Petals | 8.25 | 2.84 | 0.08 |
-| FlexGen | 25.26 (2 on gpu) | 7.32 (144 on cpu) | 0.69 (256 on disk) |
-| FlexGen with Compression | **29.12** (72 on gpu) | **8.38** (512 on cpu) | **1.12** (144 on cpu) |
+| Hugging Face Accelerate | 25.12 (2 on GPU) | 0.62 (8 on CPU) | 0.01 (2 on disk) |
+| DeepSpeed ZeRO-Inference | 9.28 (16 on CPU) | 0.60 (4 on CPU) | 0.01 (1 on disk) |
+| Petals\* | - | - | 0.05 |
+| FlexGen | 25.26 (2 on GPU) | 7.32 (144 on CPU) | 0.69 (256 on disk) |
+| FlexGen with Compression | **29.12** (72 on GPU) | **8.38** (512 on CPU) | **1.12** (144 on CPU) |

 - Hardware: an NVIDIA T4 (16GB) instance on GCP with 208GB of DRAM and 1.5TB of SSD.
 - Workload: input sequence length = 512, output sequence length = 32. The batch size is tuned to **a large value** that maximizes the generation throughput for each system.
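
As a reading aid for the updated table rows, the relative throughput between systems can be worked out with a few lines of Python. This is only a sketch: the numbers are copied from the added rows of this diff, the names `throughput` and `speedup` are hypothetical and not part of the repository, and the `-` entries in the Petals row are treated as missing values.

```python
# Throughput values copied from the added table rows in this diff.
# The unit is not stated in this hunk; ratios do not depend on it.
# None marks cells shown as "-" in the table (an assumption of this sketch).
throughput = {
    "Hugging Face Accelerate": {"OPT-6.7B": 25.12, "OPT-30B": 0.62, "OPT-175B": 0.01},
    "DeepSpeed ZeRO-Inference": {"OPT-6.7B": 9.28, "OPT-30B": 0.60, "OPT-175B": 0.01},
    "Petals": {"OPT-6.7B": None, "OPT-30B": None, "OPT-175B": 0.05},
    "FlexGen": {"OPT-6.7B": 25.26, "OPT-30B": 7.32, "OPT-175B": 0.69},
    "FlexGen with Compression": {"OPT-6.7B": 29.12, "OPT-30B": 8.38, "OPT-175B": 1.12},
}


def speedup(system, baseline, model):
    """Return throughput[system] / throughput[baseline] for one model.

    Returns None when either value is missing from the table.
    """
    numerator = throughput[system][model]
    denominator = throughput[baseline][model]
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


if __name__ == "__main__":
    for model in ("OPT-6.7B", "OPT-30B", "OPT-175B"):
        ratio = speedup("FlexGen", "Hugging Face Accelerate", model)
        print(f"FlexGen vs. Hugging Face Accelerate on {model}: {ratio:.1f}x")
```

Because each system was run at its own tuned batch size, these ratios compare best-case throughput per system rather than throughput at a matched batch size.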
