Commit 7e50fcc: Update

1 parent: 4f98fe0

2 files changed: 3 additions & 4 deletions

README.md

Lines changed: 3 additions & 3 deletions

@@ -22,10 +22,10 @@ Please check the rest of this page about benchmark of LLaMA family models.
 ### Mixtral 8x7B
 We also supported [Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts/) which is a high-quality sparse mixture of experts (MoE) model, the average token generation rates are:
 
-|                  |   1 GPU  |   2 GPU   | 4 GPU  |   8 GPU    |
+|                  |   1 GPU  |   2 GPU   | 4 GPU  |   8 GPU    |
 |------------------|---------|-----------|--------|------------|
-|baseline(bfloat16)|   OOM   |   78.75   | 118.23 |   203.69   |
-|       int8       |  56.04  |   99.91   | 149.53 |   218.48   |
+|baseline(bfloat16)|   OOM   |   96.67   | 155.35 |   227.82   |
+|       int8       |  97.92  |  155.03   | 216.87 |   279.35   |
 
 Note that the benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh topology. Note that all benchmarks are run at *batch size=1*, making the reported tokens/s numbers equivalent to "tokens/s/user". In addition, they are run with a very small prompt length (just 5 tokens).

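For context on the benchmark table above, the int8 row presumably refers to weight-only int8 quantization of the model's linear layers. Below is a minimal PyTorch sketch of that idea, not this repository's actual quantization code; the class name `WeightOnlyInt8Linear`, the symmetric per-output-channel scaling, and the on-the-fly dequantization in `forward` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn, Tensor


class WeightOnlyInt8Linear(nn.Module):
    """Linear layer whose weight is stored as int8 with per-output-channel scales (sketch)."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.register_buffer("weight", torch.empty(out_features, in_features, dtype=torch.int8))
        self.register_buffer("scales", torch.ones(out_features, dtype=torch.bfloat16))

    @classmethod
    def from_float(cls, linear: nn.Linear) -> "WeightOnlyInt8Linear":
        mod = cls(linear.in_features, linear.out_features)
        w = linear.weight.detach()
        # Symmetric per-channel quantization: scale each output row so its max magnitude maps to 127.
        scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        mod.weight.copy_((w / scales).round().to(torch.int8))
        mod.scales.copy_(scales.squeeze(1).to(torch.bfloat16))
        return mod

    def forward(self, x: Tensor) -> Tensor:
        # Dequantize on the fly: the matmul runs in the activation dtype,
        # but each weight element occupies only one byte in memory.
        w = self.weight.to(dtype=x.dtype) * self.scales.to(dtype=x.dtype)[:, None]
        return F.linear(x, w)
```

Storing weights in one byte per element instead of two (bfloat16) roughly halves the weight memory traffic per decoded token, which is consistent with the int8 row beating the bfloat16 baseline at batch size 1, where decoding is typically memory-bandwidth bound.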
mixtral-moe/model.py

Lines changed: 0 additions & 1 deletion

@@ -183,7 +183,6 @@ def forward(self, x: Tensor, freqs_cis: Tensor, mask: Tensor, input_pos: Optiona
         y = self.wo(y)
         return y
 
-import torch.distributed
 
 class ConditionalFeedForward(nn.Module):
     def __init__(self, config):

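The hunk above shows only the first lines of `ConditionalFeedForward`. For readers skimming the diff, here is a minimal sketch of what a Mixtral-style conditional (per-token expert) feed-forward can look like; the stacked `w1`/`w2`/`w3` expert weights, the gated-SiLU expert MLP, and the `expert_indices` interface are assumptions for illustration, not this repository's actual implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn, Tensor


class ConditionalFeedForwardSketch(nn.Module):
    """Expert MLPs whose weights are selected per token by `expert_indices` (sketch)."""

    def __init__(self, num_experts: int, dim: int, intermediate_size: int) -> None:
        super().__init__()
        # All experts' projection weights stacked along a leading expert axis.
        self.w1 = nn.Parameter(torch.empty(num_experts, intermediate_size, dim))
        self.w2 = nn.Parameter(torch.empty(num_experts, dim, intermediate_size))
        self.w3 = nn.Parameter(torch.empty(num_experts, intermediate_size, dim))
        for w in (self.w1, self.w2, self.w3):
            nn.init.normal_(w, std=0.02)

    def forward(self, x: Tensor, expert_indices: Tensor) -> Tensor:
        # x: (num_tokens, dim); expert_indices: (num_tokens, top_k)
        w1 = self.w1[expert_indices]  # (num_tokens, top_k, intermediate, dim)
        w2 = self.w2[expert_indices]  # (num_tokens, top_k, dim, intermediate)
        w3 = self.w3[expert_indices]  # (num_tokens, top_k, intermediate, dim)
        # Gated (SwiGLU-style) MLP evaluated per token for each selected expert.
        x1 = F.silu(torch.einsum("td,teid->tei", x, w1))
        x3 = torch.einsum("td,teid->tei", x, w3)
        out = torch.einsum("tei,tedi->ted", x1 * x3, w2)
        return out  # (num_tokens, top_k, dim); caller mixes these with router weights
```

A caller would route each token to its top-k experts and then combine the per-expert outputs with the router's softmax weights, for example:

```python
ffn = ConditionalFeedForwardSketch(num_experts=8, dim=64, intermediate_size=128)
x = torch.randn(5, 64)                        # 5 tokens
expert_indices = torch.randint(0, 8, (5, 2))  # top-2 routing per token
out = ffn(x, expert_indices)                  # shape: (5, 2, 64)
```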