Hi team,
Thanks for your great contribution to the open-source community!
I followed the instructions in the README. However, I did not see any real speedup at any batch size from 1 to 48; instead, throughput degraded.
Am I doing something wrong?
Results
Hardware: 8x A100 40GB
Command Lines
for B in 1 16 32 48
do
    echo "Running with B = $B" >> "$LOG_FILE"
    ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=8 tests/SnapKV/selfspec_benchmark.py \
        --model "$MODEL_PATH/model.pth" \
        --model_name "$MODEL_HF" \
        --rank_group 0 1 2 3 4 5 6 7 \
        --gamma 2 \
        --B "$B" \
        --prefix_len 32032 \
        --max_len 32288 \
        --draft_budget 257 \
        --benchmark \
        --compile >> "$LOG_FILE" 2>&1
done
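For comparing runs, here is a minimal sketch of how the speedup at each batch size could be computed, assuming the baseline and speculative throughput figures are read from the log by hand. The `speedup` helper and the numbers passed to it are hypothetical placeholders, not part of the repository and not measured results.

```shell
# Hypothetical helper: ratio of speculative-decoding throughput to the
# autoregressive baseline. A value below 1.00 indicates a slowdown.
speedup() {
  # $1 = baseline throughput, $2 = speculative throughput (same units)
  awk -v base="$1" -v spec="$2" 'BEGIN { printf "%.2f\n", spec / base }'
}

# Placeholder numbers for illustration only, not measured results.
speedup 100 80
```

A ratio like this, reported per batch size, makes it easier to see where the degradation starts.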
