Commit a6a0ba3
[SW-238029] [1.22] Fix max_batch_size handling - Llama perf degradation fix (#1828)
Llama perf degradation seen with Gemma3 support: #1616.
max_batch_size was initialized incorrectly for the profile_run because the
check used mm_registry instead of checking for an actual multimodal model.
The fix initializes it to 1 only when a multimodal (mrope or mm_optimized)
model is in use.
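
The condition described in this message can be sketched roughly as below. This is a minimal illustration, not the code from the commit: `ModelFlags`, `uses_mrope`, `is_mm_optimized`, `profile_max_batch_size`, and `configured_max_batch_size` are hypothetical names, and the assumption that non-multimodal models fall back to a configured batch size is mine, since the message does not say what value they use.

```python
from dataclasses import dataclass


@dataclass
class ModelFlags:
    """Hypothetical flags for the model that is actually loaded."""
    uses_mrope: bool = False
    is_mm_optimized: bool = False


def profile_max_batch_size(flags: ModelFlags, configured_max_batch_size: int) -> int:
    """Pick the batch size for the profile run (illustrative only).

    Force a batch size of 1 only when the loaded model is multimodal
    (mrope or mm_optimized). Assumption: any other model keeps the
    configured value instead of being capped.
    """
    if flags.uses_mrope or flags.is_mm_optimized:
        return 1
    return configured_max_batch_size


# A text-only model such as Llama keeps the configured batch size.
text_only = profile_max_batch_size(ModelFlags(), configured_max_batch_size=128)
# A multimodal model is profiled with a batch size of 1.
multimodal = profile_max_batch_size(ModelFlags(uses_mrope=True), configured_max_batch_size=128)
```

The point of the sketch is that the decision depends on properties of the loaded model, not on the presence of a multimodal registry, so text-only models are not profiled with a batch size of 1.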
## Test Result
Llama v3.1 70B, 2048/128, BF16, 2 cards: perf dropped from 170 tps to 150 tps.
With this fix, it is back to 170 tps.
---------
Co-authored-by: Iryna Boiko <[email protected]>
1 parent 8fad535 commit a6a0ba3
1 file changed: +6 -7 lines changed
[Diff body not preserved. Hunk 1: original lines 1494-1498 replaced by new lines 1494-1497. Hunk 2: original lines 2807-2808 removed, new lines 2808-2809 added.]