Hi Authors of Magic Dec,
I found your paper very insightful and thought-provoking!
I have a few questions and was wondering if you could share your thoughts:
- Was S inflection calculated in a benchmark setup that used in-flight batching (IFB) / continuous batching, or without it?
- With IFB, each step could be a mix of prefill and decode, so would the S inflection results change?
- Are there any pointers to the benchmark code, or a description of how batching was done?
Looking forward to hearing from you!