Thanks for your interesting work. I believe the project provides new theoretical analysis and insights into speculative decoding.
I would like to ask a question about the draft/target memory ratio. The paper states that "the draft models can occupy up to 38∼140% memory footprint of target models", but I couldn't find any equation related to this claim. How did you analyze it theoretically? Could you provide the specific equation?
