I am finding the claimed 1 MB of GPU RAM per token during inference a bit hard to understand, and it also doesn't match what I see in practice. Any insight into how that number was computed?
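
For context, here is my own back-of-envelope, assuming the figure refers to per-token KV-cache memory (I may be misreading what it covers). With hypothetical LLaMA-7B-class numbers in fp16 it comes out to roughly half of 1 MB:

```python
# Back-of-envelope KV-cache memory per token (my assumption about what
# the 1 MB/token figure refers to -- happy to be corrected).
# Each layer stores one key vector and one value vector per token,
# each of size d_model.

n_layers = 32       # hypothetical: a LLaMA-7B-class model
d_model = 4096      # hidden size
bytes_per_elem = 2  # fp16

# Factor of 2 = key + value
kv_bytes_per_token = 2 * n_layers * d_model * bytes_per_elem
print(kv_bytes_per_token / 2**20, "MiB per token")  # -> 0.5 MiB
```

So unless the calculation also counts activations or uses different model dimensions, I'd expect something closer to 0.5 MB than 1 MB, which is why I'm asking where the number comes from.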