Understanding the 1MB-per-token calculation

I find the calculation of 1MB of GPU RAM per token during inference hard to follow, and it also doesn't match what I see in practice.

Any insights into how this number was computed?
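The sketch below shows one plausible way to arrive at a figure of this order. It assumes the number refers to the per-token key/value (KV) cache of a 13B-parameter model stored in fp16. The layer count (40) and hidden size (5120) are assumptions matching a LLaMA-13B-style configuration; they are not taken from wherever the 1MB figure originally came from.

```python
# Hypothetical sketch: per-token KV-cache size for a decoder-only transformer.
# Assumed config: 40 layers, hidden size 5120, fp16 (2 bytes per value),
# roughly a LLaMA-13B-style model. These values are illustrative assumptions.

def kv_cache_bytes_per_token(n_layers: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    # Every layer caches one key vector and one value vector per token,
    # each of length hidden_size (num_heads * head_dim), hence the factor 2.
    return 2 * n_layers * hidden_size * bytes_per_value


per_token = kv_cache_bytes_per_token(n_layers=40, hidden_size=5120)
print(per_token)          # 819200 bytes
print(per_token / 1e6)    # 0.8192 (MB), i.e. "about 1MB"
print(per_token / 2**20)  # 0.78125 (MiB)
```

Observed usage can differ from this estimate for several reasons. Total KV memory scales with sequence length times batch size. Models that use grouped-query attention cache fewer key/value heads than they have query heads. Serving frameworks may also reserve a large memory pool up front, so measured usage may not grow token by token.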