I'm developing an inference engine for the Steerling model. The engine assumes that generation is block-wise autoregressive: the output is produced one block at a time, with each block conditioned on the blocks before it. This is a common setup in prior work on fast serving of text diffusion models [1, 2].
The Steerling code currently has no autoregressive block-wise generation. I would like this feature to be supported so that I can better test the correctness (and speed gains) of my serving engine.
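To make the request concrete, here is a minimal sketch of the generation loop I have in mind. It does not use any Steerling API: `MASK`, `toy_denoiser`, and `generate_blockwise` are hypothetical names, the denoiser is a random stand-in for a model forward pass, and the confidence-based unmasking schedule is only one possible choice.

```python
import math
import random
from typing import Callable, List, Tuple

MASK = -1  # hypothetical mask token id, not taken from Steerling

Prediction = Tuple[int, float]  # (predicted token id, confidence)
Denoiser = Callable[[List[int]], List[Prediction]]


def toy_denoiser(tokens: List[int]) -> List[Prediction]:
    """Random stand-in for a model forward pass: one prediction per position."""
    num_masked = sum(1 for t in tokens if t == MASK)
    rng = random.Random(len(tokens) * 31 + num_masked)
    return [(rng.randrange(100), rng.random()) for _ in tokens]


def generate_blockwise(
    prompt: List[int],
    num_blocks: int,
    block_size: int,
    steps_per_block: int,
    denoise: Denoiser = toy_denoiser,
) -> List[int]:
    """Generate blocks left to right; finished blocks are never revisited."""
    seq = list(prompt)
    # Unmask this many positions per denoising step (at least one).
    per_step = max(1, math.ceil(block_size / steps_per_block))
    for _ in range(num_blocks):
        start = len(seq)
        seq.extend([MASK] * block_size)  # open a new, fully masked block
        while any(t == MASK for t in seq[start:]):
            preds = denoise(seq)  # sees the frozen prefix plus the current block
            masked = [i for i in range(start, len(seq)) if seq[i] == MASK]
            # Commit the most confident predictions inside the current block.
            masked.sort(key=lambda i: preds[i][1], reverse=True)
            for i in masked[:per_step]:
                seq[i] = preds[i][0]
    return seq


if __name__ == "__main__":
    out = generate_blockwise([1, 2, 3], num_blocks=2, block_size=4, steps_per_block=2)
    print(out)
```

The property my engine depends on is that, once a block is complete, its tokens are fixed and only the current block is denoised. An entry point with roughly this shape (block size and steps per block as parameters) would be enough for me to compare outputs against my engine.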