Great work!
Could you please clarify how you provide the previous action sequence as an input condition during the first step of inference? If it is provided as a random noise sequence, doesn't that affect the action sequences generated in subsequent steps, causing the error to propagate?