Question about top-k distillation

In SDPO, the top-k tokens used to compute the KL term are selected from the **student model**'s distribution. In Openclaw-RL, however, they are selected from the **teacher model**'s distribution. Which one should we follow, and why?
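
To make the difference concrete, here is a minimal toy sketch of what I mean. It is not taken from either codebase: the helper names (`topk_indices`, `restricted_kl`), the KL direction (student relative to teacher), and the renormalization over the selected tokens are all my own assumptions for illustration, and the actual implementations may differ on each of these points.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk_indices(probs, k):
    # Hypothetical helper: indices of the k most probable tokens.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def restricted_kl(p, q, support):
    # Hypothetical helper: KL(p || q) computed only over `support`,
    # after renormalizing both distributions on that subset (an assumption).
    p_mass = sum(p[i] for i in support)
    q_mass = sum(q[i] for i in support)
    return sum(
        (p[i] / p_mass) * math.log((p[i] / p_mass) / (q[i] / q_mass))
        for i in support
    )

# Toy next-token distributions over a 5-token vocabulary.
student = softmax([2.0, 1.0, 0.5, -1.0, -2.0])
teacher = softmax([-1.0, 0.5, 2.0, 1.0, -2.0])
k = 2

# Variant A: the support is chosen by the student's top-k tokens.
student_support = topk_indices(student, k)
kl_student_topk = restricted_kl(student, teacher, student_support)

# Variant B: the support is chosen by the teacher's top-k tokens.
teacher_support = topk_indices(teacher, k)
kl_teacher_topk = restricted_kl(student, teacher, teacher_support)

print("student top-k support:", student_support, "KL:", kl_student_topk)
print("teacher top-k support:", teacher_support, "KL:", kl_teacher_topk)
```

In this toy case the two variants select disjoint token sets, so the loss is computed over different tokens and takes different values. That is why the choice seems important to me.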