Hi, what changes are needed to distil knowledge from a teacher trained with a sigmoid loss into a student trained with a softmax loss?