In your paper, you have the following diagram, which seems to suggest that ID regularization is done prior to the last layer of the decoder.

However, the paper also states: "The ID regularization was applied to the final decoder layer, which uses a rectified linear unit." So is the ID regularization applied to the last layer itself, or to the layer before it?