What is the reason for using Dense(2, activation='softmax') instead of Dense(1, activation='sigmoid')? Is it related to the Gradient Reversal Layer? If so, could you explain?