I am puzzled:
the entity mentions with the same description have already been merged by the pooling operation before the relation is predicted, and the coreference relation holds between an entity and its various descriptions, such as "Apple" and "Apple Inc.", right?
So how does the model know this if the dataset contains no such information?
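To make the pooling step in the question concrete, here is a minimal sketch of mention-level pooling. It assumes logsumexp pooling, a common smooth-max choice that is not necessarily the one this model uses. It also assumes the mention-to-entity grouping is passed in as an input. The function names `logsumexp_pool` and `pool_entities` are hypothetical. The sketch shows that pooling can only merge mentions once something has already said which mentions belong to the same entity.

```python
import math
from collections import defaultdict


def logsumexp_pool(vectors):
    """Element-wise logsumexp over equal-length vectors (a smooth max)."""
    dim = len(vectors[0])
    pooled = []
    for d in range(dim):
        column = [v[d] for v in vectors]
        m = max(column)  # subtract the max for numerical stability
        pooled.append(m + math.log(sum(math.exp(x - m) for x in column)))
    return pooled


def pool_entities(mention_embeddings, mention_to_entity):
    """Group mention embeddings by entity id and pool each group into one vector.

    mention_to_entity is an assumed input: the grouping of mentions into
    entities is not computed here and must come from elsewhere.
    """
    groups = defaultdict(list)
    for emb, ent in zip(mention_embeddings, mention_to_entity):
        groups[ent].append(emb)
    return {ent: logsumexp_pool(embs) for ent, embs in groups.items()}


# Hypothetical example: "Apple" and "Apple Inc." are both mapped to entity 0.
mentions = [[1.0, 2.0], [3.0, 0.0], [5.0, 5.0]]
entity_ids = [0, 0, 1]
entity_vectors = pool_entities(mentions, entity_ids)
```

If "Apple" and "Apple Inc." were given different entity ids, this sketch would produce two separate entity vectors. Any merging it performs comes entirely from `mention_to_entity`.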