Masking makes data have different shapes, leading to a stack problem #2
Description
Hi, thanks for sharing the code!
There is a problem I couldn't solve in word2box-dev-shib/src/language_modeling_with_boxes/datasets/word2vecgpu.py.
In the method `__getitem__(self, idx)`, the line `idx = idx.unsqueeze(1) + window_range.unsqueeze(0)` raised `AttributeError: 'int' object has no attribute 'unsqueeze'`. I found that `idx` is originally an `int`, so I changed `idx += self.pad_size` to `idx = torch.full(size=tuple([self.pad_size]), fill_value=idx)`, which got rid of that error.
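The following is a minimal sketch of another way to handle the `int` index. It uses hypothetical stand-ins (`pad_size`, `window_range`, `context_indices`) rather than the real attributes of `word2vecgpu.py`, and it assumes `pad_size` is the padding added to each end of the corpus and that `window_range` lists the context offsets without 0. The index is wrapped in a one-element tensor with `torch.as_tensor`, and the `+ pad_size` shift is kept. By contrast, `torch.full(size=(pad_size,), fill_value=idx)` builds `pad_size` copies of the *unshifted* index. Under these assumptions, that changes both the number of rows per sample and the positions being read.

```python
import torch

# Hypothetical stand-ins for the dataset attributes; values are illustrative only.
pad_size = 2                                          # assumed padding on each corpus end
window_range = torch.arange(-pad_size, pad_size + 1)  # offsets -2..2
window_range = window_range[window_range != 0]        # assumed: drop the center offset

def context_indices(idx):
    """Map an int index (as a DataLoader passes it) to padded-corpus positions."""
    # Wrap the int in a 1-D tensor so .unsqueeze works, and keep the pad shift.
    idx = torch.as_tensor([idx]) + pad_size             # shape [1]
    return idx.unsqueeze(1) + window_range.unsqueeze(0)  # shape [1, 2 * pad_size]

print(context_indices(0))
```

With `pad_size = 2`, index 0 maps to the padded positions `[[0, 1, 3, 4]]`. Each sample yields exactly one row, whatever the value of `pad_size`.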
However, after getting context (tensor[10, 10]) and center (tensor[10, 1]) from the corpus, it raises `RuntimeError: stack expects each tensor to be equal size`. I assumed this was caused by the shape difference between center and context, so I added `center = center.unsqueeze(len(context.shape)-1)` and `center = center.expand_as(context)` to give center the same shape as context.
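The following sketch shows how `expand_as` behaves on the shapes reported above. It uses random illustrative tensors, not the real corpus. A size-1 dimension can be broadcast directly, so a `[10, 1]` center expands to `[10, 10]` without an extra `unsqueeze`. Unsqueezing first would give `[10, 1, 1]`, and expanding that to a 2-D target raises a `RuntimeError`, because `expand` cannot drop dimensions.

```python
import torch

# Illustrative shapes matching the report: 10 rows, 10 context slots.
context = torch.randint(0, 100, (10, 10))   # [10, 10]
center = torch.randint(0, 100, (10, 1))     # [10, 1]

# The size-1 column is broadcast; expand_as returns a view, no data is copied.
center_full = center.expand_as(context)     # [10, 10]

# Once the shapes match, stacking along a new leading dim succeeds.
pair = torch.stack([center_full, context])  # [2, 10, 10]
print(pair.shape)

# Unsqueezing first gives [10, 1, 1], which cannot be expanded to a 2-D shape.
try:
    center.unsqueeze(1).expand_as(context)
except RuntimeError as err:
    print("expand failed:", err)
```

PyTorch's default `collate_fn` stacks each field of a sample *across* the batch. This means the stack error can also come from the same field having different shapes in different samples, not only from center and context differing from each other.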
Then a new problem comes up: after using `keep` to drop some of the data, each sample ends up with a different shape, tensor[x, 10], where x is an int between 1 and 10. This leads to the RuntimeError again, because stack expects all tensors to have the same shape. I don't know how to handle this... Could you please offer some advice? Thank you so much!
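The default `collate_fn` in PyTorch's `DataLoader` calls `torch.stack` on each field across the samples of a batch. It therefore fails as soon as two samples keep a different number of rows. If each kept row is an independent (center, context) pair, one option is to concatenate rows along dim 0 instead of stacking samples. The sketch below does this with a custom `collate_fn`. `ToyMaskedDataset` and `cat_collate` are hypothetical names, and the sketch assumes `__getitem__` returns a `(center, context)` tuple of shapes `[x, 1]` and `[x, 10]`, which may not match `word2vecgpu.py`.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyMaskedDataset(Dataset):
    """Hypothetical stand-in: each item keeps a random number (1..10) of rows."""

    def __init__(self, n_items=8, width=10, seed=0):
        g = torch.Generator().manual_seed(seed)
        self.items = []
        for _ in range(n_items):
            x = int(torch.randint(1, 11, (1,), generator=g))  # rows surviving the mask
            context = torch.randint(0, 100, (x, width), generator=g)
            center = torch.randint(0, 100, (x, 1), generator=g)
            self.items.append((center, context))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

def cat_collate(batch):
    """Concatenate along dim 0 instead of stacking, so row counts may differ."""
    centers, contexts = zip(*batch)
    return torch.cat(centers, dim=0), torch.cat(contexts, dim=0)

loader = DataLoader(ToyMaskedDataset(), batch_size=4, collate_fn=cat_collate)
for center, context in loader:
    print(center.shape, context.shape)  # [sum of x, 1] and [sum of x, 10]
```

Concatenation makes the effective batch size vary from step to step. If a fixed `[batch, 10, 10]` layout is needed instead, another option is to pad each sample to 10 rows, for example with `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)`, and carry a mask that marks the padded rows.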