I am using run_mlm.py, but from my own copy: I changed where the tokenizer is loaded from, because it lives at a different path from the model, which is stored locally.
When I first tried this method I used only the first two lines of my dataset and it worked fine. Now that I have expanded the input, I am getting this error:
IndexError Traceback (most recent call last)
Cell In[58], line 5
3 scorer = MaskedLM('/data/user/home/nchendri/LongRun/')
4 text = dsMap['test']['text']
----> 5 ppl = scorer.get_perplexity(text, batch=32)
6 print(ppl)
7 print(list(zip(text, ppl)))
Cell In[57], line 162, in MaskedLM.get_perplexity(self, input_texts, batch)
159 return _e
161 if self.max_length is not None:
--> 162 data.append([encode_mask(i) for i in range(min(self.max_length - len(self.sp_token_prefix), len(x)))])
163 else:
164 data.append([encode_mask(i) for i in range(len(x))])
Cell In[57], line 162, in <listcomp>(.0)
159 return _e
161 if self.max_length is not None:
--> 162 data.append([encode_mask(i) for i in range(min(self.max_length - len(self.sp_token_prefix), len(x)))])
163 else:
164 data.append([encode_mask(i) for i in range(len(x))])
Cell In[57], line 157, in MaskedLM.get_perplexity.<locals>.encode_mask(mask_position)
155 # add the correct token id as the label
156 label = [PAD_TOKEN_LABEL_ID] * _e['input_ids'].shape[1]
--> 157 label[mask_position + len(self.sp_token_prefix)] = masked_token_id
158 _e['labels'] = torch.tensor([label], dtype=torch.long)
159 return _e
IndexError: list assignment index out of range
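For reference, the failing line writes to `label[mask_position + len(self.sp_token_prefix)]`, and `label` has `_e['input_ids'].shape[1]` entries, so the error means that index is at or beyond the encoded length for at least one text. The traceback does not show why the encoding is shorter than expected, so the following is only a minimal sketch of that condition using plain lists. The helper names `build_label` and `safe_mask_positions` and the value of `PAD_TOKEN_LABEL_ID` are assumptions for illustration, not part of the original script.

```python
# Assumed value: the issue does not show how PAD_TOKEN_LABEL_ID is defined.
PAD_TOKEN_LABEL_ID = -100


def build_label(encoded_length, mask_position, prefix_length, masked_token_id):
    """Mirror the label construction from the traceback with an explicit bounds check.

    encoded_length stands in for _e['input_ids'].shape[1] and
    prefix_length for len(self.sp_token_prefix).
    """
    target = mask_position + prefix_length
    if target >= encoded_length:
        # This is the situation that surfaces as
        # "IndexError: list assignment index out of range".
        raise ValueError(
            f"mask index {target} does not fit in an encoding of length {encoded_length}"
        )
    label = [PAD_TOKEN_LABEL_ID] * encoded_length
    label[target] = masked_token_id
    return label


def safe_mask_positions(token_count, encoded_length, prefix_length):
    """Return only the mask positions whose label index fits in the encoding."""
    return range(max(0, min(token_count, encoded_length - prefix_length)))


if __name__ == "__main__":
    # 10 tokens, but the encoding only has room for 5 ids, 1 of them a prefix token.
    for position in safe_mask_positions(token_count=10, encoded_length=5, prefix_length=1):
        print(position, build_label(5, position, 1, masked_token_id=42))
```

Printing `len(x)`, `_e['input_ids'].shape[1]`, and `len(self.sp_token_prefix)` for the text that fails should show which of the three values differs from what the loop assumes.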