
Dataset too large #6

@NHendrickson9616

Description


I am using the run_mlm.py file, but with my own copy, because I changed where the tokenizer is loaded from: it lives at a different path from the model, which is local.

While initially testing this method, I used only the first two lines of my dataset and it worked just fine, but now that I have expanded the input, I am getting this error:

IndexError                                Traceback (most recent call last)
Cell In[58], line 5
      3 scorer = MaskedLM('/data/user/home/nchendri/LongRun/')
      4 text =  dsMap['test']['text']
----> 5 ppl = scorer.get_perplexity(text, batch=32)
      6 print(ppl)
      7 print(list(zip(text, ppl)))

Cell In[57], line 162, in MaskedLM.get_perplexity(self, input_texts, batch)
    159     return _e
    161 if self.max_length is not None:
--> 162     data.append([encode_mask(i) for i in range(min(self.max_length - len(self.sp_token_prefix), len(x)))])
    163 else:
    164     data.append([encode_mask(i) for i in range(len(x))])

Cell In[57], line 162, in <listcomp>(.0)
    159     return _e
    161 if self.max_length is not None:
--> 162     data.append([encode_mask(i) for i in range(min(self.max_length - len(self.sp_token_prefix), len(x)))])
    163 else:
    164     data.append([encode_mask(i) for i in range(len(x))])

Cell In[57], line 157, in MaskedLM.get_perplexity.<locals>.encode_mask(mask_position)
    155 # add the correct token id as the label
    156 label = [PAD_TOKEN_LABEL_ID] * _e['input_ids'].shape[1]
--> 157 label[mask_position + len(self.sp_token_prefix)] = masked_token_id
    158 _e['labels'] = torch.tensor([label], dtype=torch.long)
    159 return _e

IndexError: list assignment index out of range
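For what it's worth, the traceback suggests the label index `mask_position + len(self.sp_token_prefix)` can run past the end of the encoded sequence once longer inputs get truncated to `max_length`: the mask loop bounds itself by `min(max_length - prefix_len, len(tokens))`, but the actual encoding (`_e['input_ids'].shape[1]`) can be shorter than that once suffix/special tokens are accounted for. A minimal sketch of a bounds guard (hypothetical helper, not the library's actual code) would be:

```python
def safe_mask_positions(num_tokens, encoded_len, prefix_len, max_length):
    """Yield only mask positions whose label index stays inside the
    encoded sequence, guarding against truncation-induced mismatches.

    num_tokens   -- number of tokens in the raw (pre-truncation) text
    encoded_len  -- _e['input_ids'].shape[1], i.e. the real encoded length
    prefix_len   -- len(self.sp_token_prefix)
    max_length   -- the model's maximum sequence length
    """
    upper = min(max_length - prefix_len, num_tokens)
    for i in range(upper):
        # Skip positions whose label index would be out of range.
        if i + prefix_len < encoded_len:
            yield i


# Example: 10 raw tokens, but the encoding was truncated to 8 positions
# with a 1-token prefix, so only positions 0..6 are safely maskable.
positions = list(safe_mask_positions(10, 8, 1, 12))
```

As a workaround, pre-truncating the input texts (or lowering `max_length`) before calling `get_perplexity` may also avoid the crash, assuming the mismatch is indeed truncation-related.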
