forked from kimiyoung/transformer-xl
Open
Description
After switching to the pytorch_april_patched environment and installing the requirements (-r requirements.txt), the run fails while encoding the wiki dataset:
Producing dataset wiki...
encoding file testdata/wikiextracted/AA/wiki_01.txt ...
Traceback (most recent call last):
  File "train.py", line 1036, in <module>
    eval(f'test_{g.args.test}()')
  File "<string>", line 1, in <module>
  File "train.py", line 940, in test_checkpoint_wiki
    data_setup()
  File "train.py", line 333, in data_setup
    g.corpus = get_lm_corpus(g.args.data, g.args.dataset, use_bpe=g.args.bpe)
  File "/home/ubuntu/data_utils.py", line 381, in get_lm_corpus
    corpus = Corpus(datadir, dataset, use_bpe, **kwargs)
  File "/home/ubuntu/data_utils.py", line 309, in __init__
    self.valid = self.vocab.encode_file(valid_path, ordered=True)
  File "/home/ubuntu/utils/vocabulary.py", line 204, in encode_file
    tokens: List[int] = self.tokenizer.encode(text) + [self.EOT]
  File "/home/ubuntu/anaconda3/envs/pytorch_april_patched/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization_gpt2.py", line 261, in encode
    return self.convert_tokens_to_ids(self.tokenize(text))
  File "/home/ubuntu/anaconda3/envs/pytorch_april_patched/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization_gpt2.py", line 224, in tokenize
    token = ''.join(self.byte_encoder[ord(b)] for b in token)
  File "/home/ubuntu/anaconda3/envs/pytorch_april_patched/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization_gpt2.py", line 224, in <genexpr>
    token = ''.join(self.byte_encoder[ord(b)] for b in token)
KeyError: 8212
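A note on the key itself: 8212 is the code point U+2014 (an em dash), so the failing line appears to be calling ord() on each character of a Python 3 str and using the result as a key into a table that, presumably, only covers byte values 0 to 255. The sketch below reproduces that failure mode in isolation and shows a per-byte lookup that avoids it. The `byte_encoder` here is a hypothetical stand-in, not the table from pytorch_pretrained_bert, and the helper names are made up for illustration; whether the installed library version needs a patch or an upgrade is not something the traceback alone confirms.

```python
# Hypothetical stand-in for the tokenizer's byte_encoder: one entry per byte
# value 0..255. Assumption: the real table also has exactly 256 integer keys.
byte_encoder = {b: chr(b + 256) for b in range(256)}

token = "a\u2014b"  # contains U+2014, whose code point is 8212


def encode_per_character(tok):
    # Mirrors the failing line in the traceback: ord() of a str character
    # is a Unicode code point, which can be larger than 255.
    return "".join(byte_encoder[ord(ch)] for ch in tok)


def encode_per_byte(tok):
    # Iterating over the UTF-8 encoding yields ints in 0..255 only,
    # so every lookup hits an existing key.
    return "".join(byte_encoder[b] for b in tok.encode("utf-8"))


try:
    encode_per_character(token)
    failed_key = None
except KeyError as err:
    failed_key = err.args[0]

encoded = encode_per_byte(token)
print("per-character lookup failed on key:", failed_key)
print("per-byte lookup produced", len(encoded), "symbols")
```

If this is indeed the cause, any non-ASCII character in testdata/wikiextracted/AA/wiki_01.txt would trigger the same error, not only this one.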