Is the ARPA format the only type of KenLM model supported? ARPA models work fine for me, but when I try to use a binary (`.bin`) model I get the error below.
```
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-30-c9c4a045dd64> in <module>
      5 alpha = 2.5 # LM Weight
      6 beta = 0.0 # LM Usage Reward
----> 7 word_lm_scorer = ctcdecode.WordKenLMScorer('../lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin', alpha, beta) # use your own kenlm model
      8 decoder = ctcdecode.BeamSearchDecoder(
      9 vocabulary,

~/work/wav2vec/py-ctc-decode/ctcdecode/scorer.py in __init__(self, path, alpha, beta)
     45 self.lm = kenlm.Model(path)
     46
---> 47 self.words = self._get_words(path)
     48 self.word_prefixes = self._get_word_prefixes(self.words)
     49

~/work/wav2vec/py-ctc-decode/ctcdecode/scorer.py in _get_words(self, path)
    107
    108 while not end_1_gram:
--> 109 line = f.readline().strip()
    110
    111 if line == '\\1-grams:':

~/anaconda3/lib/python3.8/codecs.py in decode(self, input, final)
    320 # decode input (taking the buffer into account)
    321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
    323 # keep undecoded input until the next call
    324 self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 62: invalid start byte
```
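From the traceback, `kenlm.Model(path)` on line 45 appears to load the binary file without complaint. The failure comes afterwards, in `_get_words`, which reads the same file as UTF-8 text and looks for the `\1-grams:` section, and that section only exists in ARPA files. Below is a minimal sketch of that distinction, not the library's actual code. The helper names `looks_like_arpa` and `read_arpa_unigrams` are hypothetical, and the check for a leading `\data\` header is a heuristic that assumes a standard ARPA file.

```python
import os
import tempfile


def looks_like_arpa(path, probe_bytes=4096):
    """Heuristic (assumption): an ARPA file is plain text whose first
    non-blank content is the '\\data\\' header. Anything else, such as a
    KenLM binary, is treated as 'not ARPA'."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    # Compare raw bytes so that binary content never has to be decoded.
    return head.lstrip().startswith(b"\\data\\")


def read_arpa_unigrams(path):
    """Collect the words listed in the '\\1-grams:' section of an ARPA file."""
    words = []
    in_unigrams = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line == "\\1-grams:":
                in_unigrams = True
                continue
            if not in_unigrams:
                continue
            if line.startswith("\\"):
                # Next section header (e.g. \2-grams: or \end\): stop reading.
                break
            if line:
                # Unigram line layout: log10-prob, word, optional backoff.
                fields = line.split()
                words.append(fields[1])
    return words


if __name__ == "__main__":
    # Tiny hand-written ARPA file, used only for this demonstration.
    arpa_text = (
        "\\data\\\nngram 1=3\n\n\\1-grams:\n"
        "-1.0\t<s>\t-0.5\n-1.0\thello\t-0.5\n-1.0\t</s>\n\n\\end\\\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        arpa_path = os.path.join(tmp, "toy.arpa")
        with open(arpa_path, "w", encoding="utf-8") as f:
            f.write(arpa_text)
        if looks_like_arpa(arpa_path):
            print(read_arpa_unigrams(arpa_path))
        else:
            print("Not an ARPA file; the word list must come from elsewhere.")
```

With a guard like this, a binary model could at least fail with a clear message instead of a `UnicodeDecodeError`. Whether the scorer can work without the word list at all is something only the maintainers can answer.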