Commit 32343b1

add correct scaling for byte
1 parent fd7d473 commit 32343b1

1 file changed: +1 -1 lines changed

model2vec/tokenizer/model.py

Lines changed: 1 addition & 1 deletion
@@ -40,4 +40,4 @@ def _process_unigram(
 def _calculate_token_weight_for_unigram(token: str) -> float:
     """Calculate the token weight for Unigram."""
     # Always prefer longer tokens.
-    return len(token) + token.count("▁")
+    return len(token) + token.count("▁") + token.count("Ġ")
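
For illustration, the effect of the change can be sketched by comparing the old and new return expressions side by side. This is a minimal sketch, not the library's code: `weight_before` and `weight_after` are hypothetical names that copy the two expressions from the diff. It assumes the "byte" in the commit message refers to byte-level tokenizers, where "Ġ" marks a leading space in the same way "▁" does in SentencePiece-style vocabularies.

```python
def weight_before(token: str) -> float:
    # Hypothetical name: the return expression as it stood before this commit.
    return len(token) + token.count("▁")


def weight_after(token: str) -> float:
    # Hypothetical name: the return expression introduced by this commit.
    return len(token) + token.count("▁") + token.count("Ġ")


# A SentencePiece-style token gets the same weight before and after.
print(weight_before("▁hello"), weight_after("▁hello"))  # 7 7

# A byte-level token with a "Ġ" prefix now gets the same bonus.
print(weight_before("Ġhello"), weight_after("Ġhello"))  # 6 7
```

With the change, a token carrying either space marker receives the same weight, so the two vocabulary styles are scaled consistently.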
