Commit b2a09ed

Max Hniebergall committed
Fix DeBERTa tokenizer bug caused by a bug in the normalizer that made offsets negative
1 parent: fe7818a

1 file changed: +1 -1 lines changed

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/nlp/tokenizers/PrecompiledCharMapNormalizer.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -194,7 +194,7 @@ Reader normalize(CharSequence str) {
             if (charDelta < 0) {
                 // normalised form is shorter
                 int lastDiff = getLastCumulativeDiff();
-                addOffCorrectMap(normalizedCharPos, lastDiff + charDelta);
+                addOffCorrectMap(normalizedCharPos, lastDiff - charDelta);
             } else if (charDelta > 0) {
                 // inserted chars, add the offset in the output stream
                 int lastDiff = getLastCumulativeDiff();
```
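Why the sign flip matters can be sketched with a small, self-contained model. It assumes the normalizer follows the contract of Lucene's `BaseCharFilter`, where `addOffCorrectMap(off, cumulativeDiff)` means "for normalized offsets at or after `off`, original offset = normalized offset + `cumulativeDiff`", and it assumes `charDelta` is the normalized length minus the original length (which the `// normalised form is shorter` comment suggests). Under those assumptions, every dropped input char pushes later original offsets one position further ahead, so the diff must grow by `-charDelta`; the old `lastDiff + charDelta` shrank it instead, and with enough shortening the corrected offsets fall below zero. The class `OffsetCorrectionSketch` and its methods `recordShorterReplacement` and `correctOffset` are hypothetical illustrations, not Lucene or Elasticsearch code, and treating `normalizedCharPos` as the output position just after the replacement is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of an offset-correction map (not Lucene's BaseCharFilter).
 * Assumed contract: addOffCorrectMap(off, cumulativeDiff) means that for normalized
 * offsets >= off, original offset = normalized offset + cumulativeDiff.
 */
public class OffsetCorrectionSketch {
    // Each entry is {normalizedOffset, cumulativeDiff}, added in increasing offset order.
    private final List<int[]> corrections = new ArrayList<>();
    private final boolean buggy;

    OffsetCorrectionSketch(boolean buggy) {
        this.buggy = buggy;
    }

    int getLastCumulativeDiff() {
        return corrections.isEmpty() ? 0 : corrections.get(corrections.size() - 1)[1];
    }

    void addOffCorrectMap(int off, int cumulativeDiff) {
        corrections.add(new int[] {off, cumulativeDiff});
    }

    /**
     * Records that originalLen input chars were normalized to normalizedLen output chars,
     * with the replacement ending at normalizedCharPos in the output (assumption).
     * Only the "normalized form is shorter" branch from the commit is modelled.
     */
    void recordShorterReplacement(int normalizedCharPos, int originalLen, int normalizedLen) {
        int charDelta = normalizedLen - originalLen; // < 0: normalized form is shorter
        int lastDiff = getLastCumulativeDiff();
        // Fixed formula: the diff grows by the number of dropped chars.
        // Pre-commit formula: adding a negative delta shrinks the diff instead.
        addOffCorrectMap(normalizedCharPos, buggy ? lastDiff + charDelta : lastDiff - charDelta);
    }

    /** Maps an offset in the normalized text back to an offset in the original text. */
    int correctOffset(int normalizedOff) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] > normalizedOff) {
                break; // later entries do not apply to this offset
            }
            diff = c[1];
        }
        return normalizedOff + diff;
    }

    public static void main(String[] args) {
        // Original text: two 3-char sequences, each normalized to 1 char, followed by "x".
        // Original offsets:   seq1 = [0,3), seq2 = [3,6), "x" = [6,7)
        // Normalized offsets: seq1 = [0,1), seq2 = [1,2), "x" = [2,3)
        for (boolean buggy : new boolean[] {false, true}) {
            OffsetCorrectionSketch map = new OffsetCorrectionSketch(buggy);
            map.recordShorterReplacement(1, 3, 1);
            map.recordShorterReplacement(2, 3, 1);
            System.out.println((buggy ? "buggy" : "fixed") + ": start of x -> " + map.correctOffset(2));
        }
    }
}
```

Running `main` prints `fixed: start of x -> 6` and `buggy: start of x -> -2`: with the corrected formula the start of `x` maps back to its original offset, while the pre-commit formula produces a negative offset of the kind the commit message describes.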
