I ran into this assertion error in the get_offset function when using models other than the cl-tohoku ones in JaQuAD.ipynb:
assert unk_pointer is not None, \
    'Normalized context and tokens are not matched'
I know this is related to tokenization, but I still can't quite figure it out, even after reading the docstring:
'''The character-level start/end offsets of a token within a context.
Algorithm:
1. Make offsets of normalized context within the original context.
2. Make offsets of tokens (input_ids) within the normalized context.
Arguments:
input_ids -- Token ids of tokenized context (by tokenizer).
context -- String of context
tokenizer
norm_form
Return:
List[Tuple[int, int]]: Offsets of tokens within the input context.
For each token, the offsets are presented as a tuple of (start
position index, end position index). Both indices are inclusive.
'''
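To make the two steps in the docstring concrete, here is a minimal sketch written from the docstring alone, not from the notebook's actual code. It assumes NFKC normalization and BERT-style WordPiece tokens (continuation pieces prefixed with "##", unknown pieces emitted as "[UNK]") passed as token strings rather than ids; the names normalize_with_map and token_offsets are hypothetical. The assertion is modeled on the quoted error message: a token whose text is not found at the current position is only tolerated when a preceding [UNK] can absorb the gap.

```python
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical sketch of the two-step algorithm described in the docstring;
# NOT the JaQuAD get_offset implementation.


def normalize_with_map(context: str, norm_form: str = "NFKC") -> Tuple[str, List[int]]:
    """Step 1: normalize `context` and record, for every normalized
    character, the index of the original character it came from.
    Assumption: characters are normalized one at a time, which can differ
    from normalizing the whole string (e.g. half-width kana + voiced mark)."""
    pieces: List[str] = []
    index_map: List[int] = []
    for i, ch in enumerate(context):
        normed = unicodedata.normalize(norm_form, ch)
        pieces.append(normed)
        index_map.extend([i] * len(normed))
    return "".join(pieces), index_map


def token_offsets(tokens: List[str], context: str, norm_form: str = "NFKC",
                  unk_token: str = "[UNK]", subword_prefix: str = "##") -> List[Tuple[int, int]]:
    """Step 2: locate each WordPiece-style token in the normalized context
    and map its inclusive (start, end) back to the original context."""
    norm, index_map = normalize_with_map(context, norm_form)
    offsets: List[Optional[Tuple[int, int]]] = []
    pos = 0
    unk_pointer: Optional[int] = None  # index in `offsets` of an unresolved [UNK]
    unk_start = 0
    for tok in tokens:
        while pos < len(norm) and norm[pos].isspace():  # tokenizers drop spaces
            pos += 1
        if tok == unk_token:
            unk_pointer, unk_start = len(offsets), pos
            offsets.append(None)  # filled in once the next token is located
            continue
        piece = tok[len(subword_prefix):] if tok.startswith(subword_prefix) else tok
        if unk_pointer is None and norm.startswith(piece, pos):
            start = pos
        else:
            # A token that does not line up with the text is only acceptable
            # when a pending [UNK] can absorb the gap before it.
            assert unk_pointer is not None, "Normalized context and tokens are not matched"
            start = norm.find(piece, pos + 1)
            assert start != -1, "Normalized context and tokens are not matched"
            end = start - 1
            while norm[end].isspace():  # trim spaces between [UNK] and next token
                end -= 1
            offsets[unk_pointer] = (index_map[unk_start], index_map[end])
            unk_pointer = None
        pos = start + len(piece)
        offsets.append((index_map[start], index_map[pos - 1]))
    if unk_pointer is not None:  # a trailing [UNK] covers the rest of the text
        assert unk_start < len(norm), "Normalized context and tokens are not matched"
        end = len(norm) - 1
        while norm[end].isspace():
            end -= 1
        offsets[unk_pointer] = (index_map[unk_start], index_map[end])
    return offsets  # type: ignore[return-value]


# "㍿" normalizes (NFKC) to "株式会社", so both pieces map back to original index 3.
print(token_offsets(["トヨタ", "株式", "##会社"], "トヨタ㍿"))  # [(0, 2), (3, 3), (3, 3)]

# A SentencePiece-style token ("▁" word marker) never appears in the text.
try:
    token_offsets(["▁トヨタ"], "トヨタ㍿")
except AssertionError as err:
    print(err)  # Normalized context and tokens are not matched
```

If this guess about the mechanism is right, any tokenizer whose token strings do not appear verbatim in the normalized text would trip the same check: SentencePiece models mark word starts with "▁", byte-level BPE models use "Ġ", and uncased models lowercase the text. That is an inference from the docstring, not something confirmed by the notebook. For models that ship a "fast" tokenizer, Hugging Face transformers can also return character offsets directly via return_offsets_mapping=True (with exclusive end indices there).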
What is the motivation behind this function, and in what circumstances would you need it?
Thanks.