What does the get_offset function do? (Insufficient info in docs) #3

I ran into this assertion error in the `get_offset` function when using models other than those from `cl-tohoku` in `JaQuAD.ipynb`:
```
assert unk_pointer is not None, \
    'Normalized context and tokens are not matched'
```

I understand this is related to tokenization, but I still can't figure out what the function does, even after reading the docstring:
```
'''The character-level start/end offsets of a token within a context.
    Algorithm:
    1. Make offsets of normalized context within the original context.
    2. Make offsets of tokens (input_ids) within the normalized context.

    Arguments:
    input_ids -- Token ids of tokenized context (by tokenizer).
    context -- String of context
    tokenizer
    norm_form

    Return:
        List[Tuple[int, int]]: Offsets of tokens within the input context.
        For each token, the offsets are presented as a tuple of (start
        position index, end position index). Both indices are inclusive.
    '''
```
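For reference, here is how I currently read the two steps in the docstring, as a minimal sketch. This is not the repository's code: the names `normalized_char_offsets` and `token_offsets` are hypothetical, it takes token strings rather than `input_ids` so that no tokenizer is needed, it assumes WordPiece-style `##` continuation prefixes, and it normalizes one character at a time (which can differ from normalizing the whole string when combining characters are involved).

```python
import unicodedata
from typing import List, Tuple


def normalized_char_offsets(context: str, norm_form: str = "NFKC") -> Tuple[str, List[int]]:
    """Step 1 (as I read it): normalize the context and remember, for every
    normalized character, the index of the original character it came from."""
    normalized_chars: List[str] = []
    origins: List[int] = []
    for i, ch in enumerate(context):
        # One original character may expand to several normalized characters.
        for out in unicodedata.normalize(norm_form, ch):
            normalized_chars.append(out)
            origins.append(i)
    return "".join(normalized_chars), origins


def token_offsets(tokens: List[str], context: str, norm_form: str = "NFKC") -> List[Tuple[int, int]]:
    """Step 2 (as I read it): locate each token in the normalized context,
    then map its start/end back to inclusive indices in the original context."""
    normalized, origins = normalized_char_offsets(context, norm_form)
    offsets: List[Tuple[int, int]] = []
    cursor = 0
    for token in tokens:
        # Assumption: '##' marks a WordPiece continuation and is not part of the text.
        piece = token[2:] if token.startswith("##") else token
        start = normalized.find(piece, cursor) if piece else -1
        if start == -1:
            # A token that does not literally occur in the normalized text cannot be aligned.
            raise ValueError(f"Token {token!r} not found in normalized context")
        end = start + len(piece) - 1
        offsets.append((origins[start], origins[end]))
        cursor = end + 1
    return offsets


context = "ＡＢＣ㍿です"  # full-width letters and a one-character ligature
print(token_offsets(["ABC", "株式", "会社", "です"], context))
# [(0, 2), (3, 3), (3, 3), (4, 5)]
```

In this sketch, a token that does not appear literally in the normalized context (for example an unknown-token placeholder, or a token produced under a different normalization) cannot be located. My guess is that the assertion guards against a similar mismatch, but I could not confirm that from the docs.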
What is the motivation behind this function, and under what circumstances would you need it?

Thanks.
