schema: Specify the encoding for character offsets (#224)
This PR adds a new `PositionEncoding` field so that indexers can specify
which kind of character offsets they emit. Consumers of SCIP can then
interpret the offsets unambiguously for non-ASCII data.
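To illustrate the ambiguity, here is a minimal sketch (not part of the PR) that measures the same column in three common units. For non-ASCII text the three answers differ. The enum and its member names are hypothetical and do not reproduce the schema's actual values.

```python
from enum import Enum


class PositionEncoding(Enum):
    # Hypothetical member names for illustration only; they are not
    # the values defined in the SCIP schema.
    UTF8_CODE_UNITS = "utf-8"
    UTF16_CODE_UNITS = "utf-16"
    UTF32_CODE_UNITS = "utf-32"


def column_offset(line: str, char_index: int, encoding: PositionEncoding) -> int:
    """Offset of line[char_index] from the start of the line,
    counted in code units of the given encoding."""
    prefix = line[:char_index]
    if encoding is PositionEncoding.UTF8_CODE_UNITS:
        return len(prefix.encode("utf-8"))  # one unit per byte
    if encoding is PositionEncoding.UTF16_CODE_UNITS:
        return len(prefix.encode("utf-16-le")) // 2  # 16-bit units; emoji take two
    return len(prefix)  # Python str indexes by code point (UTF-32 unit)


line = 'x = "é😀"; y'
col = line.index("y")
utf8 = column_offset(line, col, PositionEncoding.UTF8_CODE_UNITS)
utf16 = column_offset(line, col, PositionEncoding.UTF16_CODE_UNITS)
utf32 = column_offset(line, col, PositionEncoding.UTF32_CODE_UNITS)
print(utf8, utf16, utf32)  # 14 11 10
```

The column of `y` is 14, 11 or 10 depending on the unit. A consumer that sees only a bare number cannot tell which one the indexer meant.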
I have kept it as a field on `Document` rather than `Index` because:
1. Putting it on `Index` offers no additional benefit, since occurrences
   only appear inside `Document`s.
2. It allows indexes from different sources that use different kinds of
   offsets to be concatenated.
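Point 2 can be sketched as follows, assuming a heavily simplified, hypothetical `Document` type rather than the real SCIP message. Because every document records its own encoding, concatenation needs no offset rewriting. A consumer can normalize each document's offsets, here to code-point columns, when it reads them.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for SCIP's Document; field names are assumptions.
    relative_path: str
    position_encoding: str  # "utf-8", "utf-16" or "utf-32"
    lines: list[str]
    # (line number, start column) pairs, in position_encoding code units
    occurrences: list[tuple[int, int]] = field(default_factory=list)


def code_units(s: str, encoding: str) -> int:
    """Length of s in code units of the given encoding."""
    if encoding == "utf-8":
        return len(s.encode("utf-8"))
    if encoding == "utf-16":
        return len(s.encode("utf-16-le")) // 2
    if encoding == "utf-32":
        return len(s)
    raise ValueError(f"unknown encoding: {encoding}")


def to_code_point_column(line: str, offset: int, encoding: str) -> int:
    """Convert a code-unit offset into a code-point index within line."""
    units = 0
    for i, ch in enumerate(line):
        if units == offset:
            return i
        units += code_units(ch, encoding)
        if units > offset:
            raise ValueError("offset falls inside a character")
    if units == offset:
        return len(line)
    raise ValueError("offset is past the end of the line")


def merge_indexes(*indexes: list[Document]) -> list[Document]:
    # Each document keeps its own encoding, so plain concatenation is safe.
    merged: list[Document] = []
    for docs in indexes:
        merged.extend(docs)
    return merged


def normalized_occurrences(doc: Document) -> list[tuple[int, int]]:
    return [
        (ln, to_code_point_column(doc.lines[ln], col, doc.position_encoding))
        for ln, col in doc.occurrences
    ]


src = 'x = "é"; y'
index_a = [Document("a.py", "utf-8", [src], [(0, 10)])]   # `y` in UTF-8 bytes
index_b = [Document("b.ts", "utf-16", [src], [(0, 9)])]   # `y` in UTF-16 units
merged = merge_indexes(index_a, index_b)
print([normalized_occurrences(d) for d in merged])  # [[(0, 9)], [(0, 9)]]
```

The raw offsets 10 and 9 point at the same character. They can only be reconciled because each document carries its own encoding.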