Commit ae34237

Add note on encoding offsets and UTF-8
1 parent 9b42cac commit ae34237

1 file changed: 2 additions, 0 deletions

lib/tokenizers/encoding.ex

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ defmodule Tokenizers.Encoding do
 
   @doc """
   Get offsets from an encoding.
+
+  The offsets are expressed in terms of UTF-8 bytes.
   """
   @spec get_offsets(Encoding.t()) :: [{integer(), integer()}]
   def get_offsets(encoding), do: encoding |> Native.get_offsets() |> Shared.unwrap()
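
The added doc line matters when the input contains multi-byte characters, because byte positions and grapheme positions then diverge. The following is a minimal sketch in plain Elixir, not taken from the library: the input string and the offset list are hypothetical stand-ins for what `get_offsets/1` might return, and the offsets are assumed to be half-open `{start, end}` pairs.

```elixir
# Hypothetical input: "héllo wörld", where é (U+00E9) and ö (U+00F6)
# each occupy two bytes in UTF-8, so the string is 13 bytes long.
text = "h\u00E9llo w\u00F6rld"

# Hypothetical offsets, assumed to be half-open {start, end} byte ranges.
# Real values would come from Tokenizers.Encoding.get_offsets/1.
offsets = [{0, 6}, {7, 13}]

# Because the offsets count bytes, slice the binary with binary_part/3.
tokens =
  for {start, stop} <- offsets do
    binary_part(text, start, stop - start)
  end

IO.inspect(tokens)

# String.slice/3 counts graphemes rather than bytes, so feeding it the
# same numbers selects a different span of the text.
by_grapheme = String.slice(text, 7, 6)
IO.inspect(by_grapheme)
```

With these assumed values, `binary_part/3` recovers the two words intact, while `String.slice/3` starts one character too late in the second word.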