Commit 3ee6c35
Add support for checking hash of downloaded files before use. (#230)

We are using tiktoken in various production scenarios and sometimes have
the problem that the download of `.tiktoken` files (e.g.,
`cl100k_base.tiktoken`) will get interrupted or fail, causing the cached
file to be corrupted in some way. In those cases, the results returned
from the encoder will be incorrect and could be damaging to our
production instances.

More often, when this happens, `Encoder.encode()` will throw an
exception such as
```
pyo3_runtime.PanicException: no entry found for key
```
which turns out to be quite hard to track down.

In an effort to make tiktoken more robust for production use, this PR
adds the `sha256` hash of each of the downloaded files to
`openai_public.py` and augments `read_file` to check for the hash, if
provided, when the file is accessed from the cache or downloaded
directly. This causes errors to be flagged at file load time, rather
than when the files are used, and provides a more meaningful error
message indicating what might have gone wrong.
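
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `sha256_matches` and `load_with_hash_check`, the `fetch` callable standing in for the download, and the choice to discard a corrupt cached copy and fetch again are all assumptions made for the example.

```python
import hashlib
import os
import tempfile
from typing import Callable, Optional


def sha256_matches(data: bytes, expected_hash: str) -> bool:
    """Return True if the SHA-256 hex digest of data equals expected_hash."""
    return hashlib.sha256(data).hexdigest() == expected_hash


def load_with_hash_check(
    cache_path: str,
    fetch: Callable[[], bytes],
    expected_hash: Optional[str] = None,
) -> bytes:
    """Hypothetical loader: use the cached bytes if valid, else fetch and verify."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            data = f.read()
        if expected_hash is None or sha256_matches(data, expected_hash):
            return data
        # Assumed policy: a corrupt cached copy (e.g. a truncated download)
        # is discarded and fetched again.
        os.remove(cache_path)

    data = fetch()
    if expected_hash is not None and not sha256_matches(data, expected_hash):
        # Fail at load time with a message that points at the likely cause.
        raise ValueError(
            f"Hash mismatch for data fetched for {cache_path} "
            f"(expected {expected_hash}). The download may be corrupt."
        )

    # Write to a temporary file and rename it into place, so an interrupted
    # write never leaves a partial file at cache_path.
    cache_dir = os.path.dirname(cache_path) or "."
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, cache_path)
    return data
```

With a check of this kind, a bad file surfaces as a `ValueError` when it is loaded rather than as a panic inside the encoder later on. Passing `expected_hash=None` skips verification, matching the "if provided" behaviour described above.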

This also protects users of tiktoken from scenarios where a network
issue or MITM attack could have corrupted these files in transit.

1 parent 9e79899 commit 3ee6c35
2 files changed, +34 -11 lines changed