Commit 9b9fdd0
fix(parquet/pqarrow): decoding Parquet with Arrow dict in schema (apache#551)
Fix the spurious `parquet: column chunk cannot have more than one dictionary.` error raised when reading certain Parquet files.
Resolve apache#546
The error is triggered by Parquet files that have all of the following:
* An Arrow dictionary column
* The Arrow schema serialized in the Parquet metadata
* Column chunks with 1 dictionary page followed by at least 2 data pages
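The three trigger conditions can be restated as a single predicate. This is a stdlib-only sketch: `PageType`, `ChunkInfo` and `triggersIssue546` are hypothetical names for illustration, not part of the arrow-go API.

```go
package main

import "fmt"

// PageType is a hypothetical stand-in for the page types in a column chunk.
type PageType int

const (
	DictionaryPage PageType = iota
	DataPage
)

// ChunkInfo is a hypothetical summary of one column chunk, used only to
// express the trigger conditions listed in the commit message.
type ChunkInfo struct {
	ArrowDictColumn   bool       // column is an Arrow dictionary type
	StoredArrowSchema bool       // Arrow schema serialized in Parquet metadata
	Pages             []PageType // pages in the chunk, in file order
}

// triggersIssue546 reports whether a chunk matches all three conditions:
// Arrow dict column, stored Arrow schema, 1 dict page and >= 2 data pages.
func triggersIssue546(c ChunkInfo) bool {
	if !c.ArrowDictColumn || !c.StoredArrowSchema {
		return false
	}
	dictPages, dataPages := 0, 0
	for _, p := range c.Pages {
		switch p {
		case DictionaryPage:
			dictPages++
		case DataPage:
			dataPages++
		}
	}
	return dictPages == 1 && dataPages >= 2
}

func main() {
	c := ChunkInfo{true, true, []PageType{DictionaryPage, DataPage, DataPage}}
	fmt.Println(triggersIssue546(c))
}
```

A chunk with only one data page never reaches the second `readDictionary()` call described under Bug, which is why it is excluded.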
# Bug
When maybeWriteNewDictionary() resets `newDictionary = false` at line
965, the next call to readDictionary() tries to read the dictionary page
from the pager again. That calls configureDict() a second time, which
fails with the "cannot have more than one dictionary" error.
The sequence is:
1. Read DICTIONARY_PAGE → `newDictionary = true`
2. Read DATA_PAGE_1 → calls maybeWriteNewDictionary() → resets
`newDictionary = false`
3. Read DATA_PAGE_2 → calls readDictionary() → since `newDictionary =
false`, tries to get the dictionary page again → calls configureDict() →
error, because a dictionary decoder already exists
# Fix
Added a DictionaryState enum (column_reader.go):
- DictNotRead: Dictionary page hasn't been read yet
- DictReadNotInserted: Dictionary page read and decoder configured, but
not yet inserted into the Arrow builder
- DictFullyProcessed: Dictionary fully processed (read and inserted into
the builder)
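A sketch of how the three states separate "has the dictionary been read?" from "has it been inserted into the builder?", reusing the same hypothetical model as the bug description. Only the three state names come from this commit; `fixedReader`, `configureCalls` and the transition logic are assumptions for illustration and may differ from the upstream code.

```go
package main

import (
	"errors"
	"fmt"
)

// DictionaryState uses the state names from the commit; the transitions
// below are a hypothetical sketch, not the upstream implementation.
type DictionaryState int

const (
	DictNotRead         DictionaryState = iota // dictionary page not read yet
	DictReadNotInserted                        // decoder configured, not yet in the Arrow builder
	DictFullyProcessed                         // read and inserted into the builder
)

var errMultipleDicts = errors.New("parquet: column chunk cannot have more than one dictionary.")

// fixedReader is a hypothetical model of the reader after the fix.
type fixedReader struct {
	state          DictionaryState
	configureCalls int // counts decoder configurations, for illustration
}

// configureDict installs the decoder; it still rejects a second dictionary.
func (r *fixedReader) configureDict() error {
	if r.state != DictNotRead {
		return errMultipleDicts
	}
	r.configureCalls++
	r.state = DictReadNotInserted
	return nil
}

// readDictionary reads the dictionary page only while it has not been read,
// whether or not it has been inserted into the builder yet.
func (r *fixedReader) readDictionary() error {
	if r.state != DictNotRead {
		return nil
	}
	return r.configureDict()
}

// maybeWriteNewDictionary inserts the dictionary once (insertion elided)
// and advances the state instead of resetting a flag.
func (r *fixedReader) maybeWriteNewDictionary() {
	if r.state == DictReadNotInserted {
		r.state = DictFullyProcessed
	}
}

// readChunk replays one dictionary page followed by dataPages data pages.
func readChunk(r *fixedReader, dataPages int) error {
	if err := r.readDictionary(); err != nil {
		return err
	}
	for i := 0; i < dataPages; i++ {
		if err := r.readDictionary(); err != nil {
			return fmt.Errorf("data page %d: %w", i+1, err)
		}
		r.maybeWriteNewDictionary()
	}
	return nil
}

func main() {
	r := &fixedReader{}
	fmt.Println(readChunk(r, 3), r.state == DictFullyProcessed, r.configureCalls)
}
```

Because DictFullyProcessed is distinct from DictNotRead, finishing the insertion no longer looks like "dictionary not read", so later data pages skip the dictionary page instead of reconfiguring the decoder.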
Parent commit 95b3f76. 3 files changed (+173, -8 lines) under parquet/file and parquet/pqarrow.