Commit d8df6c9
Add sorting support (#20798)
* Adding sort support during merge operation
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
* Add sorting support
* Fix sort flow
* Fix date field type for sort in merge
* Renamed @timestamp to timestamp
* Changing sort to EventDate
* Fix memory issue for larger merge
* Using polars for merging
* Added hybrid approach of polars and heap
* optimize merge by grouping contiguous rows
* merge with streaming using sink_batches
* streaming k way merge
* using sink_batch to stream one batch per file
* optimize - materialize chunks at flush time
* replace threaded sink_batches with sequential ParquetReader slicing to reduce memory usage
* optimize - binary search to find cut point in batch + emit entire batch by checking last row in batch
* use arrow writer - ipc
* use rayon to parallelize writes across columns
* add tokio for parallel column writes
* add rayon with thread pool for column encoding during flush
* add tokio for async write and rayon parallel decoding during reads through the shared rayon thread pool
* temp commit
* take sort column from index setting during indexing flow
* support reverse sort in merge
* store and look up settings for every index from settings store
* refactor and address comments
* add exception handling + take union of all schema to make code extensible for dynamic mapping support
* add profiler
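
For readers unfamiliar with the "streaming k way merge" and "support reverse sort in merge" items above, here is a minimal sketch of the general technique, not the code from this commit. It assumes each input run is an in-memory `Vec<i64>` of sort keys that is already sorted in the requested direction. The function name `k_way_merge` and the `descending` flag are hypothetical, and the real merge reads Parquet batches rather than vectors.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted runs into one sorted output (illustrative only).
/// When `descending` is true, every run must itself be sorted descending.
fn k_way_merge(runs: &[Vec<i64>], descending: bool) -> Vec<i64> {
    // Bitwise NOT reverses the ordering of i64 values without overflow,
    // so a single min-heap can serve both sort directions.
    let rank = |v: i64| if descending { !v } else { v };

    // Heap entries are (rank of key, run index, offset within run).
    let mut heap = BinaryHeap::new();
    for (run, values) in runs.iter().enumerate() {
        if let Some(&first) = values.first() {
            heap.push(Reverse((rank(first), run, 0usize)));
        }
    }

    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((_, run, offset))) = heap.pop() {
        // Emit the current head, then advance that run by one row.
        out.push(runs[run][offset]);
        if let Some(&next) = runs[run].get(offset + 1) {
            heap.push(Reverse((rank(next), run, offset + 1)));
        }
    }
    out
}
```

The heap never holds more than one entry per run, which is why this shape suits streaming: only the current head of each input has to be resident at a time.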
---------
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Co-authored-by: Dhwanil Patel <dhwanip@amazon.com>
Co-authored-by: Mohit Godwani <mgodwan@amazon.com>
Co-authored-by: Shailesh Singh <shaikumm@amazon.com>
Co-authored-by: Shailesh Singh <shaileshkumarsingh260@gmaill.com>
1 parent d60b7ae commit d8df6c9
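
The commit message item "binary search to find cut point in batch + emit entire batch by checking last row in batch" can be illustrated with the sketch below. It is an assumption about the general idea, not the commit's implementation. `cut_point` and `bound` are hypothetical names, where `bound` stands for the smallest head key among the other runs. Keys are plain `i64` values sorted ascending, and rows equal to `bound` are treated as safe to emit.

```rust
/// Number of leading rows of `batch` (sorted ascending) that can be emitted
/// before any row of the competing runs, whose smallest head key is `bound`.
fn cut_point(batch: &[i64], bound: i64) -> usize {
    match batch.last() {
        // Fast path: the last row is within the bound, so the whole batch
        // can be emitted without inspecting the rows in between.
        Some(&last) if last <= bound => batch.len(),
        // Otherwise binary-search for the first row greater than the bound;
        // an empty batch falls through here and yields 0.
        _ => batch.partition_point(|&key| key <= bound),
    }
}
```

Emitting a prefix of `cut_point(batch, bound)` rows at once replaces one heap operation per row with a single comparison or one binary search per batch, which helps most when the inputs overlap very little.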
File tree
30 files changed (+2350, -592 lines)
- modules/parquet-data-format
- benchmarks/src/main/java/com/parquet/parquetdataformat/benchmark
- src
- main
- java/com/parquet/parquetdataformat
- bridge
- engine
- merge
- vsr
- writer
- rust
- src
- bin
- tests
- test/java/com/parquet/parquetdataformat/vsr
- server/src/main/java/org/opensearch/index
- engine/exec
- composite
- coord
- merge