148958: colserde,rowcontainer: prohibit writing very large keys r=yuzefovich a=yuzefovich
We just saw a node crash in a test when we wrote a 2.5 GiB key to the temporary storage used by the row container. Such large keys aren't well supported by Pebble and can lead to undefined behavior, so we add an explicit check that the key doesn't exceed 1.5 GiB. We also now drop scratch slices once they exceed 1 MiB in size (we already have memory accounting in place for them).
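The guard described above can be sketched roughly as follows. The function name and the parameterized limit are illustrative, not the actual row-container API; the real code enforces a fixed 1.5 GiB cap.

```go
package main

import "fmt"

// checkKeySize is a hypothetical sketch of the new guard: it rejects keys
// whose length exceeds maxSize instead of handing an oversized key to the
// storage engine, where it could trigger undefined behavior.
func checkKeySize(keyLen, maxSize int) error {
	if keyLen > maxSize {
		return fmt.Errorf("key of %d bytes exceeds the maximum allowed size of %d bytes", keyLen, maxSize)
	}
	return nil
}

func main() {
	// A small key passes; an oversized one is rejected with an error
	// rather than being written to temporary storage.
	fmt.Println(checkKeySize(10, 100))
	fmt.Println(checkKeySize(200, 100))
}
```

Returning an error here turns what used to be a node crash into a query-level failure that the caller can surface cleanly.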
Similarly, the vectorized disk spilling could suffer from the same problem: in the Arrow format offsets are int32, so if we were to serialize a vector larger than 2 GiB, we'd encounter undefined behavior (which we've seen a couple of times in Sentry issues). This commit adds an explicit check there as well, returning an error if the serialized size exceeds max int32. Additionally, we now drop the reference to the large scratch slice that we keep across calls once it exceeds 32 MiB. Note that I initially added a simple unit test that allocated a vector of 3 GiB in size and ensured that an error is returned, but it hit an OOM in the EngFlow environment, and it didn't seem worth upgrading it to the heavy pool, so I removed it.
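The two mechanisms in this paragraph can be sketched as below. Function names and the configurable keep-limit are illustrative assumptions, not the actual colserde API; the overflow bound against max int32, however, follows directly from Arrow's int32 offsets.

```go
package main

import (
	"fmt"
	"math"
)

// checkArrowSize errors out when a serialized vector's byte size would
// overflow the int32 offsets used by the Arrow format.
func checkArrowSize(size int64) error {
	if size > math.MaxInt32 {
		return fmt.Errorf("serialized size %d exceeds max int32 (%d)", size, math.MaxInt32)
	}
	return nil
}

// maybeReleaseScratch drops the reference to a scratch buffer once it has
// grown past keepLimit so the GC can reclaim it; otherwise the buffer is
// truncated and reused across calls.
func maybeReleaseScratch(scratch []byte, keepLimit int) []byte {
	if cap(scratch) > keepLimit {
		return nil // let the GC reclaim the oversized buffer
	}
	return scratch[:0]
}

func main() {
	fmt.Println(checkArrowSize(1024) == nil)    // small size fits in int32
	fmt.Println(checkArrowSize(3<<30) != nil)   // 3 GiB overflows int32
	s := make([]byte, 0, 64)
	fmt.Println(maybeReleaseScratch(s, 32) == nil) // cap 64 > limit 32: dropped
}
```

Releasing the scratch slice trades a future reallocation for not pinning tens of mebibytes between calls, which matters because the memory accounting keeps charging for the retained capacity.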
In the test failure, such a large value was produced via the `st_collect` geo builtin. Another example I can think of would be a large array produced by the `array_agg` aggregate.
Fixes: #147601.
Release note: None
Co-authored-by: Yahor Yuzefovich <[email protected]>