Commit 57f6261
### Rationale for this change
See GH-47861. With this change, the extension-array variant of the reproducer uses about 192 MB of memory instead of 7 GB.
As far as I can tell, the blow-up happens because the `PandasOptions` struct is copied frequently. For example, there appears to be one `ExtensionWriter` per extension column, and each `ExtensionWriter` holds its own copy of `PandasOptions`, which contains the set of all extension columns. The total number of stored column names therefore grows roughly with the square of the number of extension columns. I haven't fully traced how `PandasOptions` flows through the code, but it does get copied and modified in some code paths, so I decided to put the column sets behind a `std::shared_ptr` rather than pass around a `shared_ptr<PandasOptions>`.
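To illustrate the suspected cost, here is a minimal C++ sketch. `OptionsByValue`, `WriterByValue`, and `CopiedNamesByValue` are hypothetical stand-ins, not Arrow types; the sketch only assumes, as described above, that each per-column writer stores its own by-value copy of an options struct that holds the full column set. With N columns, N copies of an N-element set are made, so N² column-name strings end up in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an options struct that stores a column set by value.
struct OptionsByValue {
  std::unordered_set<std::string> extension_columns;
};

// Hypothetical stand-in for a per-column writer that keeps its own options copy.
struct WriterByValue {
  OptionsByValue options;  // copying this copies every column name in the set
};

// Builds one writer per extension column and returns how many column-name
// strings exist across all of the writers' option copies.
std::size_t CopiedNamesByValue(std::size_t num_columns) {
  OptionsByValue options;
  for (std::size_t i = 0; i < num_columns; ++i) {
    options.extension_columns.insert("col_" + std::to_string(i));
  }
  std::vector<WriterByValue> writers;
  writers.reserve(num_columns);
  for (std::size_t i = 0; i < num_columns; ++i) {
    writers.push_back(WriterByValue{options});  // full set copy per writer
  }
  std::size_t total = 0;
  for (const auto& writer : writers) {
    total += writer.options.extension_columns.size();
  }
  return total;  // num_columns * num_columns
}
```

For example, `CopiedNamesByValue(10)` returns 100: ten writers, each holding its own ten-name set.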
### What changes are included in this PR?
The `PandasOptions` column sets have been swapped from `std::unordered_set<std::string>` to `std::shared_ptr<const std::unordered_set<std::string>>` and usages have been updated.
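A minimal sketch of the pattern this change describes, again using hypothetical names (`OptionsShared`, `IsExtensionColumn`, `SharedSetUseCount`) rather than the actual `PandasOptions` code: copying the options struct now copies only a `std::shared_ptr`, so every copy refers to the same immutable set. Treating a null pointer as "no extension columns" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical options struct: the column set is held through a shared
// pointer to an immutable set, so copies share one allocation.
struct OptionsShared {
  std::shared_ptr<const std::unordered_set<std::string>> extension_columns;
};

// Null-safe membership check; a null pointer is treated as an empty set
// (an assumption made for this sketch).
bool IsExtensionColumn(const OptionsShared& options, const std::string& name) {
  return options.extension_columns != nullptr &&
         options.extension_columns->count(name) > 0;
}

// Copies the options once per column and returns the shared set's use count,
// showing that every copy points at the same underlying set.
long SharedSetUseCount(std::size_t num_columns) {
  auto columns = std::make_shared<std::unordered_set<std::string>>();
  for (std::size_t i = 0; i < num_columns; ++i) {
    columns->insert("col_" + std::to_string(i));
  }
  OptionsShared options{std::move(columns)};
  std::vector<OptionsShared> copies(num_columns, options);  // pointer copies only
  return options.extension_columns.use_count();  // num_columns + 1
}
```

Making the pointee `const` means no holder of a copy can mutate the shared set underneath the others; a code path that needs a different set has to build a new one and reassign its own pointer.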
### Are these changes tested?
Yes. The existing pytest suite passes with no regressions, and memory usage was also checked by hand.
### Are there any user-facing changes?
All changes are internal to the pyarrow C++ binding code. There are no changes to the exposed Python API.
* GitHub Issue: #47861
Lead-authored-by: Will Gulian <[email protected]>
Co-authored-by: Will Gulian <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Parent commit: 88179b6

4 files changed (+33, -13 lines) under `python/pyarrow/includes` and `python/pyarrow/src/arrow/python`.