Commit d9f36a1
### Rationale for this change
This PR resolves the issue reported in #47301.
There are [three possible file source types](https://github.com/apache/arrow/blob/80addfab90b65c9127b46cc5c0ff48af4db1afb3/python/pyarrow/_dataset.pyx#L104) from which a `CFileSource` can be created:
1. From a `pa.Buffer`.
2. From a `path` string.
3. From a file-like object which has a `read` attribute.
However, `FileFragment.open()` currently [explicitly handles only the first two types](https://github.com/apache/arrow/blob/80addfab90b65c9127b46cc5c0ff48af4db1afb3/python/pyarrow/_dataset.pyx#L2005). When `open()` is called on a `FileFragment` created from type (3), the current implementation tries to read the `path`, which is set to the string `"<Buffer>"` ([source](https://github.com/apache/arrow/blob/135357ce3824d1a8e1aba5a19d897b0c02b22ab7/cpp/src/arrow/dataset/file_base.h#L106)). This causes the segfault observed in the linked issue.
### What changes are included in this PR?
1. Modify `FileFragment.open()` to handle the three `CFileSource` cases as listed above.
2. Add a unit test that segfaults without the change in (1) and passes with it.
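
As a rough illustration of the three-way handling described in (1), here is a minimal pure-Python sketch. It is not the pyarrow implementation: `open_source` is a hypothetical helper, and the standard-library types stand in for `pa.Buffer` and Arrow's file objects. It only shows the dispatch shape, in particular that a file-like source is used directly instead of being resolved through a path.

```python
import io
import os


def open_source(source):
    """Hypothetical helper: return a readable binary file object.

    Mirrors the three source kinds listed above, using stdlib stand-ins
    (bytes-like for a buffer, str/PathLike for a path).
    """
    if isinstance(source, (bytes, bytearray, memoryview)):
        # Case 1: in-memory buffer, wrap it in a reader.
        return io.BytesIO(bytes(source))
    if isinstance(source, (str, os.PathLike)):
        # Case 2: path, open the file it names.
        return open(source, "rb")
    if hasattr(source, "read"):
        # Case 3: file-like object, use it directly. There is no real
        # path to open here, so no path lookup is attempted.
        return source
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```

Checking the file-like case last, and raising on anything else, means an unexpected source type fails with a Python exception rather than falling through to path handling.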
### Are these changes tested?
Yes.
### Are there any user-facing changes?
Yes; this PR fixes a user-facing bug in the `FileFragment` API.
* GitHub Issue: #47301
Authored-by: Lester Fan <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
1 parent f8b20f1 commit d9f36a1
3 files changed (+18, -4 lines) under `python/pyarrow`, including its `includes` and `tests` subdirectories.