Open
Description
Summary
The Windows build fails with an invalid-URL parse error caused by a tilde in the prepared file path.
Initial Bug:
The initial bug report follows:
The Windows build has been failing since recent changes that read Parquet files directly rather than converting them to batches.
https://github.com/kaskada-ai/kaskada/actions/runs/6423963739/job/17443601949
---------------------------- Captured stderr call -----------------------------
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: failed to prepare batch
├╴at D:\a\kaskada\kaskada\crates\sparrow-session\src\table.rs:153:14
│
├─▶ internal error
│ ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\prepare\preparer.rs:133:10
│
├─▶ failed to create Parquet file reader
│ ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\prepare.rs:52:22
│
├─▶ invalid parquet file metadata
│ ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\read\parquet_file.rs:63:18
│
╰─▶ Generic LocalFileSystem error: Unable to access metadata for D:/a/kaskada/kaskada/python/D:/a/kaskada/kaskada/testdata/purchases/purchases_part1.parquet: The filename, directory name, or volume label syntax is incorrect. (os error 123)
 ╰╴at D:\a\kaskada\kaskada\crates\sparrow-runtime\src\read\parquet_file.rs:62:18', src\table.rs:94:24
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
_______________________ test_read_parquet_with_subsort ________________________
golden = <conftest.GoldenFixture object at 0x0000018C117B2910>
    async def test_read_parquet_with_subsort(golden) -> None:
>       source = await kd.sources.Parquet.create(
            "../testdata/purchases/purchases_part1.parquet",
            time_column="purchase_time",
            key_column="customer_id",
            subsort_column="subsort_id",
        )
pytests\parquet_source_test.py:17:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv\Lib\site-packages\kaskada\sources\arrow.py:582: in create
await source.add_file(path)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <kaskada.sources.arrow.Parquet object at 0x0000018C117A42D0>
path = 'D:\\a\\kaskada\\kaskada\\python/../testdata/purchases/purchases_part1.parquet'
    async def add_file(self, path: str) -> None:
        """Add data to the source."""
>       await self._ffi_table.add_parquet(str(Source._get_absolute_path(path)))
E       pyo3_asyncio.RustPanic: rust future panicked
.venv\Lib\site-packages\kaskada\sources\arrow.py:587: RustPanic
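The error message above reports a path in which the absolute file path appears appended to the `python` working directory. The following is a minimal sketch of one way such a doubled path can arise. It is an assumption for illustration, not a confirmed root cause, and it does not use any Kaskada code: it only shows that POSIX-style path rules do not recognize a `D:/...` path as absolute. The variable names and the `cwd` value (taken from the prefix in the error message) are hypothetical.

```python
import ntpath      # Windows path rules, usable on any OS
import posixpath   # POSIX path rules, usable on any OS

# Path passed to add_parquet according to the traceback
# (mixed separators and an unresolved "..").
raw = "D:\\a\\kaskada\\kaskada\\python/../testdata/purchases/purchases_part1.parquet"

# Windows rules collapse ".." and unify the separators.
normalized = ntpath.normpath(raw)
print(normalized)

# The same path written with forward slashes.
as_posix = normalized.replace("\\", "/")

# Under Windows rules the path is absolute; under POSIX rules it is not,
# because it does not start with "/".
print(ntpath.isabs(as_posix), posixpath.isabs(as_posix))

# Assumption: a component that applies POSIX rules treats the path as
# relative and joins it onto the working directory.
cwd = "D:/a/kaskada/kaskada/python"
doubled = posixpath.join(cwd, as_posix)
print(doubled)
```

Under these assumptions, `doubled` matches the path shown in the `LocalFileSystem` error, which suggests checking how the absolute path is converted before it reaches the Parquet reader.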