
bug: Windows build fails to parse invalid URL in prepared file #794

@jordanrfrazier

Summary

The Windows build fails to parse an invalid URL due to the presence of a tilde in the prepared file path.

Initial Bug:

The Windows build has been failing since recent changes to read Parquet files directly rather than converting them to batches.

https://github.com/kaskada-ai/kaskada/actions/runs/6423963739/job/17443601949

   ---------------------------- Captured stderr call -----------------------------
  thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: \x1b[1mfailed to prepare batch\x1b[22m\n\u251c\u2574at \x1b[3mD:\\a\\kaskada\\kaskada\\crates\\sparrow-session\\src\\table.rs:153:14\x1b[23m\n\u2502\n\u251c\u2500\u25b6 \x1b[1minternal error\x1b[22m\n\u2502   \u2570\u2574at \x1b[3mD:\\a\\kaskada\\kaskada\\crates\\sparrow-runtime\\src\\prepare\\preparer.rs:133:10\x1b[23m\n\u2502\n\u251c\u2500\u25b6 \x1b[1mfailed to create Parquet file reader\x1b[22m\n\u2502   \u2570\u2574at \x1b[3mD:\\a\\kaskada\\kaskada\\crates\\sparrow-runtime\\src\\prepare.rs:52:22\x1b[23m\n\u2502\n\u251c\u2500\u25b6 \x1b[1minvalid parquet file metadata\x1b[22m\n\u2502   \u2570\u2574at \x1b[3mD:\\a\\kaskada\\kaskada\\crates\\sparrow-runtime\\src\\read\\parquet_file.rs:63:18\x1b[23m\n\u2502\n\u2570\u2500\u25b6 \x1b[1mGeneric LocalFileSystem error: Unable to access metadata for D:/a/kaskada/kaskada/python/D:/a/kaskada/kaskada/testdata/purchases/purchases_part1.parquet: The filename, directory name, or volume label syntax is incorrect. (os error 123)\x1b[22m\n    \u2570\u2574at \x1b[3mD:\\a\\kaskada\\kaskada\\crates\\sparrow-runtime\\src\\read\\parquet_file.rs:62:18\x1b[23m', src\\table.rs:94:24\nnote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  _______________________ test_read_parquet_with_subsort ________________________
  
  golden = <conftest.GoldenFixture object at 0x0000018C117B2910>
  
      async def test_read_parquet_with_subsort(golden) -> None:
  >       source = await kd.sources.Parquet.create(
              "../testdata/purchases/purchases_part1.parquet",
              time_column="purchase_time",
              key_column="customer_id",
              subsort_column="subsort_id",
          )
  
  pytests\parquet_source_test.py:17: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  .venv\Lib\site-packages\kaskada\sources\arrow.py:582: in create
      await source.add_file(path)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  
  self = <kaskada.sources.arrow.Parquet object at 0x0000018C117A42D0>
  path = 'D:\\a\\kaskada\\kaskada\\python/../testdata/purchases/purchases_part1.parquet'
  
      async def add_file(self, path: str) -> None:
          """Add data to the source."""
  >       await self._ffi_table.add_parquet(str(Source._get_absolute_path(path)))
  E       pyo3_asyncio.RustPanic: rust future panicked
  
  .venv\Lib\site-packages\kaskada\sources\arrow.py:587: RustPanic
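
In the log above, the path handed to `add_parquet` mixes separators and still contains `..`, and the resulting error shows the absolute `D:/...` path appended to the working directory. One possible reading is that the Windows path is not being recognized as absolute when it is turned into a URL. The sketch below is only an illustration of normalizing such a path into a `file://` URL on the Python side before it crosses the FFI boundary. The helper name `windows_path_to_file_url` and the `RUNNER~1` example path are hypothetical and are not part of kaskada, and this is not necessarily the fix the project adopted.

```python
import ntpath
from urllib.parse import quote


def windows_path_to_file_url(path: str) -> str:
    """Hypothetical helper: turn an absolute Windows drive path into a file:// URL.

    Uses ntpath so the behaviour is the same on any host OS.
    """
    # Collapse ".." segments and unify separators to backslashes.
    normalized = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(normalized)

    # Only plain drive-letter paths (e.g. "D:\\...") are handled in this sketch.
    if len(drive) != 2 or drive[1] != ":" or not rest.startswith("\\"):
        raise ValueError(f"expected an absolute drive path, got {path!r}")

    # Percent-encode unsafe characters, but keep "/" and "~" literal.
    return "file:///" + drive + quote(rest.replace("\\", "/"), safe="/~")


# Path taken from the traceback above.
url = windows_path_to_file_url(
    "D:\\a\\kaskada\\kaskada\\python/../testdata/purchases/purchases_part1.parquet"
)

# Assumed example of a short (8.3) Windows name containing a tilde.
tilde_url = windows_path_to_file_url("C:\\Users\\RUNNER~1\\AppData\\x.parquet")
```

Whether the Rust side should accept a URL or a plain path here is a design decision for the maintainers. The sketch only shows that the drive letter, the `..` segment, and a tilde can all survive the conversion.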
