Support user-provided schema for Parquet reads #4134

Merged

sfc-gh-mayliu merged 1 commit into main from parquet_fix on Mar 25, 2026

@sfc-gh-dyadav (Contributor) commented Mar 24, 2026

Allow users to specify a custom schema when reading Parquet files via `session.read.schema(schema).parquet(path)`. Previously, only JSON and XML accepted a user-provided schema; Parquet reads with a user schema were rejected by a `ValueError` gate.

  • Add `"parquet"` to the user-schema format allowlist in `_read_semi_structured_file`
  • Update the `schema()` docstring to reflect all supported formats
  • Remove the stale `ValueError` assertion in `test_read_parquet_with_no_schema`
  • Add `test_read_parquet_user_input_schema` covering matching-schema, missing-column, extra-column, and wrong-type error cases
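The allowlist change can be pictured with a small, self-contained model. This is a sketch, not the Snowpark source: `USER_SCHEMA_FORMATS` and `check_user_schema_allowed` are hypothetical names, the gate inside `_read_semi_structured_file` is assumed to reduce to a membership test on the lowercased format name, the set holds only the formats this PR names (JSON, XML, and now Parquet), and a plain list stands in for the `StructType` a caller would pass to `schema()`.

```python
# Hypothetical model of the user-schema format gate described in this PR.
# Names are illustrative only; they are NOT the actual Snowpark internals.
USER_SCHEMA_FORMATS = frozenset({"json", "xml", "parquet"})  # "parquet" is the new entry


def check_user_schema_allowed(file_format: str, user_schema: object = None) -> None:
    """Raise ValueError when a user schema is supplied for a format outside the allowlist."""
    if user_schema is None:
        return  # no user schema supplied, so the gate has nothing to check
    fmt = file_format.lower()
    if fmt not in USER_SCHEMA_FORMATS:
        raise ValueError(
            f"User-provided schema is not supported for {fmt!r}; "
            f"allowed formats: {sorted(USER_SCHEMA_FORMATS)}"
        )


# A list of column names stands in for a real StructType here (assumption).
# With the old allowlist {"json", "xml"}, the Parquet call below would have raised.
check_user_schema_allowed("parquet", user_schema=["id", "name"])
check_user_schema_allowed("JSON", user_schema=["id"])
```

Under this model, any format missing from the set (for example `"avro"`) still raises when a schema is supplied, while calls without a schema pass through unchanged.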

Made-with: Cursor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-3242298

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

@sfc-gh-yuwang (Collaborator) left a comment

LGTM. Please also update the changelog before merging, thanks!

Allow users to specify a custom schema when reading Parquet files via
`session.read.schema(schema).parquet(path)`. Previously only JSON and
XML supported user-provided schemas; Parquet was blocked by a
ValueError gate.

- Add "parquet" to the user-schema format allowlist in
  _read_semi_structured_file
- Update schema() docstring to reflect all supported formats
- Remove stale ValueError assertion in test_read_parquet_with_no_schema
- Add test_read_parquet_user_input_schema covering matching schema,
  missing columns, extra columns, and wrong-type error cases
  (parametrized for both select and copy paths)
- Update CHANGELOG.md
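Because the new test is parametrized over the two read paths, and assuming every schema scenario is exercised under each path, the four scenarios yield 4 × 2 = 8 combinations. The sketch below enumerates that matrix as a Cartesian product; the scenario and path labels are hypothetical stand-ins, since the PR does not show the test's actual parameter IDs.

```python
from itertools import product

# Hypothetical labels; the real test's parametrize IDs are not given in the PR.
SCHEMA_CASES = ("matching_schema", "missing_columns", "extra_columns", "wrong_type_error")
READ_PATHS = ("select", "copy")


def coverage_matrix() -> list[tuple[str, str]]:
    """Pair every schema scenario with every read path (Cartesian product)."""
    return list(product(SCHEMA_CASES, READ_PATHS))


# Print one line per (scenario, path) combination.
for case, path in coverage_matrix():
    print(f"{case:<18} via {path}")
```

Parametrizing on the path rather than on each scenario keeps one test body per path while still covering every scenario on both the select and copy code paths.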

Made-with: Cursor

@sfc-gh-mayliu merged commit 48255d9 into main on Mar 25, 2026
46 of 49 checks passed
@sfc-gh-mayliu deleted the parquet_fix branch on Mar 25, 2026 22:06
@github-actions bot locked and limited conversation to collaborators on Mar 25, 2026

4 participants