SNOW-2478173: Improve single-row transpose helper for 1-column frames#3975

Merged
sfc-gh-joshi merged 8 commits into main from joshi-SNOW-2478173-single-transpose-speedup
Oct 31, 2025


Conversation

@sfc-gh-joshi
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2478173

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

While testing #3973, I noticed that aggregations on single-column frames/series were producing queries with JSON serialization and unnecessary UNPIVOT operations. The query compiler's (QC's) transpose_single_row helper method is used in aggregations to skip the PIVOT operation that the general transpose case requires. For a 1x1 frame, however, we don't need the UNPIVOT either: since we already know that the column's dtype will not change, we only need to re-label the index.
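To illustrate why a 1x1 transpose reduces to re-labeling, here is a minimal pure-Python sketch. `TinyFrame`, `transpose_general`, and `transpose_1x1_fast` are hypothetical names invented for this illustration; they are not Snowpark pandas APIs and do not reflect how the QC generates SQL. Unlike the real helper, this toy model also keeps the column labels of the result.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class TinyFrame:
    """Hypothetical stand-in for a frame: row labels, column labels, row-major values."""
    index: List[Any]
    columns: List[Any]
    rows: List[List[Any]]


def transpose_general(frame: TinyFrame) -> TinyFrame:
    # General case: every value moves from position (row, col) to (col, row).
    return TinyFrame(
        index=list(frame.columns),
        columns=list(frame.index),
        rows=[list(col) for col in zip(*frame.rows)],
    )


def transpose_1x1_fast(frame: TinyFrame) -> TinyFrame:
    # Fast path (assumption for illustration): with exactly one row and one
    # column, the single value does not move, so only the labels are swapped.
    if len(frame.index) != 1 or len(frame.columns) != 1:
        raise ValueError("fast path only applies to 1x1 frames")
    return TinyFrame(
        index=list(frame.columns),
        columns=list(frame.index),
        rows=[list(frame.rows[0])],
    )


# A 1x1 aggregation result: one row labeled "count", one column labeled "a".
agg = TinyFrame(index=["count"], columns=["a"], rows=[[2000]])
fast = transpose_1x1_fast(agg)
general = transpose_general(agg)
```

In this sketch both paths yield the same frame for 1x1 input, but the fast path never touches the data, which loosely mirrors replacing JSON/UNPIVOT operations with a simple projection.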

This PR adds a fast path for 1x1 transpose_single_row operations, which replaces JSON/UNPIVOT operations with simple projections. It produces some modest performance improvements for operations on a 2000x1 frame:

  • DataFrame.count: 1.48s -> 1.31s (11.2% improvement)
  • DataFrame.describe: 2.64s -> 2.36s (10.9% improvement)
  • DataFrame.nunique: 1.25s -> 1.21s (3.4% improvement)

These improvements are likely to be more noticeable on frames produced from more complex queries.

This PR also adds explicit row count caching for the general transpose case. We currently cannot directly use the transpose_single_row path for the transpose API itself since the helper function drops the column labels of the result.

@sfc-gh-joshi sfc-gh-joshi requested a review from a team as a code owner October 29, 2025 23:45
@sfc-gh-joshi sfc-gh-joshi added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Oct 29, 2025
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Nice!

@sfc-gh-joshi sfc-gh-joshi merged commit b2696ad into main Oct 31, 2025
28 of 29 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi-SNOW-2478173-single-transpose-speedup branch October 31, 2025 00:08
@github-actions github-actions bot locked and limited conversation to collaborators Oct 31, 2025

Labels

NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs)
snowpark-pandas

3 participants