Skip to content

SNOW-1885815: Add support for seed argument in DataFrame.stat.sample_by#2925

Merged
sfc-gh-jdu merged 5 commits intomainfrom
jdu-SNOW-1885815-sample-by-seed
Jan 30, 2025
Merged

SNOW-1885815: Add support for seed argument in DataFrame.stat.sample_by#2925
sfc-gh-jdu merged 5 commits intomainfrom
jdu-SNOW-1885815-sample-by-seed

Conversation

@sfc-gh-jdu
Copy link
Collaborator

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1885815

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    Note it only supports Table object

CHANGELOG.md Outdated
- `try_to_binary`
- Added support for specifying a schema string (including implicit struct syntax) when calling `DataFrame.create_dataframe`.
- Added support for `DataFrameWriter.insert_into/insertInto`. This method also supports local testing mode.
- Added support for `seed` argument in `DataFrame.stat.sample_by`. Note that it only supports a `Table` object, and will be ignored for a `DataFrame` object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about logging a warning when the DataFrame isn't a Table?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's already added

Comment on lines +393 to +394
Default value is ``None``. This parameter is only supported for :class:`Table`, and it will be ignored
if it is specified for :class`DataFrame`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious to know why we choose to ignore instead of failing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For SAS team to work around by creating a temp table first. See https://snowflakecomputing.atlassian.net/browse/SNOW-1894684

col: The name of the column that defines the strata.
fractions: A ``dict`` that specifies the fraction to use for the sample for each stratum.
If a stratum is not specified in the ``dict``, the method uses 0 as the fraction.
seed: Specifies a seed value to make the sampling deterministic. Can be any integer between 0 and 2147483647 inclusive.
Copy link
Contributor

@sfc-gh-aling sfc-gh-aling Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would happen if seed is not in the accepted range?
a quick follow-up is do we need client validation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will raise a SQL error. I think we don't need to do it right now, as DataFrame.sample also doesn't do it.

@sfc-gh-jdu sfc-gh-jdu merged commit dfbb0eb into main Jan 30, 2025
32 of 36 checks passed
@sfc-gh-jdu sfc-gh-jdu deleted the jdu-SNOW-1885815-sample-by-seed branch January 30, 2025 18:54
@github-actions github-actions bot locked and limited conversation to collaborators Jan 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants