SNOW-1885815: Add support for seed argument in DataFrame.stat.sample_by#2925
Merged
sfc-gh-jdu merged 5 commits into main on Jan 30, 2025
Conversation
CHANGELOG.md
Outdated
- `try_to_binary`
- Added support for specifying a schema string (including implicit struct syntax) when calling `DataFrame.create_dataframe`.
- Added support for `DataFrameWriter.insert_into/insertInto`. This method also supports local testing mode.
- Added support for `seed` argument in `DataFrame.stat.sample_by`. Note that it only supports a `Table` object, and will be ignored for a `DataFrame` object.
Collaborator
How about logging a warning when the DataFrame isn't a Table?
Collaborator
Author
Yeah, it's already added.
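The behavior discussed in this thread (log a warning and ignore `seed` when the receiver is a plain DataFrame rather than a Table) could look roughly like the sketch below. This is a stand-alone illustration, not the code from the PR: `FakeDataFrame`, `FakeTable`, and `resolve_seed` are hypothetical names, and the actual implementation may structure the check differently.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class FakeDataFrame:
    """Hypothetical stand-in for a Snowpark DataFrame."""


class FakeTable(FakeDataFrame):
    """Hypothetical stand-in for a Snowpark Table (a DataFrame subclass)."""


def resolve_seed(df: FakeDataFrame, seed: Optional[int]) -> Optional[int]:
    """Return the seed to apply, or None when it has to be ignored."""
    if seed is not None and not isinstance(df, FakeTable):
        # Assumed behavior per the review thread: warn, then drop the seed.
        logger.warning(
            "`seed` is only supported for Table; ignoring seed=%s", seed
        )
        return None
    return seed
```

With this shape, a caller that passes a seed on a non-Table DataFrame still gets a sample back, just not a deterministic one, and the warning is the only signal that the seed was dropped.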
sfc-gh-aalam
approved these changes
Jan 24, 2025
Comment on lines +393 to +394
Default value is ``None``. This parameter is only supported for :class:`Table`, and it will be ignored
if it is specified for :class:`DataFrame`.
Contributor
curious to know why we choose to ignore instead of failing?
Collaborator
Author
So that the SAS team can work around it by creating a temp table first. See https://snowflakecomputing.atlassian.net/browse/SNOW-1894684
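The temp-table workaround mentioned in this reply might be sketched as follows. The helper name `seeded_sample_via_temp_table` and the default table name `SAMPLE_BY_TMP` are hypothetical; `session` and `df` are assumed to be a Snowpark `Session` and `DataFrame`, and the sketch relies on `DataFrameWriter.save_as_table` accepting `table_type="temporary"` and on `Session.table` returning a `Table`.

```python
from typing import Any, Dict


def seeded_sample_via_temp_table(
    session: Any,
    df: Any,
    col: str,
    fractions: Dict[Any, float],
    seed: int,
    temp_name: str = "SAMPLE_BY_TMP",  # hypothetical table name
) -> Any:
    """Materialize `df` as a temporary table, then sample it with a seed."""
    # Write the DataFrame out so that it can be read back as a Table.
    df.write.save_as_table(temp_name, mode="overwrite", table_type="temporary")
    # session.table() returns a Table, for which `seed` is supported.
    table = session.table(temp_name)
    return table.stat.sample_by(col, fractions, seed=seed)
```

The extra write costs a round trip and some storage, which is presumably why it is treated as a workaround rather than the default path.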
col: The name of the column that defines the strata.
fractions: A ``dict`` that specifies the fraction to use for the sample for each stratum.
If a stratum is not specified in the ``dict``, the method uses 0 as the fraction.
seed: Specifies a seed value to make the sampling deterministic. Can be any integer between 0 and 2147483647 inclusive.
Contributor
What would happen if seed is not in the accepted range?
A quick follow-up: do we need client-side validation?
Collaborator
Author
It will raise a SQL error. I think we don't need to do it right now, as DataFrame.sample also doesn't do it.
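For reference, the client-side validation raised in this thread (and deliberately not added in this PR) could be as small as the sketch below. `validate_seed` and `MAX_SEED` are hypothetical names; the bounds come from the docstring quoted above.

```python
from typing import Optional

MAX_SEED = 2147483647  # upper bound quoted in the docstring (2**31 - 1)


def validate_seed(seed: Optional[int]) -> Optional[int]:
    """Return `seed` unchanged, or raise ValueError if it is out of range."""
    if seed is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise ValueError(f"seed must be an integer, got {type(seed).__name__}")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}, got {seed}")
    return seed
```

Failing early like this would replace the SQL error with a Python exception, at the cost of diverging from `DataFrame.sample`, which per the reply above does not validate either.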
sfc-gh-aling
approved these changes
Jan 30, 2025
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1885815
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Note that it only supports a `Table` object.