Commit 8dd8630

DOC-8461 Reproducibility warning for random_split (#3673)
1 parent f3fe978 commit 8dd8630

1 file changed

+15
-4
lines changed

src/snowflake/snowpark/dataframe.py

Lines changed: 15 additions & 4 deletions
@@ -1092,9 +1092,11 @@ def to_pandas(
             return pandas.DataFrame(
                 result,
                 columns=[
-                    unquote_if_quoted(attr.name)
-                    if is_select_statement
-                    else attr.name
+                    (
+                        unquote_if_quoted(attr.name)
+                        if is_select_statement
+                        else attr.name
+                    )
                     for attr in self._plan.attributes
                 ],
             )
@@ -6273,7 +6275,9 @@ def random_split(
                 Every number in ``weights`` has to be positive. If only one
                 weight is specified, the returned DataFrame list only includes
                 the current DataFrame.
-            seed: The seed for sampling.
+            seed: The seed used by the randomness generator for splitting.
+
+                .. caution:: By default, reusing a seed value doesn't guarantee reproducible results.
             statement_params: Dictionary of statement level parameters to be set while executing this action.

         Example::
@@ -6290,6 +6294,13 @@ def random_split(

         2. When a weight or a normailized weight is less than ``1e-6``, the
            corresponding split dataframe will be empty.
+
+        3. To get reproducible seeding behavior, configure the DataFrame's :py:class:`Session`
+           to use simplified querying:
+
+           .. code-block::
+
+               >>> session.conf.set("use_simplified_query_generation", True)
         """

         if not weights:
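
The docstring note added in this commit can be illustrated with a small sketch. The helper name `split_reproducibly` is hypothetical and not part of Snowpark. It only combines the two calls that appear in this diff: `session.conf.set("use_simplified_query_generation", True)` and `DataFrame.random_split` with a `seed`. It assumes `session` is a Snowpark `Session` and `df` is a DataFrame created from that session, so running it for real requires a live Snowflake connection.

```python
from typing import Any, List, Sequence


def split_reproducibly(
    session: Any, df: Any, weights: Sequence[float], seed: int
) -> List[Any]:
    """Hypothetical helper: opt in to simplified querying, then split with a fixed seed.

    `session` is assumed to be a Snowpark Session and `df` a DataFrame
    created from it; neither is constructed here.
    """
    # Config key taken verbatim from the docstring note in this commit.
    session.conf.set("use_simplified_query_generation", True)
    # random_split returns a list of DataFrames, one per weight.
    return df.random_split(list(weights), seed=seed)
```

With the setting enabled, the docstring indicates that repeating the call with the same `seed` should produce the same splits. Without it, the caution above applies and reusing a seed is not guaranteed to be reproducible.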
