DOC: Improve documentation for DataFrame.__setitem__ and .loc assignment from Series #61804
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4214,6 +4214,90 @@ def isetitem(self, loc, value) -> None: | |
self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs) | ||
|
||
def __setitem__(self, key, value) -> None: | ||
""" | ||
Set item(s) in DataFrame by key. | ||
|
||
This method allows you to set the values of one or more columns in the | ||
|
||
DataFrame using a key. The key can be a single column label, a list of | ||
labels, or a boolean array. If the key does not exist, a new | ||
column will be created. | ||
|
||
Parameters | ||
---------- | ||
key : str, list of str, or tuple | ||
Comment: I'm not sure that the documented types are correct. Using pandas-stubs as a reference, it looks like this isn't typed at all, which probably means it's more complex than documented here. I'd just remove this. I think it's better to be incomplete than incorrect.

Comment: Quick question! Since a Parameters section is required for docstring validation, what do you suggest should be mentioned for the untyped key?

Comment: Hmm, I'm not sure what the best way to work around that is, but I'd be OK if you just removed the Parameters section entirely. We can look further into it later if needed.

Comment: Or instead of specifying types you can say something very generic like:

    key : The object(s) in the index which are to be assigned to
    ...

Comment: I could not drop the Parameters section altogether because docstring validation was failing, but the second option worked. Let me know your thoughts now! |
||
Column label(s) to set. Can be a single column name, list of column names, | ||
or tuple for MultiIndex columns. | ||
value : scalar, array-like, Series, or DataFrame | ||
Value(s) to set for the specified key(s). | ||
|
||
Returns | ||
------- | ||
None | ||
This method does not return a value. | ||
|
||
See Also | ||
-------- | ||
DataFrame.loc : Access and set values by label-based indexing. | ||
DataFrame.iloc : Access and set values by position-based indexing. | ||
DataFrame.assign : Assign new columns to a DataFrame. | ||
|
||
Notes | ||
----- | ||
When assigning a Series to a DataFrame column, pandas aligns the Series | ||
by index labels, not by position. This means: | ||
|
||
* Values from the Series are matched to DataFrame rows by index label | ||
* If a Series index label doesn't exist in the DataFrame index, it's ignored | ||
Comment: I don't follow the difference between this and the line directly following it with the distinction of ignored versus NaN - can you help me understand?

Comment: Added the ignored example to documentation. |
||
* If a DataFrame index label doesn't exist in the Series index, NaN is assigned | ||
* The order of values in the Series doesn't matter; only the index labels matter | ||
|
||
Examples | ||
-------- | ||
Basic column assignment: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3]}) | ||
>>> df["B"] = [4, 5, 6] # Assigns by position | ||
>>> df | ||
A B | ||
0 1 4 | ||
1 2 5 | ||
2 3 6 | ||
|
||
Series assignment with index alignment: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2]) | ||
>>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df | ||
>>> df["B"] = s # Assigns by index label, not position | ||
>>> df | ||
A B | ||
0 1 NaN | ||
1 2 10.0 | ||
2 3 NaN | ||
|
||
Series assignment with partial index match: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"]) | ||
>>> s = pd.Series([100, 200], index=["b", "d"]) | ||
>>> df["B"] = s | ||
>>> df | ||
A B | ||
a 1 NaN | ||
b 2 100.0 | ||
c 3 NaN | ||
d 4 200.0 | ||
|
||
Series index labels NOT in DataFrame, ignored: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"]) | ||
>>> s = pd.Series([10, 20, 30, 40, 50], index=["x", "y", "a", "b", "z"]) | ||
>>> df["B"] = s | ||
>>> df | ||
A B | ||
x 1 10 | ||
y 2 20 | ||
z 3 50 | ||
# Values for 'a' and 'b' are completely ignored! | ||
""" | ||
if not PYPY: | ||
if sys.getrefcount(self) <= 3: | ||
warnings.warn( | ||
|
Comment: I don't think we should generally encourage the tolist approach. Converting each Series element to a Python object may have significant performance implications.
I like what you are trying to do here, but I would just stick with the reindex approach in the second example.
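
To make the trade-off in this comment concrete, here is a minimal sketch comparing label-aligned assignment, an explicit reindex, and positional assignment. The frame, the Series, and the names `aligned`, `explicit`, and `positional` are hypothetical. The `to_numpy()` variant is an illustration added here as an assumption, not something proposed in the PR.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
# "q" is not in df.index, and df's label "y" is missing from the Series.
s = pd.Series([10, 20, 30], index=["z", "x", "q"])

# Plain assignment aligns on index labels, not on position.
aligned = df.copy()
aligned["B"] = s

# Writing the alignment out with reindex yields the same column.
explicit = df.copy()
explicit["B"] = s.reindex(df.index)

# Positional assignment (assumption: lengths match). to_numpy() keeps the
# values in a NumPy array, whereas tolist() builds one Python object per element.
positional = df.copy()
positional["B"] = s.to_numpy()

print(aligned)
print(aligned.equals(explicit))   # True
print(positional["B"].tolist())   # [10, 20, 30]
print(type(s.tolist()[0]))        # <class 'int'>
```

In this sketch, label "q" exists only in the Series and is dropped, while label "y" exists only in the DataFrame and receives NaN, which also makes column "B" a float column. That is the "ignored versus NaN" distinction discussed earlier in the review.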