DOC: More examples comparison with sql #12932
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -372,10 +372,98 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with | |
|
||
pd.concat([df1, df2]).drop_duplicates() | ||
|
||
Pandas equivalents for some SQL analytic and aggregate functions | ||
---------------------------------------------------------------- | ||
Top N rows with offset | ||
|
||
.. code-block:: sql | ||
|
||
-- MySQL | ||
SELECT * FROM tips | ||
ORDER BY tip DESC | ||
LIMIT 10 OFFSET 5; | ||
|
||
In pandas: | ||
|
||
.. ipython:: python | ||
|
||
tips.nlargest(10+5, columns='tip').tail(10) | ||
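As a minimal sketch of the ``LIMIT ... OFFSET ...`` pattern above, the following uses a small made-up frame in place of ``tips`` (the values and the names ``n`` and ``offset`` are assumptions for illustration) and compares the ``nlargest(...).tail(...)`` approach with a plain sort-and-slice:

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset; the values are made up.
tips = pd.DataFrame({"tip": [1.0, 5.0, 3.0, 9.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]})

n, offset = 3, 2  # corresponds to LIMIT 3 OFFSET 2

# Approach from the text: take the top n + offset rows, then keep the last n.
via_nlargest = tips.nlargest(n + offset, columns="tip").tail(n)

# Alternative: sort the whole frame, then slice by position.
via_slice = tips.sort_values("tip", ascending=False).iloc[offset:offset + n]

print(via_nlargest["tip"].tolist())  # [8.0, 7.0, 6.0]
print(via_slice["tip"].tolist())     # [8.0, 7.0, 6.0]
```

One difference worth keeping in mind: if the frame has fewer than ``n + offset`` rows, ``tail(n)`` still returns the last ``n`` rows it can find, whereas the positional slice skips exactly ``offset`` rows, which is closer to what ``OFFSET`` does.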
|
||
Top N rows per group | ||
|
||
.. code-block:: sql | ||
|
||
-- Oracle's ROW_NUMBER() analytic function | ||
SELECT * FROM ( | ||
SELECT | ||
t.*, | ||
ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn | ||
FROM tips t | ||
) | ||
WHERE rn <= 3 | ||
ORDER BY day, rn; | ||
|
||
.. ipython:: python | ||
|
||
tips.sort_values(['total_bill'], ascending=False).groupby('day').head(3) | ||
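To make the per-group behaviour easier to check by eye, here is a small sketch on a made-up frame (the data is hypothetical, and it takes the top 2 rather than the top 3 to keep the output short). It sorts once, then lets ``groupby(...).head(...)`` keep the first rows of each ``day``:

```python
import pandas as pd

# Hypothetical miniature of ``tips``; the values are made up.
tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "total_bill": [10.0, 40.0, 20.0, 30.0, 15.0, 25.0, 5.0],
})

top2 = (
    tips.sort_values("total_bill", ascending=False)  # largest bills first
        .groupby("day")                              # PARTITION BY day
        .head(2)                                     # keep 2 rows per group
        .sort_values(["day", "total_bill"], ascending=[True, False])
)

print(top2["total_bill"].tolist())  # [40.0, 30.0, 25.0, 15.0]
```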
|
||
|
||
Let's add an ``rn`` (row number) column: | ||
|
||
.. ipython:: python | ||
|
||
tips['rn'] = tips.sort_values(['total_bill'], ascending=False) \ | ||
.groupby(['day']) \ | ||
.cumcount() + 1 | ||
tips.loc[tips['rn'] <= 3].sort_values(['day', 'rn']) | ||
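The row-number computation relies on index alignment: ``cumcount()`` is evaluated on the sorted frame, but the result is matched back to the original rows by index label. A minimal sketch on made-up data (the frame is hypothetical, and ``assign`` is used here instead of in-place assignment so the original frame stays untouched):

```python
import pandas as pd

# Hypothetical miniature of ``tips``; the values are made up.
tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "total_bill": [10.0, 40.0, 20.0, 30.0, 15.0, 25.0, 5.0],
})

# Number the rows within each day, largest bill first (like ROW_NUMBER()).
rn = (
    tips.sort_values("total_bill", ascending=False)
        .groupby("day")
        .cumcount() + 1
)

# ``rn`` carries the original index labels, so it aligns with ``tips``.
tips_rn = tips.assign(rn=rn)

print(tips_rn["rn"].tolist())  # [4, 1, 3, 2, 2, 1, 3]
```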
|
||
The same result using the ``rank(method='first')`` function: | ||
|
||
|
||
.. ipython:: python | ||
|
||
tips['rnk'] = tips.groupby(['day'])['total_bill'].rank(method='first', ascending=False) | ||
|
||
tips.loc[tips['rnk'] <= 3].sort_values(['day', 'rnk']) | ||
|
||
.. code-block:: sql | ||
|
||
-- Oracle's RANK() analytic function | ||
SELECT * FROM ( | ||
SELECT | ||
t.*, | ||
RANK() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rnk | ||
FROM tips t | ||
) | ||
WHERE rnk < 3 | ||
ORDER BY day, rnk; | ||
|
||
.. ipython:: python | ||
|
||
tips['rnk_min'] = tips.groupby(['day'])['total_bill'].rank(method='min', ascending=False) | ||
tips.loc[tips['rnk_min'] < 3].sort_values(['day','rnk_min']) | ||
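The difference between the two ranking methods only shows up when there are ties. The sketch below uses a made-up frame with a deliberate tie (the data is hypothetical) to contrast ``method='first'``, which behaves like ``ROW_NUMBER()``, with ``method='min'``, which behaves like ``RANK()``:

```python
import pandas as pd

# Hypothetical data with a tie on total_bill within the same day.
tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat"],
    "total_bill": [30.0, 20.0, 30.0, 10.0],
})

grouped = tips.groupby("day")["total_bill"]

# Like ROW_NUMBER(): tied rows still receive distinct, consecutive numbers.
row_number = grouped.rank(method="first", ascending=False)

# Like RANK(): tied rows share the lowest rank and the next rank is skipped.
rank_min = grouped.rank(method="min", ascending=False)

print(sorted(row_number.tolist()))  # [1.0, 2.0, 3.0, 4.0]
print(rank_min.tolist())            # [1.0, 3.0, 1.0, 4.0]
```

Note that ``rank`` returns floats, so the resulting columns may need ``astype(int)`` if integer ranks are wanted.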
|
||
|
||
|
||
UPDATE | ||
------ | ||
|
||
.. code-block:: sql | ||
|
||
UPDATE tips | ||
SET tip = tip*2 | ||
WHERE tip < 2; | ||
|
||
.. ipython:: python | ||
|
||
tips.loc[tips['tip'] < 2, 'tip'] *= 2 | ||
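As a small illustration of the ``UPDATE`` equivalent, the sketch below applies the same masked assignment to a made-up frame (the values are hypothetical) and also shows a non-mutating variant based on ``Series.mask``, which is one possible alternative rather than the approach used above:

```python
import pandas as pd

# Hypothetical miniature of ``tips``; the values are made up.
tips = pd.DataFrame({"tip": [1.0, 1.5, 2.0, 3.0]})

# Non-mutating variant: build a new column, leaving ``tips`` unchanged.
doubled = tips["tip"].mask(tips["tip"] < 2, tips["tip"] * 2)

# In-place variant from the text: only rows matching the condition change.
tips.loc[tips["tip"] < 2, "tip"] *= 2

print(tips["tip"].tolist())  # [2.0, 3.0, 2.0, 3.0]
print(doubled.tolist())      # [2.0, 3.0, 2.0, 3.0]
```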
|
||
DELETE | ||
------ | ||
|
||
.. code-block:: sql | ||
|
||
DELETE FROM tips | ||
WHERE tip > 9; | ||
|
||
In pandas, we select the rows that should remain instead of deleting them: | ||
|
||
.. ipython:: python | ||
|
||
tips = tips.loc[tips['tip'] <= 9] |
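One subtlety when inverting the ``DELETE`` condition is missing data. In SQL, ``tip > 9`` is not true for ``NULL``, so such rows are not deleted, whereas ``tips['tip'] <= 9`` is ``False`` for ``NaN`` and therefore drops them. A minimal sketch on made-up data (the frame is hypothetical) showing both ways of writing the filter:

```python
import numpy as np
import pandas as pd

# Hypothetical data including a missing tip.
tips = pd.DataFrame({"tip": [1.0, 9.5, np.nan]})

# Inverted condition: rows with NaN are dropped as well.
keep_le = tips.loc[tips["tip"] <= 9]

# Negated original condition: rows with NaN are kept, as in SQL.
keep_not_gt = tips.loc[~(tips["tip"] > 9)]

print(len(keep_le))      # 1
print(len(keep_not_gt))  # 2
```

Which of the two is appropriate depends on how missing values should be treated; when the column has no missing values, both give the same result.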
This file should be removed, but if you can't, I will do it on merge.