Commit 58e0b05

Modernize boxenplot and address some odd behaviors (#3393)
* WIP port over some of the boxenplot stats computation to its own object
* POC new boxenplot
* Add generic kwargs for boxenplot
* Shift catplot over to new boxen code
* Add basic boxenplot test setup
* Basic categorical tests for boxenplot
* Add some boxenplot-specific tests
* Fix tests
* Add docs and tests for LetterValue statistics
* Remove vestigial warning assertions
* Add shared test for hue_order and remove vestigial test
* Bump minimal matplotlib
* Fix matplotlib version guard in Plot test
* Update boxenplot docs
* Rename scale -> width_method and add tests
* Update boxenplot API examples
* Use a lighter default color for outliers
* Force boxes to cover all datapoints with k_depth='full'
* Remove vestigial boxenplot components and tests
* Remove vestigial catplot internals
1 parent 7780305 commit 58e0b05

8 files changed, +1274 -1209 lines changed

ci/deps_pinned.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 numpy~=1.20.0
 pandas~=1.2.0
-matplotlib~=3.3.0
+matplotlib~=3.4.0
 scipy~=1.7.0
 statsmodels~=0.12.0
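
The `~=` operator in the pins above is the PEP 440 "compatible release" clause, so `matplotlib~=3.4.0` accepts any 3.4.x release at or above 3.4.0 but not 3.5. As a rough illustration only, the check can be sketched with the standard library; the function name `satisfies_compatible_release` is hypothetical, and the sketch assumes plain dotted integer versions (no pre-release or local tags):

```python
def satisfies_compatible_release(version: str, pin: str) -> bool:
    """Approximate the PEP 440 `~=` check for plain dotted integer versions.

    Hypothetical helper for illustration; real tooling should rely on a
    proper version parser instead of this simplified comparison.
    """
    candidate = tuple(int(part) for part in version.split("."))
    base = tuple(int(part) for part in pin.split("."))
    # `~=X.Y.Z` means: at least X.Y.Z, and the same X.Y prefix.
    prefix_len = len(base) - 1
    return candidate >= base and candidate[:prefix_len] == base[:prefix_len]


if __name__ == "__main__":
    for candidate in ("3.3.4", "3.4.0", "3.4.3", "3.5.0"):
        print(candidate, satisfies_compatible_release(candidate, "3.4.0"))
```
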

doc/_docstrings/boxenplot.ipynb

Lines changed: 215 additions & 32 deletions
@@ -5,103 +5,286 @@
 "execution_count": null,
 "id": "882d215b-88d8-4b5e-ae7a-0e3f6bb53bad",
 "metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
 "tags": [
 "hide"
 ]
 },
 "outputs": [],
 "source": [
 "import seaborn as sns\n",
-"sns.set_theme(style=\"whitegrid\")"
+"sns.set_theme(style=\"whitegrid\")\n",
+"diamonds = sns.load_dataset(\"diamonds\")"
+]
+},
+{
+"cell_type": "raw",
+"id": "9b8b892e-a96f-46e8-9c5e-8749783608d8",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"source": [
+"Draw a single horizontal plot, assigning the data directly to the coordinate variable:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "6809326c-14a9-4314-994d-b4e8e7414172",
-"metadata": {},
+"id": "391e1162-b438-4486-9a08-60686ee8e96a",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "outputs": [],
 "source": [
-"df = sns.load_dataset(\"diamonds\")"
+"sns.boxenplot(x=diamonds[\"price\"])"
 ]
 },
 {
-"cell_type": "markdown",
-"id": "9ccbc2d5-5a44-4e80-8b07-e12629729f4a",
-"metadata": {},
+"cell_type": "raw",
+"id": "b0c5a469-c709-4333-a8bc-b2cb34f366aa",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "source": [
-"Draw a single horizontal plot, assigning the data directly to the coordinate variable:"
+"Group by a categorical variable, referencing columns in a dataframe"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "391e1162-b438-4486-9a08-60686ee8e96a",
-"metadata": {},
+"id": "e30fec18-f127-40a3-bfaf-f71324dd60ec",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "outputs": [],
 "source": [
-"sns.boxenplot(x=df[\"price\"])"
+"sns.boxenplot(data=diamonds, x=\"price\", y=\"clarity\")"
 ]
 },
 {
-"cell_type": "markdown",
-"id": "a3b0e9b8-1673-494c-a27a-aa9c60457ba1",
-"metadata": {},
+"cell_type": "raw",
+"id": "70fe999a-bea5-4b0a-a1a3-474b6696d1be",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "source": [
-"Group by a categorical variable, referencing columns in a dataframe"
+"Group by another variable, representing it by the color of the boxes. By default, each boxen plot will be \"dodged\" so that they don't overlap; you can also add a small gap between them:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "e30fec18-f127-40a3-bfaf-f71324dd60ec",
-"metadata": {},
+"id": "eed3239c-57b7-4d76-9fdc-be99257047fd",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "outputs": [],
 "source": [
-"sns.boxenplot(data=df, x=\"price\", y=\"clarity\")"
+"large_diamond = diamonds[\"carat\"].gt(1).rename(\"large_diamond\")\n",
+"sns.boxenplot(data=diamonds, x=\"price\", y=\"clarity\", hue=large_diamond, gap=.2)"
 ]
 },
 {
-"cell_type": "markdown",
-"id": "4f01a821-74d1-452d-a1f7-cf5b806169e8",
-"metadata": {},
+"cell_type": "raw",
+"id": "36030c1c-047b-4f7b-b366-91188b41680e",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "source": [
-"Use a different scaling rule to control the width of each box:"
+"The default rule for choosing each box width represents the percentile covered by the box. Alternatively, you can reduce each box width by a linear factor:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "id": "d0c1aa43-5e8a-486c-bd6d-3c29d6d23138",
-"metadata": {},
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "outputs": [],
 "source": [
-"sns.boxenplot(data=df, x=\"carat\", y=\"cut\", scale=\"linear\")"
+"sns.boxenplot(data=diamonds, x=\"price\", y=\"clarity\", width_method=\"linear\")"
 ]
 },
 {
-"cell_type": "markdown",
-"id": "fd5d197c-8cbb-4be3-a14d-76447f06d3f1",
-"metadata": {},
+"cell_type": "raw",
+"id": "062a9fc2-9cbe-4e40-af8c-3fd35f785cd5",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "source": [
-"Use a different method to determine the number of boxes:"
+"The `width` parameter itself, on the other hand, determines the width of the largest box:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "4100a460-fe27-42b7-bbaf-4430a1c1359f",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"outputs": [],
+"source": [
+"sns.boxenplot(data=diamonds, x=\"price\", y=\"clarity\", width=.5)"
+]
+},
+{
+"cell_type": "raw",
+"id": "407874a8-1202-4bcc-9f65-59e1fed29e07",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"source": [
+"There are several different approaches for choosing the number of boxes to draw, including a rule based on the confidence level of the percentile estimate:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "id": "1aead6a3-6f12-47d3-b472-a39c61867963",
-"metadata": {},
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "outputs": [],
 "source": [
-"sns.boxenplot(data=df, x=\"carat\", y=\"cut\", k_depth=\"trustworthy\")"
+"sns.boxenplot(data=diamonds, x=\"price\", y=\"clarity\", k_depth=\"trustworthy\", trust_alpha=0.01)"
+]
+},
+{
+"cell_type": "raw",
+"id": "71212196-d60e-4682-8dcb-0289956be152",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"source": [
+"The `linecolor` and `linewidth` parameters control the outlines of the boxes, while the `line_kws` parameter controls the line representing the median and the `flier_kws` parameter controls the appearance of the outliers:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "719fd61f-9795-47d6-96bd-4929d8647038",
-"metadata": {},
+"id": "dd103426-a99f-476b-ae29-a11d52958cdb",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"outputs": [],
+"source": [
+"sns.boxenplot(\n",
+"    data=diamonds, x=\"price\", y=\"clarity\",\n",
+"    linewidth=.5, linecolor=\".7\",\n",
+"    line_kws=dict(linewidth=1.5, color=\"#cde\"),\n",
+"    flier_kws=dict(facecolor=\".7\", linewidth=.5),\n",
+")"
+]
+},
+{
+"cell_type": "raw",
+"id": "16f1c534-3316-4752-ae12-f65dee9275cb",
+"metadata": {
+"editable": true,
+"raw_mimetype": "",
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"source": [
+"It is also possible to draw unfilled boxes. With unfilled boxes, all elements will be drawn as line art and follow `hue`, when used:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "ab6aef09-5bbe-4c01-b6ba-05446982d775",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
+"outputs": [],
+"source": [
+"sns.boxenplot(data=diamonds, x=\"price\", y=\"clarity\", hue=\"clarity\", fill=False)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "e059b944-ea59-408d-87bb-4ce65074dab5",
+"metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
+"tags": []
+},
 "outputs": [],
 "source": []
 }
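
The notebook text in this diff mentions several rules for choosing the number of boxes (`k_depth`), including a "trustworthy" rule tied to `trust_alpha` and a "full" rule that covers every data point. The statistics code itself is not shown in this part of the commit, so the following is only a minimal sketch of how such rules are commonly defined for letter-value plots. The function names `letter_value_depth` and `box_percentiles` are hypothetical, and the formulas are an assumption rather than a copy of the library's implementation:

```python
import math
from statistics import NormalDist


def letter_value_depth(n, k_depth="tukey", outlier_prop=0.007, trust_alpha=0.05):
    """Return the number of boxes for n observations (assumed rules)."""
    log_n = int(math.log2(n))
    if k_depth == "full":
        # Keep halving until the boxes cover all of the data.
        k = log_n + 1
    elif k_depth == "tukey":
        # Stop while a handful of points remain in each tail.
        k = log_n - 3
    elif k_depth == "proportion":
        # Leave roughly `outlier_prop` of the data outside the boxes.
        k = log_n - int(math.log2(n * outlier_prop)) + 1
    elif k_depth == "trustworthy":
        # Stop once the percentile estimate is no longer reliable at
        # the confidence level implied by `trust_alpha`.
        z = NormalDist().inv_cdf(1 - trust_alpha / 2)
        k = int(math.log2(n / (2 * z ** 2))) + 1
    else:
        # Otherwise treat k_depth as an explicit number of boxes.
        k = int(k_depth)
    return max(k, 1)


def box_percentiles(k):
    """Lower/upper percentiles of each box, innermost (25-75) first."""
    return [(100 * 0.5 ** (i + 1), 100 * (1 - 0.5 ** (i + 1)))
            for i in range(1, k + 1)]


if __name__ == "__main__":
    for rule in ("tukey", "trustworthy", "full"):
        print(rule, letter_value_depth(1024, rule))
    print(box_percentiles(3))
```

Under these assumed rules, a smaller `trust_alpha` demands more confidence in each percentile estimate and therefore yields the same number of boxes or fewer, which is the effect the `trust_alpha=0.01` example in the notebook illustrates.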

doc/_docstrings/violinplot.ipynb

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@
 "execution_count": null,
 "id": "cc19031c-bc2f-4294-95ce-3a2d9b86f44d",
 "metadata": {
+"editable": true,
+"slideshow": {
+"slide_type": ""
+},
 "tags": [
 "hide"
 ]
