Commit fca044f

[Edit] Python: Matplotlib: .boxplot() (#7300)
* [Edit] Python: Matplotlib: .boxplot()
* Add files via upload
* added keywords
* applied suggestions
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
* Update content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
---------
1 parent 2aec72f commit fca044f

4 files changed: 127 additions and 42 deletions
@@ -1,86 +1,171 @@
11
---
22
Title: '.boxplot()'
3-
Description: 'Returns a box and whisker plot.'
3+
Description: 'Creates box-and-whisker plots to display statistical summaries of datasets.'
44
Subjects:
55
- 'Data Science'
66
- 'Data Visualization'
77
Tags:
8-
- 'Graphs'
9-
- 'Libraries'
8+
- 'Charts'
109
- 'Matplotlib'
10+
- 'Statistics'
1111
CatalogContent:
1212
- 'learn-python-3'
13-
- 'paths/computer-science'
13+
- 'paths/data-science'
1414
---
1515

16-
The **`.boxplot()`** is a method in the Matplotlib library that returns a box and whisker plot based on one or more arrays of data as input.
16+
Matplotlib's **`.boxplot()`** method, available in the [`pyplot`](https://www.codecademy.com/resources/docs/matplotlib/pyplot) module, creates box-and-whisker plots that display a statistical summary of a dataset. It shows the distribution of the data through quartiles: the median, the first quartile (Q1), the third quartile (Q3), and potential outliers, all in a compact visual format.
1717

1818
## Syntax
1919

2020
```pseudo
21-
matplotlib.pyplot.boxplot(x, notch, sym, vert, whis, bootstrap, usermedians, conf_intervals, positions, widths, patch_artist, labels, manage_ticks, autorange, meanline, zorder )
21+
matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, ...)
2222
```
2323

24-
The `x` parameter is required, and represents an array or a sequence of vectors. Other parameters are optional and used to modify the features of the boxplot.
24+
> **Note:** The ellipsis (`...`) indicates that many additional optional parameters are available, such as `widths`, `patch_artist`, `showmeans`, and `boxprops`. These parameters provide detailed control over the style, layout, and display of the boxplot.
2525
26-
`.boxplot()` takes the following arguments:
26+
**Parameters:**
2727

28-
- `x` : Takes in the data to be plotted as a list, array, or a sequence of arrays. Each array represents a different dataset to be plotted in the boxplot.
29-
- `notch`: If `True`, a notch is drawn around the median to represent the median uncertainty.
30-
- `sym`: A parameter is used to modify the designation of outliers. By default, outliers are represented as dots, if an empty string is passed any outliers in the data will not be visible in the plot.
31-
- `vert`: If `True`, the boxplot is drawn vertically (default). If `False`, it is drawn horizontally.
32-
- `whis`: This parameter is used to specify the whisker length as a multiple of the IQR. The default is 1.5, which is the standard length.
33-
- `bootstrap`: Specifies whether to bootstrap the confidence intervals around the median for notched boxplots.
34-
- `usermedians`: This parameter is used to pass in a sequence of medians to be used for each dataset.
35-
- `conf_intervals`: If `True`, the confidence intervals around the median are drawn as notches.
36-
- `positions`: This parameter is used to specify the positions of the boxes in the plot.
37-
- `widths`: This parameter is used to specify the width of the boxes.
38-
- `patch_artist`: If `True`, the boxes will be filled with color.
39-
- `labels`: This parameter is used to pass in a list of labels to be used for each dataset.
40-
- `meanline`: If `True`, a line is drawn at the mean value of each dataset.
41-
- `zorder`: This parameter is used to specify the z-order of the plot. By default, the boxplot is drawn on top of other plot elements.
28+
- `x`: The input data (array-like or sequence of arrays). Can be a 1D array for a single boxplot or a sequence of arrays for multiple boxplots.
29+
- `notch`: Boolean, optional. If `True`, draws a notched boxplot; the notches indicate the confidence interval around the median.
30+
- `sym`: String, optional. Default symbol for outlier points. An empty string hides the outliers.
31+
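- `vert`: Boolean, optional. If `True` (default), plots boxes vertically. If `False`, plots them horizontally. Recent Matplotlib releases (3.10 and later) add an `orientation` parameter intended to replace `vert`.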
4232

43-
## Examples
33+
**Return value:**
4434

45-
Below are the examples demonstrating the use of `.boxplot()`.
35+
The method returns a [dictionary](https://www.codecademy.com/resources/docs/python/dictionaries) that maps each boxplot component to a list of the Matplotlib artists drawn for it. The keys are `'boxes'`, `'medians'`, `'whiskers'`, `'caps'`, `'fliers'`, and `'means'`.
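
As a rough illustration of the return value, the sketch below plots one dataset and inspects the returned dictionary. The `Agg` backend line is an assumption made so the snippet runs without a display; remove it to view the figure interactively. The variable names (`result`, `median_value`) are illustrative only.

```py
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.normal(100, 15, 200)

fig, ax = plt.subplots()
result = ax.boxplot(data)

# Each key maps to a list of artists (Line2D objects by default)
for key, artists in result.items():
    print(key, len(artists))

# The median line is horizontal, so both of its y-values equal the median
median_value = result['medians'][0].get_ydata()[0]
print("Median read from the plot:", median_value)
```

Because the values are ordinary Matplotlib artists, they can be restyled after plotting, for example with `result['medians'][0].set_color('red')`.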
36+
37+
## Example 1: Creating a Basic Boxplot using `matplotlib.pyplot.boxplot()`
38+
39+
This example demonstrates how to create a simple boxplot using randomly generated data:
4640

4741
```py
4842
import matplotlib.pyplot as plt
4943
import numpy as np
5044

51-
# Generate some random data
52-
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
45+
# Set random seed for reproducibility
46+
np.random.seed(42)
5347

54-
# Create a box and whisker plot
55-
plt.boxplot(data)
48+
# Generate sample data
49+
data = np.random.normal(100, 15, 200)
5650

57-
# Show the plot
51+
# Create the boxplot
52+
plt.figure(figsize=(8, 6))
53+
plt.boxplot(data)
54+
plt.title('Basic Boxplot Example')
55+
plt.ylabel('Values')
5856
plt.show()
5957
```
6058

61-
Output:
59+
The output of this code is:
60+
61+
![A simple matplotlib boxplot showing the distribution of normally distributed data with a median line, quartile box, whiskers, and outlier points](https://raw.githubusercontent.com/Codecademy/docs/main/media/boxplot1.png)
62+
63+
The code generates a dataset with 200 values following a normal distribution with a mean of 100 and a standard deviation of 15. The resulting boxplot displays the median as a horizontal line, the box representing the interquartile range (IQR), whiskers extending to the most extreme non-outlier data points, and any outliers as individual points.
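
To make these quantities concrete, the following sketch recomputes them with NumPy for the same data, assuming the default whisker setting of 1.5 times the IQR. It then compares the result with `matplotlib.cbook.boxplot_stats()`, the helper that computes boxplot statistics; the variable names are illustrative.

```py
import numpy as np
from matplotlib import cbook

np.random.seed(42)
data = np.random.normal(100, 15, 200)

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Default limits: 1.5 * IQR beyond the box edges
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the limits
whisker_low = data[data >= lower_limit].min()
whisker_high = data[data <= upper_limit].max()

# Anything beyond the whiskers is drawn as an outlier
outliers = data[(data < whisker_low) | (data > whisker_high)]

# Statistics as computed by Matplotlib's helper, for comparison
stats = cbook.boxplot_stats(data)[0]
print("Manual:    ", q1, median, q3, whisker_low, whisker_high)
print("Matplotlib:", stats['q1'], stats['med'], stats['q3'],
      stats['whislo'], stats['whishi'])
print("Number of outliers:", len(outliers))
```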
6264

63-
![Output of matplotlib.pyplot.boxplot() method example 1](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplotlib-boxplot-example-1.png)
65+
## Example 2: Multiple Dataset Comparison using the `matplotlib.pyplot.boxplot()` method
66+
67+
This example shows how to create boxplots for multiple datasets to compare their distributions:
6468

6569
```py
6670
import matplotlib.pyplot as plt
6771
import numpy as np
6872

69-
# Generate some random data
70-
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
73+
# Set random seed for reproducibility
74+
np.random.seed(42)
75+
76+
# Generate multiple datasets with different characteristics
77+
dataset1 = np.random.normal(80, 10, 100) # Lower mean, smaller spread
78+
dataset2 = np.random.normal(100, 20, 100) # Higher mean, larger spread
79+
dataset3 = np.random.exponential(25, 100) # Exponential distribution
80+
dataset4 = np.random.uniform(50, 150, 100) # Uniform distribution
81+
82+
# Combine datasets
83+
data = [dataset1, dataset2, dataset3, dataset4]
84+
85+
# Create multiple boxplots
86+
plt.figure(figsize=(10, 6))
87+
box_plot = plt.boxplot(data, labels=['Normal (80,10)', 'Normal (100,20)',
88+
'Exponential (25)', 'Uniform (50,150)'])
89+
plt.title('Comparison of Different Distributions')
90+
plt.ylabel('Values')
91+
plt.xlabel('Distribution Type')
92+
plt.xticks(rotation=45)
93+
plt.tight_layout()
94+
plt.show()
95+
```
96+
97+
The output of this code is:
98+
99+
![Four side-by-side matplotlib boxplots comparing normal, exponential, and uniform distributions with different means and spreads](https://raw.githubusercontent.com/Codecademy/docs/main/media/boxplot2.png)
100+
101+
This example creates four different datasets with distinct statistical properties and displays them side by side. The boxplots make it easy to compare the medians, spreads, and presence of outliers across the different distributions.
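
Matplotlib 3.9 renamed the `labels` argument of `boxplot()` to `tick_labels`, so the call above may emit a deprecation warning on newer installations. One version-independent alternative, sketched here under the assumption that the boxes sit at the default positions 1 through N, is to set the tick labels on the axes after plotting. The `Agg` backend line is also an assumption, used so the snippet runs without a display.

```py
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = [
    np.random.normal(80, 10, 100),
    np.random.normal(100, 20, 100),
    np.random.exponential(25, 100),
    np.random.uniform(50, 150, 100),
]
names = ['Normal (80,10)', 'Normal (100,20)',
         'Exponential (25)', 'Uniform (50,150)']

fig, ax = plt.subplots(figsize=(10, 6))
result = ax.boxplot(data)

# Boxes are placed at x = 1, 2, ..., N by default
ax.set_xticks(range(1, len(data) + 1))
ax.set_xticklabels(names, rotation=45)

ax.set_title('Comparison of Different Distributions')
ax.set_ylabel('Values')
fig.tight_layout()
```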
71102

72-
# Create a box and whisker plot with some custom parameters
73-
plt.boxplot(data, notch=True, sym='g+', vert=False, whis=0.75, bootstrap=10000, usermedians=[np.mean(d) for d in data], conf_intervals=None, patch_artist=True)
103+
## Example 3: Customized Boxplot for Sales Performance Analysis
74104

75-
# Add labels and title
76-
plt.xlabel('Value')
77-
plt.ylabel('Group')
78-
plt.title('Customized box and whisker plot')
105+
This example demonstrates a real-world scenario analyzing quarterly sales performance across different product categories:
79106

80-
# Show the plot
107+
```py
108+
import matplotlib.pyplot as plt
109+
import numpy as np
110+
111+
# Set random seed for reproducibility
112+
np.random.seed(42)
113+
114+
# Simulate quarterly sales data (in thousands)
115+
electronics = np.random.normal(150, 25, 50) # Electronics sales
116+
clothing = np.random.normal(120, 30, 50) # Clothing sales
117+
home_goods = np.random.normal(100, 20, 50) # Home goods sales
118+
sports = np.random.normal(80, 15, 50) # Sports equipment sales
119+
120+
# Add some outliers to make it more realistic
121+
electronics = np.append(electronics, [220, 250]) # High-performance months
122+
clothing = np.append(clothing, [200, 40]) # Seasonal variations
123+
home_goods = np.append(home_goods, [180]) # Holiday boost
124+
sports = np.append(sports, [150, 30]) # Seasonal impact
125+
126+
# Combine all sales data
127+
sales_data = [electronics, clothing, home_goods, sports]
128+
categories = ['Electronics', 'Clothing', 'Home Goods', 'Sports']
129+
130+
# Create customized boxplot
131+
plt.figure(figsize=(12, 8))
132+
box_plot = plt.boxplot(sales_data,
133+
labels=categories,
134+
patch_artist=True, # Fill with colors
135+
notch=True, # Show confidence intervals
136+
showmeans=True) # Show mean values
137+
138+
# Customize colors for each category
139+
colors = ['lightblue', 'lightgreen', 'lightcoral', 'lightyellow']
140+
for patch, color in zip(box_plot['boxes'], colors):
141+
    patch.set_facecolor(color)
142+
143+
# Customize the plot appearance
144+
plt.title('Quarterly Sales Performance Analysis by Product Category',
145+
fontsize=16, fontweight='bold')
146+
plt.ylabel('Sales (in thousands USD)', fontsize=12)
147+
plt.xlabel('Product Categories', fontsize=12)
148+
plt.grid(axis='y', alpha=0.3)
149+
plt.tight_layout()
81150
plt.show()
82151
```
83152

84-
Output:
153+
The output of this code is:
154+
155+
![Colorful customized matplotlib boxplots showing quarterly sales performance across four product categories, with notches and mean indicators](https://raw.githubusercontent.com/Codecademy/docs/main/media/boxplot3.png)
156+
157+
This example simulates a business scenario where sales data is analyzed across different product categories. The customized boxplot uses colors to distinguish categories, shows confidence intervals through notches, and displays mean values alongside medians. This visualization helps identify which product categories perform best and have the most consistent sales patterns.
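
The notch and mean markers can also be read as numbers. The sketch below assumes the default (non-bootstrapped) notch, which Matplotlib computes as the median plus or minus 1.57 × IQR / √n, and checks that against `matplotlib.cbook.boxplot_stats()`. The `sales` array is hypothetical data in the style of the example.

```py
import numpy as np
from matplotlib import cbook

np.random.seed(42)
sales = np.random.normal(150, 25, 50)  # Hypothetical sales figures

stats = cbook.boxplot_stats(sales)[0]

# Default notch: approximate confidence interval around the median
n = len(sales)
half_width = 1.57 * stats['iqr'] / np.sqrt(n)
notch_low = stats['med'] - half_width
notch_high = stats['med'] + half_width

print("Notch interval (manual):    ", notch_low, notch_high)
print("Notch interval (Matplotlib):", stats['cilo'], stats['cihi'])
print("Mean shown by showmeans=True:", stats['mean'])
```

When the notches of two boxes do not overlap, this is commonly read as evidence that the medians differ.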
158+
159+
## Frequently Asked Questions
160+
161+
### 1. What is a box plot in Matplotlib?
162+
163+
A box plot summarizes a data distribution with the median, Q1, and Q3, plus whiskers that by default extend to the most extreme data points within 1.5 times the IQR of the box. Points beyond the whiskers are shown individually as outliers.
164+
165+
### 2. What is the difference between Seaborn Boxplot and Matplotlib Boxplot?
166+
167+
Seaborn's boxplot offers better default styling and easier categorical data handling, while Matplotlib's boxplot provides more low-level control and customization options.
168+
169+
### 3. How to plot a boxplot from a pandas `DataFrame`?
85170

86-
![Output of matplotlib.pyplot.boxplot() method example 2](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplotlib-boxplot-example-2.png)
171+
Pass DataFrame columns to `plt.boxplot([df['col1'], df['col2']])` or use pandas' built-in `df.boxplot()` method.
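
A minimal sketch of both approaches follows. The DataFrame and its column names (`col1`, `col2`) are made up for illustration, and the `Agg` backend line is an assumption so the snippet runs without a display.

```py
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two numeric columns
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'col1': rng.normal(80, 10, 100),
    'col2': rng.normal(100, 20, 100),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: pass the columns to Matplotlib as a list
result = ax1.boxplot([df['col1'], df['col2']])
ax1.set_xticks([1, 2])
ax1.set_xticklabels(['col1', 'col2'])
ax1.set_title('plt.boxplot with DataFrame columns')

# Option 2: use pandas' built-in method, which labels the columns itself
df.boxplot(column=['col1', 'col2'], ax=ax2)
ax2.set_title('DataFrame.boxplot')

fig.tight_layout()
```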

media/boxplot1.png (15 KB)

media/boxplot2.png (36.2 KB)

media/boxplot3.png (48.2 KB)