|
1 | 1 | ---
|
2 | 2 | Title: '.boxplot()'
|
3 |
| -Description: 'Returns a box and whisker plot.' |
| 3 | +Description: 'Creates box-and-whisker plots to display statistical summaries of datasets.' |
4 | 4 | Subjects:
|
5 | 5 | - 'Data Science'
|
6 | 6 | - 'Data Visualization'
|
7 | 7 | Tags:
|
8 |
| - - 'Graphs' |
9 |
| - - 'Libraries' |
| 8 | + - 'Charts' |
10 | 9 | - 'Matplotlib'
|
| 10 | + - 'Statistics' |
11 | 11 | CatalogContent:
|
12 | 12 | - 'learn-python-3'
|
13 |
| - - 'paths/computer-science' |
| 13 | + - 'paths/data-science' |
14 | 14 | ---
|
15 | 15 |
|
16 |
| -The **`.boxplot()`** is a method in the Matplotlib library that returns a box and whisker plot based on one or more arrays of data as input. |
| 16 | +The **`.boxplot()`** method in Matplotlib's [`pyplot`](https://www.codecademy.com/resources/docs/matplotlib/pyplot) module creates box-and-whisker plots that summarize the distribution of a dataset. Each plot shows the median, the first quartile (Q1), the third quartile (Q3), and potential outliers in a compact visual format. |
17 | 17 |
|
18 | 18 | ## Syntax
|
19 | 19 |
|
20 | 20 | ```pseudo
|
21 |
| -matplotlib.pyplot.boxplot(x, notch, sym, vert, whis, bootstrap, usermedians, conf_intervals, positions, widths, patch_artist, labels, manage_ticks, autorange, meanline, zorder ) |
| 21 | +matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, ...) |
22 | 22 | ```
|
23 | 23 |
|
24 |
| -The `x` parameter is required, and represents an array or a sequence of vectors. Other parameters are optional and used to modify the features of the boxplot. |
| 24 | +> **Note:** The ellipsis (`...`) indicates that many additional optional parameters are available, such as `widths`, `patch_artist`, `showmeans`, and `boxprops`. These parameters control the style, layout, and display of the boxplot. |
25 | 25 |
|
26 |
| -`.boxplot()` takes the following arguments: |
| 26 | +**Parameters:** |
27 | 27 |
|
28 |
| -- `x` : Takes in the data to be plotted as a list, array, or a sequence of arrays. Each array represents a different dataset to be plotted in the boxplot. |
29 |
| -- `notch`: If `True`, a notch is drawn around the median to represent the median uncertainty. |
30 |
| -- `sym`: A parameter is used to modify the designation of outliers. By default, outliers are represented as dots, if an empty string is passed any outliers in the data will not be visible in the plot. |
31 |
| -- `vert`: If `True`, the boxplot is drawn vertically (default). If `False`, it is drawn horizontally. |
32 |
| -- `whis`: This parameter is used to specify the whisker length as a multiple of the IQR. The default is 1.5, which is the standard length. |
33 |
| -- `bootstrap`: Specifies whether to bootstrap the confidence intervals around the median for notched boxplots. |
34 |
| -- `usermedians`: This parameter is used to pass in a sequence of medians to be used for each dataset. |
35 |
| -- `conf_intervals`: If `True`, the confidence intervals around the median are drawn as notches. |
36 |
| -- `positions`: This parameter is used to specify the positions of the boxes in the plot. |
37 |
| -- `widths`: This parameter is used to specify the width of the boxes. |
38 |
| -- `patch_artist`: If `True`, the boxes will be filled with color. |
39 |
| -- `labels`: This parameter is used to pass in a list of labels to be used for each dataset. |
40 |
| -- `meanline`: If `True`, a line is drawn at the mean value of each dataset. |
41 |
| -- `zorder`: This parameter is used to specify the z-order of the plot. By default, the boxplot is drawn on top of other plot elements. |
| 28 | +- `x`: The input data (array-like or sequence of arrays). Pass a 1D array for a single boxplot or a sequence of arrays for multiple boxplots. |
| 29 | +- `notch`: Boolean, optional. If `True`, a notched boxplot is created to indicate the confidence interval around the median. |
| 30 | +- `sym`: String, optional. Default symbol for outlier points. An empty string hides the outliers. |
| 31 | +- `vert`: Boolean, optional. If `True` (default), plots boxes vertically. If `False`, plots them horizontally. Matplotlib 3.10 and later also accept `orientation='vertical'` or `orientation='horizontal'`, which is intended to replace `vert`. |
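
As a rough illustration of the parameters listed above, the following sketch draws a notched boxplot and marks outliers with green plus signs. The sample values are made up, and the `Agg` backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend, so no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample: 99 normally distributed values plus one obvious outlier
rng = np.random.default_rng(0)
sample = np.append(rng.normal(50, 5, 99), 120)

# notch=True narrows the box around the median; sym='g+' styles the outliers
styled = plt.boxplot(sample, notch=True, sym="g+")

# The 'fliers' entry holds the line artist that draws the outlier points
flier_marker = styled["fliers"][0].get_marker()
print("Outlier marker:", flier_marker)

plt.close("all")
```

Passing `sym=''` instead would hide the outlier markers entirely.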
42 | 32 |
|
43 |
| -## Examples |
| 33 | +**Return value:** |
44 | 34 |
|
45 |
| -Below are the examples demonstrating the use of `.boxplot()`. |
| 35 | +The method returns a [dictionary](https://www.codecademy.com/resources/docs/python/dictionaries) containing the matplotlib artists used in the boxplot. The dictionary includes keys for 'boxes', 'medians', 'whiskers', 'caps', 'fliers', and 'means'. |
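
A minimal sketch of inspecting that return value, assuming a small made-up dataset and the `Agg` backend so that no display is required:

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend, so no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Made-up data with one large value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

artists = plt.boxplot(values)

# Each key maps to a list of artists; list the keys and how many artists each holds
for key, items in artists.items():
    print(key, len(items))

# For a vertical boxplot, the median line's y-coordinates carry the median value
median_value = artists["medians"][0].get_ydata()[0]
print("Median read from the plot:", median_value)

plt.close("all")
```

The `'means'` list stays empty unless `showmeans=True` is passed.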
| 36 | + |
| 37 | +## Example 1: Creating a Basic Boxplot using `matplotlib.pyplot.boxplot()` |
| 38 | + |
| 39 | +This example demonstrates how to create a simple boxplot using randomly generated data: |
46 | 40 |
|
47 | 41 | ```py
|
48 | 42 | import matplotlib.pyplot as plt
|
49 | 43 | import numpy as np
|
50 | 44 |
|
51 |
| -# Generate some random data |
52 |
| -data = [np.random.normal(0, std, 100) for std in range(1, 4)] |
| 45 | +# Set random seed for reproducibility |
| 46 | +np.random.seed(42) |
53 | 47 |
|
54 |
| -# Create a box and whisker plot |
55 |
| -plt.boxplot(data) |
| 48 | +# Generate sample data |
| 49 | +data = np.random.normal(100, 15, 200) |
56 | 50 |
|
57 |
| -# Show the plot |
| 51 | +# Create the boxplot |
| 52 | +plt.figure(figsize=(8, 6)) |
| 53 | +plt.boxplot(data) |
| 54 | +plt.title('Basic Boxplot Example') |
| 55 | +plt.ylabel('Values') |
58 | 56 | plt.show()
|
59 | 57 | ```
|
60 | 58 |
|
61 |
| -Output: |
| 59 | +The output of this code is: |
| 60 | + |
| 61 | + |
| 62 | + |
| 63 | +The code generates a dataset with 200 values following a normal distribution with a mean of 100 and a standard deviation of 15. The resulting boxplot displays the median as a horizontal line, the box representing the interquartile range (IQR), whiskers extending to the most extreme non-outlier data points, and any outliers as individual points. |
62 | 64 |
|
63 |
| - |
| 65 | +## Example 2: Comparing Multiple Datasets using `matplotlib.pyplot.boxplot()` |
| 66 | + |
| 67 | +This example shows how to create boxplots for multiple datasets to compare their distributions: |
64 | 68 |
|
65 | 69 | ```py
|
66 | 70 | import matplotlib.pyplot as plt
|
67 | 71 | import numpy as np
|
68 | 72 |
|
69 |
| -# Generate some random data |
70 |
| -data = [np.random.normal(0, std, 100) for std in range(1, 4)] |
| 73 | +# Set random seed for reproducibility |
| 74 | +np.random.seed(42) |
| 75 | + |
| 76 | +# Generate multiple datasets with different characteristics |
| 77 | +dataset1 = np.random.normal(80, 10, 100) # Lower mean, smaller spread |
| 78 | +dataset2 = np.random.normal(100, 20, 100) # Higher mean, larger spread |
| 79 | +dataset3 = np.random.exponential(25, 100) # Exponential distribution |
| 80 | +dataset4 = np.random.uniform(50, 150, 100) # Uniform distribution |
| 81 | + |
| 82 | +# Combine datasets |
| 83 | +data = [dataset1, dataset2, dataset3, dataset4] |
| 84 | + |
| 85 | +# Create multiple boxplots |
| 86 | +plt.figure(figsize=(10, 6)) |
| 87 | +box_plot = plt.boxplot(data, labels=['Normal (80,10)', 'Normal (100,20)', |
| 88 | + 'Exponential (25)', 'Uniform (50,150)']) |
| 89 | +plt.title('Comparison of Different Distributions') |
| 90 | +plt.ylabel('Values') |
| 91 | +plt.xlabel('Distribution Type') |
| 92 | +plt.xticks(rotation=45) |
| 93 | +plt.tight_layout() |
| 94 | +plt.show() |
| 95 | +``` |
| 96 | + |
| 97 | +The output of this code is: |
| 98 | + |
| 99 | + |
| 100 | + |
| 101 | +This example creates four datasets with distinct statistical properties and displays them side by side. The boxplots make it easy to compare the medians, spreads, and presence of outliers across the distributions. In Matplotlib 3.9 and later, the `labels` parameter is deprecated in favor of `tick_labels`. |
71 | 102 |
|
72 |
| -# Create a box and whisker plot with some custom parameters |
73 |
| -plt.boxplot(data, notch=True, sym='g+', vert=False, whis=0.75, bootstrap=10000, usermedians=[np.mean(d) for d in data], conf_intervals=None, patch_artist=True) |
| 103 | +## Example 3: Customized Boxplot for Sales Performance Analysis |
74 | 104 |
|
75 |
| -# Add labels and title |
76 |
| -plt.xlabel('Value') |
77 |
| -plt.ylabel('Group') |
78 |
| -plt.title('Customized box and whisker plot') |
| 105 | +This example demonstrates a real-world scenario analyzing quarterly sales performance across different product categories: |
79 | 106 |
|
80 |
| -# Show the plot |
| 107 | +```py |
| 108 | +import matplotlib.pyplot as plt |
| 109 | +import numpy as np |
| 110 | + |
| 111 | +# Set random seed for reproducibility |
| 112 | +np.random.seed(42) |
| 113 | + |
| 114 | +# Simulate quarterly sales data (in thousands) |
| 115 | +electronics = np.random.normal(150, 25, 50) # Electronics sales |
| 116 | +clothing = np.random.normal(120, 30, 50) # Clothing sales |
| 117 | +home_goods = np.random.normal(100, 20, 50) # Home goods sales |
| 118 | +sports = np.random.normal(80, 15, 50) # Sports equipment sales |
| 119 | + |
| 120 | +# Add some outliers to make it more realistic |
| 121 | +electronics = np.append(electronics, [220, 250]) # High-performance months |
| 122 | +clothing = np.append(clothing, [200, 40]) # Seasonal variations |
| 123 | +home_goods = np.append(home_goods, [180]) # Holiday boost |
| 124 | +sports = np.append(sports, [150, 30]) # Seasonal impact |
| 125 | + |
| 126 | +# Combine all sales data |
| 127 | +sales_data = [electronics, clothing, home_goods, sports] |
| 128 | +categories = ['Electronics', 'Clothing', 'Home Goods', 'Sports'] |
| 129 | + |
| 130 | +# Create customized boxplot |
| 131 | +plt.figure(figsize=(12, 8)) |
| 132 | +box_plot = plt.boxplot(sales_data, |
| 133 | + labels=categories, |
| 134 | + patch_artist=True, # Fill with colors |
| 135 | + notch=True, # Show confidence intervals |
| 136 | + showmeans=True) # Show mean values |
| 137 | + |
| 138 | +# Customize colors for each category |
| 139 | +colors = ['lightblue', 'lightgreen', 'lightcoral', 'lightyellow'] |
| 140 | +for patch, color in zip(box_plot['boxes'], colors): |
| 141 | + patch.set_facecolor(color) |
| 142 | + |
| 143 | +# Customize the plot appearance |
| 144 | +plt.title('Quarterly Sales Performance Analysis by Product Category', |
| 145 | + fontsize=16, fontweight='bold') |
| 146 | +plt.ylabel('Sales (in thousands USD)', fontsize=12) |
| 147 | +plt.xlabel('Product Categories', fontsize=12) |
| 148 | +plt.grid(axis='y', alpha=0.3) |
| 149 | +plt.tight_layout() |
81 | 150 | plt.show()
|
82 | 151 | ```
|
83 | 152 |
|
84 |
| -Output: |
| 153 | +The output of this code is: |
| 154 | + |
| 155 | + |
| 156 | + |
| 157 | +This example simulates a business scenario where sales data is analyzed across different product categories. The customized boxplot uses colors to distinguish categories, shows confidence intervals through notches, and displays mean values alongside medians. This visualization helps identify which product categories perform best and have the most consistent sales patterns. |
| 158 | + |
| 159 | +## Frequently Asked Questions |
| 160 | + |
| 161 | +### 1. What is a box plot in Matplotlib? |
| 162 | + |
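| 163 | +A box plot summarizes a data distribution with five statistics: minimum, Q1, median, Q3, and maximum. In Matplotlib, the whiskers by default extend to the most extreme data points within 1.5 times the interquartile range (IQR) of the box, and points beyond the whiskers are drawn individually as outliers. |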
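
The statistics behind a box plot can be sketched with NumPy alone. This is an illustrative approximation of the default 1.5 × IQR whisker rule on made-up data, not Matplotlib's internal implementation:

```python
import numpy as np

# Made-up data with one large value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: points outside them are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
```

Here the value 50 falls above the upper fence, so the upper whisker stops at 9 and 50 is reported as an outlier.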
| 164 | + |
| 165 | +### 2. What is the difference between Seaborn Boxplot and Matplotlib Boxplot? |
| 166 | + |
| 167 | +Seaborn's boxplot offers better default styling and easier categorical data handling, while Matplotlib's boxplot provides more low-level control and customization options. |
| 168 | + |
| 169 | +### 3. How to plot a boxplot from a pandas `DataFrame`? |
85 | 170 |
|
86 |
| - |
| 171 | +Pass DataFrame columns to `plt.boxplot([df['col1'], df['col2']])` or use pandas' built-in `df.boxplot()` method. |
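
Both approaches might look like the sketch below. The `DataFrame` contents and the column names `col1` and `col2` are placeholders, and the `Agg` backend is an assumption so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend, so no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data; substitute real columns as needed
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [2, 4, 6, 8, 10]})

# Option 1: pass the columns to Matplotlib as a list
fig1, ax1 = plt.subplots()
result = ax1.boxplot([df["col1"], df["col2"]])
ax1.set_xticks([1, 2])
ax1.set_xticklabels(["col1", "col2"])

# Option 2: use pandas' built-in wrapper, which labels the boxes by column name
fig2, ax2 = plt.subplots()
pandas_axes = df.boxplot(column=["col1", "col2"], ax=ax2)

plt.close("all")
```

The pandas wrapper is convenient for quick exploration, while the Matplotlib call returns the artist dictionary for further styling.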