# Visualizing Proportions

|![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](../../sketchnotes/11-Visualizing-Proportions.png)|
|:---:|
|Visualizing Proportions - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |

In this lesson, you will use a different nature-focused dataset to visualize proportions, such as the share of each type of fungus in a dataset about mushrooms. Let's explore these fascinating fungi using a dataset sourced from Audubon that lists details about 23 species of gilled mushrooms in the Agaricus and Lepiota families. You will experiment with tasty visualizations such as:

- Pie charts 🥧
- Donut charts 🍩
- Waffle charts 🧇

> 💡 A very interesting project called [Charticulator](https://charticulator.com) by Microsoft Research offers a free drag-and-drop interface for data visualizations. One of its tutorials also uses this mushroom dataset, so you can explore the data and learn the tool at the same time: [Charticulator tutorial](https://charticulator.com/tutorials/tutorial4.html).

## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/20)

## Get to know your mushrooms 🍄

Mushrooms are very interesting. Let's import a dataset to study them:

```python
import pandas as pd
import matplotlib.pyplot as plt
mushrooms = pd.read_csv('../../data/mushrooms.csv')
mushrooms.head()
```
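
If you want to try the loading step without the lesson's CSV file, here is a minimal sketch that hands `pd.read_csv` an in-memory string instead of a path. The text holds the first three columns of the four sample rows shown in the table below; `csv_text` and `sample` are illustrative names, not part of the lesson.

```python
import io

import pandas as pd

# First three columns of the four sample rows from the lesson's table
csv_text = """class,cap-shape,cap-surface
Poisonous,Convex,Smooth
Edible,Convex,Smooth
Edible,Bell,Smooth
Poisonous,Convex,Scaly
"""

# read_csv accepts any file-like object, not only a file path
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.head())
print(sample.shape)  # (rows, columns)
```

The first line of the text becomes the column names, exactly as the header row of `mushrooms.csv` does.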

A table of the first rows is printed out:

| class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
| --------- | --------- | ----------- | --------- | ------- | ------- | --------------- | ------------ | --------- | ---------- | ----------- | ---------- | ------------------------ | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
| Poisonous | Convex | Smooth | Brown | Bruises | Pungent | Free | Close | Narrow | Black | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
| Edible | Convex | Smooth | Yellow | Bruises | Almond | Free | Close | Broad | Black | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses |
| Edible | Bell | Smooth | White | Bruises | Anise | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows |
| Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |

Right away, you notice that all the data is textual. You will have to convert this data before you can use it in a chart. In fact, every column is stored with pandas' generic `object` dtype:

```python
print(mushrooms.select_dtypes(["object"]).columns)
```

The output is:

```output
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
       'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
       'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
       'stalk-surface-below-ring', 'stalk-color-above-ring',
       'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
       'ring-type', 'spore-print-color', 'population', 'habitat'],
      dtype='object')
```

Take this data and convert every text column, including 'class', to the `category` dtype:

```python
# Select all text (object) columns and convert them to categorical
cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype('category')
```
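
To see what the conversion does, here is a minimal sketch on a hand-built DataFrame holding three columns of the four sample rows shown earlier (`sample` is an illustrative name). A categorical column stores each distinct label once and keeps a small integer code per row, which is what the later grouping and plotting steps work from. Because every column in this stand-in is text, the sketch simply converts all of them at once.

```python
import pandas as pd

# Three columns from the four sample rows printed earlier (illustrative subset)
sample = pd.DataFrame({
    'class': ['Poisonous', 'Edible', 'Edible', 'Poisonous'],
    'cap-color': ['Brown', 'Yellow', 'White', 'White'],
    'habitat': ['Urban', 'Grasses', 'Meadows', 'Urban'],
})

# Every column is text, so convert them all to the categorical dtype
sample = sample.astype('category')

print(sample.dtypes)
print(list(sample['class'].cat.categories))  # ['Edible', 'Poisonous']
print(sample['class'].cat.codes.tolist())    # [1, 0, 0, 1]
```

The categories are sorted alphabetically, which is why the grouped tables later in the lesson list Edible before Poisonous.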

Next, group the data by class and count the rows in each group:

```python
edibleclass=mushrooms.groupby(['class']).count()
edibleclass
```

Now, if you print out `edibleclass`, you can see that the mushrooms have been grouped according to the poisonous/edible class, with a row count for each:

| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
| --------- | --------- | ----------- | --------- | ------- | ---- | --------------- | ------------ | --------- | ---------- | ----------- | --- | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
| class | | | | | | | | | | | | | | | | | | | | | |
| Edible | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | ... | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 |
| Poisonous | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | ... | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 |
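
Because `count()` reports the number of non-null values in each column, every column repeats the same per-class total. If you only need the class sizes, or their shares, a single `value_counts()` call on the `class` column is enough. The sketch below rebuilds a stand-in `class` column from the totals in the table above (4208 edible, 3916 poisonous) instead of reading the CSV; `classes`, `counts`, and `shares` are illustrative names.

```python
import pandas as pd

# Stand-in for mushrooms['class'], rebuilt from the per-class totals above
classes = pd.Series(['Edible'] * 4208 + ['Poisonous'] * 3916, dtype='category')

counts = classes.value_counts().sort_index()                # absolute sizes
shares = classes.value_counts(normalize=True).sort_index()  # fractions summing to 1

print(counts)
print((shares * 100).round(1))  # percentages, as a pie chart would label them
```

The percentages follow directly from the table: 4208 / (4208 + 3916) = 4208 / 8124 ≈ 51.8% edible, and the remaining ≈ 48.2% poisonous.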

If you follow the order presented in this table to create your class category labels, you can build a pie chart:

## Pie!

```python
labels=['Edible','Poisonous']
plt.pie(edibleclass['population'],labels=labels,autopct='%.1f %%')
plt.title('Edible?')
plt.show()
```

Voilà, a pie chart showing the proportion of mushrooms in each of the two classes. Getting the order of the labels right is especially important here, because swapped labels would mark poisonous mushrooms as edible, so be sure to verify the order in which the label array is built!

![pie chart](images/pie1-wb.png)
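
One way to keep the labels in step with the data is to take them from the grouped index instead of typing them by hand. The sketch below assumes a small stand-in DataFrame (`class_counts`, a hypothetical name) holding the class totals from the grouped table, and switches Matplotlib to its non-interactive `Agg` backend so it can run outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the grouped counts: index = class labels, one count column
class_counts = pd.DataFrame({'count': [4208, 3916]},
                            index=pd.Index(['Edible', 'Poisonous'], name='class'))

fig, ax = plt.subplots()
# Labels come straight from the index, so they cannot drift out of order
wedges, texts, autotexts = ax.pie(class_counts['count'],
                                  labels=class_counts.index,
                                  autopct='%.1f %%')
ax.set_title('Edible?')

print([t.get_text() for t in texts])      # wedge labels
print([t.get_text() for t in autotexts])  # percentage labels
```

With the real data, `labels=edibleclass.index` plays the same role.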

## Donuts!

A somewhat more visually interesting pie chart is a donut chart, which is a pie chart with a hole in the middle. Let's look at our data using this method.

Take a look at the various habitats where mushrooms grow:

```python
habitat=mushrooms.groupby(['habitat']).count()
habitat
```

Here, you are grouping your data by habitat. Seven habitats are listed, so use them as labels for your donut chart, in the same order as the grouped table:

```python
labels=['Grasses','Leaves','Meadows','Paths','Urban','Waste','Wood']

# Draw the pie; pctdistance moves the percentages toward the outer edge
plt.pie(habitat['class'], labels=labels,
        autopct='%1.1f%%', pctdistance=0.85)

# A white circle of radius 0.40 at the center turns the pie into a donut
center_circle = plt.Circle((0, 0), 0.40, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

plt.title('Mushroom Habitats')
plt.show()
```

![donut chart](images/donut-wb.png)

This code draws the pie, creates a white circle of radius 0.40 at its center, and adds that circle to the current axes, which cuts the hole. Change `0.40` to another value to resize the hole.
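
Matplotlib can also cut the hole itself: passing `wedgeprops=dict(width=...)` to `pie` shortens each wedge into a ring segment, so no separate circle is needed. With the default pie radius of 1, a center circle of radius 0.40 corresponds to a ring width of 1 - 0.40 = 0.60. The sketch below is an alternative to the circle approach, not the lesson's method; it uses the habitat values of the four sample rows shown earlier (two Urban, one Grasses, one Meadows) as stand-in data, so with the real dataset you would pass the grouped habitat counts instead.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Habitat column of the four sample rows shown earlier (stand-in data)
habitats = pd.Series(['Urban', 'Grasses', 'Meadows', 'Urban'])
habitat_counts = habitats.value_counts().sort_index()

hole_radius = 0.40  # same hole size as the white circle in the lesson's code

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    habitat_counts,
    labels=habitat_counts.index,             # labels always match the counts
    autopct='%1.1f%%',
    pctdistance=0.85,                        # percentages sit inside the ring
    wedgeprops=dict(width=1 - hole_radius),  # ring thickness = radius - hole
)
ax.set_title('Mushroom Habitats')
```

Because the ring runs from radius 0.40 to 1, a `pctdistance` of 0.85 keeps the percentage labels on the colored band rather than in the hole.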

Donut charts can be tweaked in several ways; in particular, the labels can be restyled or repositioned for readability. Learn more in the [docs](https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_and_donut_labels.html?highlight=donut).

Now that you know how to group your data and then display it as a pie or donut, you can explore other types of charts. Try a waffle chart, which is just a different way of exploring quantity.

## Waffles!

A waffle chart visualizes quantities as a 2D grid of squares. Try visualizing the different quantities of mushroom cap colors in this dataset. To do this, install a helper library called [PyWaffle](https://pypi.org/project/pywaffle/), which builds on Matplotlib (in a Jupyter notebook, run `%pip install pywaffle` in a cell instead):

```bash
pip install pywaffle
```

Group your data by cap color:

```python
capcolor=mushrooms.groupby(['cap-color']).count()
capcolor
```

Create a waffle chart by pairing each cap-color label with its count:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle

# Build the labels from the grouped index so each one stays next to its own count
data = {'color': [str(c).lower() for c in capcolor.index],
        'amount': capcolor['class'].values
}

df = pd.DataFrame(data)

# Cap-color names that aren't Matplotlib colors (buff, cinnamon), or that would
# disappear on a white background (white), get a substitute color
substitutes = {'buff': 'tan', 'cinnamon': 'maroon', 'white': 'whitesmoke'}

fig = plt.figure(
    FigureClass = Waffle,
    rows = 100,
    values = df.amount,
    labels = list(df.color),
    figsize = (30,30),
    colors = [substitutes.get(c, c) for c in df.color],
)
plt.show()
```

Because the label list is taken from the grouped data, it always has one entry per cap color. The color substitutions assume that every other cap-color name (such as brown or green) is a valid Matplotlib color name.

Using a waffle chart, you can plainly see the proportions of cap colors in this mushroom dataset.

![waffle chart](images/waffle.png)
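
A waffle chart turns counts into a fixed number of squares. To see the arithmetic, here is a minimal sketch (not PyWaffle's own code) that splits a 10 x 10 grid, where each square stands for about 1% of the data, using the largest-remainder method so the squares always add up to the grid size. The `waffle_squares` helper is hypothetical, and the example uses the edible/poisonous totals from the grouped table earlier in this lesson.

```python
import math


def waffle_squares(counts, total_squares=100):
    """Split total_squares among categories in proportion to their counts.

    Hypothetical helper using the largest-remainder method, so the
    allocated squares always add up to exactly total_squares.
    """
    grand_total = sum(counts.values())
    exact = {k: v * total_squares / grand_total for k, v in counts.items()}
    squares = {k: math.floor(x) for k, x in exact.items()}

    # Hand out the squares lost to flooring, largest fractional part first
    leftover = total_squares - sum(squares.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - squares[k], reverse=True)
    for k in by_remainder[:leftover]:
        squares[k] += 1
    return squares


# Per-class totals from the grouped table earlier in the lesson
class_counts = {'Edible': 4208, 'Poisonous': 3916}
print(waffle_squares(class_counts))  # {'Edible': 52, 'Poisonous': 48}
```

Here 51.8 rounds up to 52 squares and 48.2 rounds down to 48, so the grid stays exactly full. Plain rounding can over- or under-fill the grid when there are many small categories, which is why the sketch hands out leftover squares by remainder instead.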

✅ PyWaffle can draw icons from [Font Awesome](https://fontawesome.com/) in place of squares. Experiment with icons to create an even more interesting waffle chart.

In this lesson, you learned three ways to visualize proportions. First, group your data into categories, and then decide which is the best way to display it: pie, donut, or waffle. All are delicious and give the viewer an instant snapshot of a dataset.

## 🚀 Challenge

Try recreating these tasty charts in [Charticulator](https://charticulator.com).

## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/21)

## Review & Self Study

Sometimes it's not obvious when to use a pie, donut, or waffle chart. Here are some articles to read on this topic:

- https://www.beautiful.ai/blog/battle-of-the-charts-pie-chart-vs-donut-chart
- https://medium.com/@hypsypops/pie-chart-vs-donut-chart-showdown-in-the-ring-5d24fd86a9ce
- https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c6.htm
- https://medium.datadriveninvestor.com/data-visualization-done-the-right-way-with-tableau-waffle-chart-fdf2a19be402

Do some research to find more information on this sticky decision.

## Assignment

[Try it in Excel](assignment.md)