Commit c8b61d3

2 parents 82f70bb + f2666cf commit c8b61d3

2 files changed

+63
-59
lines changed

3-Data-Visualization/R/11-visualization-proportions/README.md

Lines changed: 59 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -18,11 +18,9 @@ In this lesson, you will use a different nature-focused dataset to visualize pro
1818

1919
Mushrooms are very interesting. Let's import a dataset to study them:
2020

21-
```python
22-
import pandas as pd
23-
import matplotlib.pyplot as plt
24-
mushrooms = pd.read_csv('../../data/mushrooms.csv')
25-
mushrooms.head()
21+
```r
22+
mushrooms = read.csv('../../data/mushrooms.csv')
23+
head(mushrooms)
2624
```
2725
A table is printed out with some great data for analysis:
2826

@@ -32,55 +30,60 @@ A table is printed out with some great data for analysis:
3230
| Poisonous | Convex | Smooth | Brown | Bruises | Pungent | Free | Close | Narrow | Black | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
3331
| Edible | Convex | Smooth | Yellow | Bruises | Almond | Free | Close | Broad | Black | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses |
3432
| Edible | Bell | Smooth | White | Bruises | Anise | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows |
35-
| Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
36-
33+
| Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
34+
| Edible | Convex | Smooth | Green | No Bruises | None | Free | Crowded | Broad | Black | Tapering | Equal | Smooth | Smooth | White | White | Partial | White | One | Evanescent | Brown | Abundant | Grasses |
35+
| Edible | Convex | Scaly | Yellow | Bruises | Almond | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Numerous | Grasses |
36+
3737
Right away, you notice that all the data is textual. You will have to convert this data to be able to use it in a chart. Start by listing the column names:
3838

39-
```python
40-
print(mushrooms.select_dtypes(["object"]).columns)
39+
```r
40+
names(mushrooms)
4141
```
4242

4343
The output is:
4444

4545
```output
46-
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
47-
'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
48-
'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
49-
'stalk-surface-below-ring', 'stalk-color-above-ring',
50-
'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
51-
'ring-type', 'spore-print-color', 'population', 'habitat'],
52-
dtype='object')
46+
[1] "class" "cap.shape"
47+
[3] "cap.surface" "cap.color"
48+
[5] "bruises" "odor"
49+
[7] "gill.attachment" "gill.spacing"
50+
[9] "gill.size" "gill.color"
51+
[11] "stalk.shape" "stalk.root"
52+
[13] "stalk.surface.above.ring" "stalk.surface.below.ring"
53+
[15] "stalk.color.above.ring" "stalk.color.below.ring"
54+
[17] "veil.type" "veil.color"
55+
[19] "ring.number" "ring.type"
56+
[21] "spore.print.color" "population"
57+
[23] "habitat"
5358
```
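The column names alone do not reveal how R stores each column. As a minimal sketch, assuming the columns were read in as character vectors (the default for `read.csv` in R 4.0 and later), the following checks the type of every column and converts the text columns to factors, R's counterpart of a categorical type. The `sample_mushrooms` data frame is a hypothetical three-row stand-in, not the real dataset:

```r
# Hypothetical sample standing in for the mushrooms data frame
sample_mushrooms <- data.frame(
  class = c("Poisonous", "Edible", "Edible"),
  cap.shape = c("Convex", "Convex", "Bell"),
  stringsAsFactors = FALSE
)

# Report the class of every column; text columns show up as "character"
column_types <- sapply(sample_mushrooms, class)
print(column_types)

# Convert every character column to a factor
is_text <- column_types == "character"
sample_mushrooms[is_text] <- lapply(sample_mushrooms[is_text], factor)

# Factor levels are sorted alphabetically by default
print(levels(sample_mushrooms$class))
```

Grouping with dplyr works on character columns as well, so this conversion is optional for the charts that follow.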
5459
Take this data, group it by the 'class' column, and count the rows in each group:
5560

56-
```python
57-
cols = mushrooms.select_dtypes(["object"]).columns
58-
mushrooms[cols] = mushrooms[cols].astype('category')
61+
```r
62+
grouped=mushrooms %>%
63+
group_by(class) %>%
64+
summarise(count=n())
5965
```
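The pipe `%>%`, `group_by()`, `summarise()` and `n()` are provided by the dplyr package, which the snippet above does not load itself; it presumably relies on an earlier `library(dplyr)` or `library(tidyverse)` call. A self-contained sketch of the same grouping, run on a hypothetical five-row sample:

```r
library(dplyr)

# Hypothetical sample standing in for the mushrooms data frame
sample_mushrooms <- data.frame(
  class = c("Poisonous", "Edible", "Edible", "Poisonous", "Edible")
)

# Group by class and count the rows in each group
grouped_sample <- sample_mushrooms %>%
  group_by(class) %>%
  summarise(count = n())

# count() is a dplyr shorthand for the same group-and-tally step
counted_sample <- sample_mushrooms %>%
  count(class, name = "count")

print(grouped_sample)
```

`View()` opens an interactive data viewer, so in a non-interactive script `print()` is the safer way to inspect the result.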
6066

61-
```python
62-
edibleclass=mushrooms.groupby(['class']).count()
63-
edibleclass
64-
```
6567

6668
Now, if you print out the mushrooms data, you can see that it has been grouped into categories according to the poisonous/edible class:
69+
```r
70+
View(grouped)
71+
```
72+
73+
74+
| class | count |
75+
| --------- | --------- |
76+
| Edible | 4208 |
77+
| Poisonous| 3916 |
6778

6879

69-
| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
70-
| --------- | --------- | ----------- | --------- | ------- | ---- | --------------- | ------------ | --------- | ---------- | ----------- | --- | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
71-
| class | | | | | | | | | | | | | | | | | | | | | |
72-
| Edible | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | ... | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 |
73-
| Poisonous | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | ... | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 |
7480

75-
If you follow the order presented in this table to create your class category labels, you can build a pie chart:
81+
If you follow the order presented in this table to create your class category labels, you can build a pie chart.
7682

7783
## Pie!
7884

79-
```python
80-
labels=['Edible','Poisonous']
81-
plt.pie(edibleclass['population'],labels=labels,autopct='%.1f %%')
82-
plt.title('Edible?')
83-
plt.show()
85+
```r
86+
pie(grouped$count,grouped$class, main="Edible?")
8487
```
8588
Voilà, a pie chart showing the proportions of this data according to these two classes of mushrooms. It's quite important to get the order of the labels correct, especially here, so be sure to verify the order in which the label array is built!
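One way to reduce the risk of mismatched labels is to build them from the same data frame that supplies the counts. The sketch below does that and also appends each slice's percentage; the `grouped` data frame is rebuilt by hand from the counts in the table above, and the names `percent` and `slice_labels` are illustrative:

```r
# Counts copied from the grouped table in this lesson
grouped <- data.frame(
  class = c("Edible", "Poisonous"),
  count = c(4208, 3916)
)

# Share of each class, rounded to one decimal place
percent <- round(100 * grouped$count / sum(grouped$count), 1)

# Labels come from the same rows as the counts, so the order always matches
slice_labels <- paste0(grouped$class, " (", percent, "%)")

pie(grouped$count, labels = slice_labels, main = "Edible?")
```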
8689

@@ -92,26 +95,29 @@ A somewhat more visually interesting pie chart is a donut chart, which is a pie
9295

9396
Take a look at the various habitats where mushrooms grow:
9497

95-
```python
96-
habitat=mushrooms.groupby(['habitat']).count()
97-
habitat
98+
```r
99+
habitat=mushrooms %>%
100+
group_by(habitat) %>%
101+
summarise(count=n())
102+
View(habitat)
98103
```
99-
Here, you are grouping your data by habitat. There are 7 listed, so use those as labels for your donut chart:
104+
The output is:
105+
| habitat| count |
106+
| --------- | --------- |
107+
| Grasses | 2148 |
108+
| Leaves| 832 |
109+
| Meadows | 292 |
110+
| Paths| 1144 |
111+
| Urban | 368 |
112+
| Waste| 192 |
113+
| Wood| 3148 |
100114

101-
```python
102-
labels=['Grasses','Leaves','Meadows','Paths','Urban','Waste','Wood']
103115

104-
plt.pie(habitat['class'], labels=labels,
105-
autopct='%1.1f%%', pctdistance=0.85)
106-
107-
center_circle = plt.Circle((0, 0), 0.40, fc='white')
108-
fig = plt.gcf()
116+
Here, you are grouping your data by habitat. There are 7 listed, so use those as labels for your donut chart:
109117

110-
fig.gca().add_artist(center_circle)
111-
112-
plt.title('Mushroom Habitats')
113-
114-
plt.show()
118+
```r
119+
library(webr)
120+
PieDonut(habitat, aes(habitat, count=count))
115121
```
116122

117123
![donut chart](images/donut-wb.png)
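If the webr package is not available, a similar donut can be sketched with ggplot2 alone. This is an alternative approach, not the method used in the lesson: it draws a single stacked bar, bends it into a ring with `coord_polar()`, and uses `xlim()` to leave the hole in the middle. The `habitat` data frame is rebuilt by hand from the counts listed above:

```r
library(ggplot2)

# Counts copied from the habitat table in this lesson
habitat <- data.frame(
  habitat = c("Grasses", "Leaves", "Meadows", "Paths", "Urban", "Waste", "Wood"),
  count = c(2148, 832, 292, 1144, 368, 192, 3148)
)

# A single stacked bar at x = 2; the empty range below it becomes the hole
donut <- ggplot(habitat, aes(x = 2, y = count, fill = habitat)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  xlim(0.5, 2.5) +
  theme_void() +
  labs(title = "Mushroom Habitats")

print(donut)
```

Widening or narrowing the `xlim()` range changes the size of the hole.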
@@ -123,10 +129,10 @@ Donut charts can be tweaked in several ways to change the labels. The labels in
123129
Now that you know how to group your data and then display it as a pie or donut, you can explore other types of charts. Try a waffle chart, which is just a different way of exploring quantity.
124130
## Waffles!
125131

126-
A 'waffle' type chart is a different way to visualize quantities as a 2D array of squares. Try visualizing the different quantities of mushroom cap colors in this dataset. To do this, you need to install a helper library called [PyWaffle](https://pypi.org/project/pywaffle/) and use Matplotlib:
132+
A 'waffle' type chart is a different way to visualize quantities as a 2D array of squares. Try visualizing the different quantities of mushroom cap colors in this dataset. To do this, you need to install a helper library called [waffle](https://r-charts.com/part-whole/waffle-chart-ggplot2/) and use it to generate your visualization:
127133

128-
```python
129-
pip install pywaffle
134+
```r
135+
install.packages("waffle", repos = "https://cinc.rud.is")
130136
```
131137

132138
Select a segment of your data to group:

3-Data-Visualization/R/12-visualization-relationships/README.md

Lines changed: 4 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -20,12 +20,9 @@ Use a scatterplot to show how the price of honey has evolved, year over year, pe
2020

2121
Let's start by importing the data:
2222

23-
```python
24-
import pandas as pd
25-
import matplotlib.pyplot as plt
26-
import seaborn as sns
27-
honey = pd.read_csv('../../data/honey.csv')
28-
honey.head()
23+
```r
24+
honey=read.csv('../../data/honey.csv')
25+
head(honey)
2926
```
3027
You notice that the honey data has several interesting columns, including year and price per pound. Let's explore this data, grouped by U.S. state:
3128

@@ -36,6 +33,7 @@ You notice that the honey data has several interesting columns, including year a
3633
| AR | 53000 | 65 | 3445000 | 1688000 | 0.59 | 2033000 | 1998 |
3734
| CA | 450000 | 83 | 37350000 | 12326000 | 0.62 | 23157000 | 1998 |
3835
| CO | 27000 | 72 | 1944000 | 1594000 | 0.7 | 1361000 | 1998 |
36+
| FL | 230000 | 98 | 22540000 | 4508000 | 0.64 | 14426000 | 1998 |
3937

4038

4139
Create a basic scatterplot to show the relationship between the price per pound of honey and its U.S. state of origin. Make the `y` axis tall enough to display all the states:
