Commit c42de19

Committing 2nd draft of 10th lesson
To do: * density 2d plot (Had some issues, would be rectifying soon)
1 parent ef5ed2d commit c42de19

1 file changed (+32 -11 lines): 3-Data-Visualization/R/10-visualization-distributions

3-Data-Visualization/R/10-visualization-distributions/README.md

Lines changed: 32 additions & 11 deletions
@@ -1,6 +1,6 @@
 # Visualizing Distributions

-|![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](../../sketchnotes/10-Visualizing-Distributions.png)|
+|![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](https://github.com/microsoft/Data-Science-For-Beginners/blob/main/sketchnotes/10-Visualizing-Distributions.png)|
 |:---:|
 | Visualizing Distributions - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |

@@ -36,7 +36,7 @@ ggplot(data=birds_filtered, aes(x=Order, y=MaxLength,group=1)) +
 geom_point() +
 ggtitle("Max Length per order") + coord_flip()
 ```
-![max length per order]()
+![max length per order](images/max-length-per-order.png)

 This gives an overview of the general distribution of body length per bird Order, but it is not the optimal way to display true distributions. That task is usually handled by creating a Histogram.
 ## Working with histograms
@@ -47,15 +47,15 @@ This gives an overview of the general distribution of body length per bird Order
 ggplot(data = birds_filtered, aes(x = MaxBodyMass)) +
 geom_histogram(bins=10)+ylab('Frequency')
 ```
-![distribution over entire dataset]()
+![distribution over entire dataset](images/distribution-over-the-entire-dataset.png)

 As you can see, most of the 400+ birds in this dataset fall in the range of under 2000 for their Max Body Mass. Gain more insight into the data by changing the `bins` parameter to a higher number, something like 30:

 ```r
 ggplot(data = birds_filtered, aes(x = MaxBodyMass)) + geom_histogram(bins=30)+ylab('Frequency')
 ```

-![distribution-30bins]()
+![distribution-30bins](images/distribution-30bins.png)

 This chart shows the distribution in a bit more granular fashion. A chart less skewed to the left could be created by ensuring that you only select data within a given range:
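
The diff does not show the lesson's own filtering code that sits between these two hunks. As a rough illustration of selecting data within a given range before plotting, here is a minimal sketch. It assumes synthetic data, and the names `birds_demo` and `birds_demo_1` and the bounds 1 and 60 are hypothetical, not taken from the lesson.

```r
library(ggplot2)

set.seed(7)
# Hypothetical stand-in for the bird data: 90 small birds plus 10 very heavy outliers
birds_demo <- data.frame(
  MaxBodyMass = c(runif(90, min = 2, max = 59), runif(10, min = 500, max = 9000))
)

# Keep only the rows whose MaxBodyMass lies inside the chosen range
# (the bounds 1 and 60 are illustrative assumptions)
birds_demo_1 <- subset(birds_demo, MaxBodyMass > 1 & MaxBodyMass < 60)

# Histogram of the filtered data, in the same style as the lesson's plots
p_hist <- ggplot(data = birds_demo_1, aes(x = MaxBodyMass)) +
  geom_histogram(bins = 30) + ylab("Frequency")
p_hist
```

Dropping the heavy outliers keeps the x-axis from being stretched by a handful of values, so the bulk of the distribution stays readable.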

@@ -67,7 +67,7 @@ ggplot(data = birds_filtered_1, aes(x = MaxBodyMass)) +
 geom_histogram(bins=30)+ylab('Frequency')
 ```

-![filtered histogram]()
+![filtered histogram](images/filtered-histogram.png)

 ✅ Try some other filters and data points. To see the full distribution of the data, remove the `MaxBodyMass` filter to show labeled distributions.

@@ -81,7 +81,7 @@ ggplot(data=birds_filtered_1, aes(x=MaxBodyMass, y=MaxLength) ) +
 ```
 As expected, these two elements appear to be correlated along a single axis, with one particularly strong point of convergence:

-![2d plot]()
+![2d plot](images/2d-plot.png)

 Histograms work well by default for numeric data. What if you need to see distributions according to text data?
 ## Explore the dataset for distributions using text data
@@ -112,7 +112,7 @@ ggplot(data=birds_filtered_1, aes(x = MinWingspan, fill = ConservationStatus)) +
 scale_fill_manual(name="Conservation Status",values=c("red","green","blue","pink"),labels=c("Endangered","Near Threatened","Vulnerable","Least Concern"))
 ```

-![wingspan and conservation collation]()
+![wingspan and conservation collation](images/wingspan-conservation-collation.png)

 There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?

@@ -126,23 +126,23 @@ Let's work with density plots now!
 ggplot(data = birds_filtered_1, aes(x = MinWingspan)) +
 geom_density()
 ```
-![density plot]()
+![density plot](images/density-plot.png)

 You can see how the plot echoes the previous one for Minimum Wingspan data; it's just a bit smoother. If you wanted to revisit that jagged MaxBodyMass line in the second chart you built, you could smooth it out very well by recreating it using this method:

 ```r
 ggplot(data = birds_filtered_1, aes(x = MaxBodyMass)) +
 geom_density()
 ```
-![bodymass density]()
+![bodymass density](images/bodymass-smooth.png)

 If you wanted a smooth, but not too smooth line, edit the `adjust` parameter:

 ```r
 ggplot(data = birds_filtered_1, aes(x = MaxBodyMass)) +
 geom_density(adjust = 1/5)
 ```
-![less smooth bodymass]()
+![less smooth bodymass](images/less-smooth-bodymass.png)

 ✅ Read about the parameters available for this type of plot and experiment!
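
One way to experiment with the `adjust` parameter mentioned in this hunk is to overlay several values on one chart. The sketch below assumes synthetic data; `mass_demo` and the chosen colors are hypothetical. It relies on `adjust` acting as a multiplier on the default kernel bandwidth, which the base R `density()` call at the end makes visible.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for a right-skewed body-mass column
mass_demo <- data.frame(MaxBodyMass = rexp(200, rate = 1 / 15))

# Overlay three bandwidth adjustments on the same axes
p_adjust <- ggplot(data = mass_demo, aes(x = MaxBodyMass)) +
  geom_density(adjust = 1/5, color = "red") +   # narrower bandwidth: more jagged
  geom_density(adjust = 1, color = "black") +   # default bandwidth
  geom_density(adjust = 2, color = "blue")      # wider bandwidth: smoother
p_adjust

# The same multiplier seen through base R's density():
bw_default <- density(mass_demo$MaxBodyMass)$bw
bw_fifth <- density(mass_demo$MaxBodyMass, adjust = 1/5)$bw
```

Values below 1 follow the data more closely and can exaggerate noise, while values above 1 smooth over local detail.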

@@ -152,8 +152,29 @@ This type of chart offers beautifully explanatory visualizations. With a few lin
 ggplot(data=birds_filtered_1,aes(x = MaxBodyMass, fill = Order)) +
 geom_density(alpha=0.5)
 ```
-![bodymass per order]()
+![bodymass per order](images/bodymass-per-order.png)

 You can also map the density of several variables in one chart. Test the MaxLength and MinLength of a bird compared to their conservation status:
+```r
+to be inserted
+```
+
+![2d density plot]()
+
+Perhaps it's worth researching whether the cluster of 'Vulnerable' birds according to their lengths is meaningful or not.
+
+## 🚀 Challenge
+
+Histograms are a more sophisticated type of chart than basic scatterplots, bar charts, or line charts. Search the internet to find good examples of the use of histograms. How are they used, what do they demonstrate, and in what fields or areas of inquiry do they tend to be used?
+
+## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/19)
+
+## Review & Self Study
+
+In this lesson, you used `ggplot2` and started working to show more sophisticated charts. Do some research on `geom_density_2d()`, a "continuous probability density curve in one or more dimensions". Read through [the documentation](https://ggplot2.tidyverse.org/reference/geom_density_2d.html) to understand how it works.
+
+## Assignment
+
+[Apply your skills](assignment.md)

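
The `to be inserted` placeholder in this hunk corresponds to the 2D density plot listed as a to-do in the commit message. The following is only a sketch of what such a block might look like, not the author's eventual code. It assumes synthetic data: `birds_demo` and its values are hypothetical, and the column names simply mirror those mentioned in the lesson text.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the filtered bird data: two groups with different typical lengths
birds_demo <- data.frame(
  MinLength = c(rnorm(50, mean = 15, sd = 3), rnorm(50, mean = 40, sd = 8)),
  ConservationStatus = rep(c("Least Concern", "Vulnerable"), each = 50)
)
# MaxLength is derived from MinLength so the two columns are correlated
birds_demo$MaxLength <- birds_demo$MinLength + runif(100, min = 2, max = 10)

# Contour lines of the estimated 2D density, one set of contours per status
p_2d <- ggplot(data = birds_demo,
               aes(x = MaxLength, y = MinLength, color = ConservationStatus)) +
  geom_density_2d()
p_2d
```

Mapping `color` to the status column draws a separate set of contours for each group, which makes it easy to see whether one status clusters at particular lengths.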
