3-Data-Visualization/R/10-visualization-distributions/README.md
+32 -11 (32 additions & 11 deletions)
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,6 @@
1
1
# Visualizing Distributions
2
2
3
-
|](../../sketchnotes/10-Visualizing-Distributions.png)|
3
+
|](https://github.com/microsoft/Data-Science-For-Beginners/blob/main/sketchnotes/10-Visualizing-Distributions.png)|
4
4
|:---:|
5
5
| Visualizing Distributions - _Sketchnote by [@nitya](https://twitter.com/nitya)_|

40
40
41
41
This gives an overview of the general distribution of body length per bird Order, but it is not the optimal way to display true distributions. That task is usually handled by creating a Histogram.
42
42
## Working with histograms
@@ -47,15 +47,15 @@ This gives an overview of the general distribution of body length per bird Order
47
47
ggplot(data=birds_filtered, aes(x=MaxBodyMass)) +
48
48
geom_histogram(bins=10)+ylab('Frequency')
49
49
```
50
-
![distribution over entire dataset]()
50
+

51
51
52
52
As you can see, most of the 400+ birds in this dataset fall in the range of under 2000 for their Max Body Mass. Gain more insight into the data by changing the `bins` parameter to a higher number, something like 30:
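As a minimal sketch of the higher bin count, the histogram can be rebuilt with `bins=30`. The `birds_demo` data frame is a hypothetical, synthetic stand-in for the lesson's `birds_filtered`, so the block runs on its own; the shape it produces is illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in for `birds_filtered`: synthetic, right-skewed
# values, not real bird measurements.
set.seed(42)
birds_demo <- data.frame(MaxBodyMass = rlnorm(400, meanlog = 4, sdlog = 1.2))

# Same histogram as before, with 30 bins instead of 10
p <- ggplot(data = birds_demo, aes(x = MaxBodyMass)) +
  geom_histogram(bins = 30) +
  ylab('Frequency')
print(p)
```

With the real data, swapping `birds_demo` for `birds_filtered` should be the only change needed.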
This chart shows the distribution in a bit more granular fashion. A chart that is less bunched up on the left could be created by ensuring that you only select data within a given range:
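One way to restrict the range, sketched here under assumptions: the bounds of 1 and 60 are an illustrative choice, and `birds_demo` and `birds_in_range` are hypothetical names built on synthetic data rather than the lesson's dataset.

```r
library(ggplot2)

# Hypothetical stand-in for `birds_filtered` (synthetic values)
set.seed(42)
birds_demo <- data.frame(MaxBodyMass = rlnorm(400, meanlog = 4, sdlog = 1.2))

# Keep only rows inside an assumed range of interest
birds_in_range <- subset(birds_demo, MaxBodyMass > 1 & MaxBodyMass < 60)

# Histogram of the restricted data
p <- ggplot(data = birds_in_range, aes(x = MaxBodyMass)) +
  geom_histogram(bins = 30) +
  ylab('Frequency')
print(p)
```

Narrowing the range trades completeness for detail: the long tail of heavy birds is dropped so the crowded low end can spread across the x-axis.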

116
116
117
117
There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?
118
118
@@ -126,23 +126,23 @@ Let's work with density plot's now!
You can see how the plot echoes the previous one for Minimum Wingspan data; it's just a bit smoother. If you wanted to revisit that jagged MaxBodyMass line in the second chart you built, you could smooth it out very well by recreating it using this method:
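A minimal sketch of that smoothing, assuming `geom_density()` is applied to the same column; `birds_demo` is again a hypothetical, synthetic stand-in for the lesson's data.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson's filtered bird data (synthetic)
set.seed(42)
birds_demo <- data.frame(MaxBodyMass = rlnorm(400, meanlog = 4, sdlog = 1.2))

# Replace the histogram layer with a kernel density estimate
p <- ggplot(data = birds_demo, aes(x = MaxBodyMass)) +
  geom_density()
print(p)
```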

156
156
157
157
You can also map the density of several variables in one chart. Test the MaxLength and MinLength of a bird compared to their conservation status:
158
+
```r
159
+
tobeinserted
160
+
```
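Until the placeholder above is filled in, here is one possible sketch of such a chart using `geom_density_2d()`. It is not the lesson's code: the data frame `birds_demo`, its values, and the two status labels are synthetic assumptions chosen only to make the block runnable. Only the column names `MinLength`, `MaxLength` and the idea of grouping by conservation status come from the text.

```r
library(ggplot2)  # geom_density_2d() relies on the MASS package being installed

# Synthetic, hypothetical data: two conservation statuses with
# different typical body lengths. Not real bird measurements.
set.seed(42)
n <- 150
birds_demo <- data.frame(
  ConservationStatus = rep(c('Least Concern', 'Vulnerable'), each = n),
  MinLength = c(rnorm(n, mean = 20, sd = 5), rnorm(n, mean = 35, sd = 4))
)
# MaxLength is always somewhat larger than MinLength
birds_demo$MaxLength <- birds_demo$MinLength + runif(2 * n, min = 1, max = 10)

# One set of density contours per conservation status
p <- ggplot(data = birds_demo,
            aes(x = MinLength, y = MaxLength, color = ConservationStatus)) +
  geom_density_2d()
print(p)
```

Mapping `color` to the status column draws a separate contour set for each group, which is what makes clusters comparable on a single chart.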
161
+
162
+
![2d density plot]()
163
+
164
+
Perhaps it's worth researching whether the cluster of 'Vulnerable' birds according to their lengths is meaningful or not.
165
+
166
+
## 🚀 Challenge
167
+
168
+
Histograms are a more sophisticated type of chart than basic scatterplots, bar charts, or line charts. Search the internet for good examples of the use of histograms. How are they used, what do they demonstrate, and in what fields or areas of inquiry do they tend to be used?
In this lesson, you used `ggplot2` and started building more sophisticated charts. Do some research on `geom_density_2d()`, which performs a 2D kernel density estimation and displays the results with contours. Read through [the documentation](https://ggplot2.tidyverse.org/reference/geom_density_2d.html) to understand how it works.