Commit 7a41b40

Update on PNG images
Added the birds.head() table.
Added the scatter image.
The x and y labels on some of the PNG images are not readable when using a dark theme setting.
1 parent 49adaaf commit 7a41b40

1 file changed: +16 -5 lines changed

3-Data-Visualization/10-visualization-distributions/README.md

Lines changed: 16 additions & 5 deletions
@@ -20,6 +20,15 @@ birds = pd.read_csv('../../data/birds.csv')
birds.head()
```

+| | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
+| ---: | :--------------------------- | :--------------------- | :-------------------- | :----------- | :------- | :---------- | :----------------- | --------: | --------: | ----------: | ----------: | ----------: | ----------: |
+| 0 | Black-bellied whistling-duck | Dendrocygna autumnalis | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 47 | 56 | 652 | 1020 | 76 | 94 |
+| 1 | Fulvous whistling-duck | Dendrocygna bicolor | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 45 | 53 | 712 | 1050 | 85 | 93 |
+| 2 | Snow goose | Anser caerulescens | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 79 | 2050 | 4050 | 135 | 165 |
+| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64 | 1066 | 1567 | 113 | 116 |
+| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 81 | 1930 | 3310 | 130 | 165 |
+
+
In general, you can quickly look at the way data is distributed by using a scatter plot as we did in the previous lesson:

```python
@@ -31,6 +40,8 @@ plt.xlabel('Max Length')

plt.show()
```
+![max length per order](images/scatter-wb.png)
+
This gives an overview of the general distribution of body length per bird Order, but it is not the optimal way to display true distributions. That task is usually handled by creating a Histogram.
## Working with histograms

@@ -40,15 +51,15 @@ Matplotlib offers very good ways to visualize data distribution using Histograms
birds['MaxBodyMass'].plot(kind = 'hist', bins = 10, figsize = (12,12))
plt.show()
```
-![distribution over the entire dataset](images/dist1.png)
+![distribution over the entire dataset](images/dist1-wb.png)

As you can see, most of the 400+ birds in this dataset fall in the range of under 2000 for their Max Body Mass. Gain more insight into the data by changing the `bins` parameter to a higher number, something like 30:

```python
birds['MaxBodyMass'].plot(kind = 'hist', bins = 30, figsize = (12,12))
plt.show()
```
-![distribution over the entire dataset with larger bins param](images/dist2.png)
+![distribution over the entire dataset with larger bins param](images/dist2-wb.png)

This chart shows the distribution in a bit more granular fashion. A chart less skewed to the left could be created by ensuring that you only select data within a given range:

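As a rough illustration of what `bins` and a range filter do, the following standard-library sketch counts values into equal-width bins, which is roughly the job a histogram plot performs before drawing its bars. The `masses` list is made-up sample data, not taken from `birds.csv`, and `bin_counts` is a hypothetical helper, not a pandas or Matplotlib function. It assumes the values are not all identical.

```python
def bin_counts(values, bins):
    """Count values into `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, so clamp the index.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Made-up body masses: mostly small values plus two large outliers.
masses = [5, 8, 12, 15, 20, 22, 30, 35, 40, 55, 1500, 9000]

# With the outliers included, nearly everything lands in the first bin.
all_counts = bin_counts(masses, bins=4)

# Keeping only values in a chosen range spreads the data across the bins.
filtered = [m for m in masses if 1 < m < 60]
filtered_counts = bin_counts(filtered, bins=4)

print(all_counts)
print(filtered_counts)
```

In this sketch the unfiltered counts pile up in the first bin because two outliers stretch the bin width, while the filtered counts are spread across all four bins.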
@@ -59,7 +70,7 @@ filteredBirds = birds[(birds['MaxBodyMass'] > 1) & (birds['MaxBodyMass'] < 60)]
filteredBirds['MaxBodyMass'].plot(kind = 'hist',bins = 40,figsize = (12,12))
plt.show()
```
-![filtered histogram](images/dist3.png)
+![filtered histogram](images/dist3-wb.png)

✅ Try some other filters and data points. To see the full distribution of the data, remove the `['MaxBodyMass']` filter to show labeled distributions.

@@ -76,7 +87,7 @@ hist = ax.hist2d(x, y)
```
There appears to be an expected correlation between these two elements along an expected axis, with one particularly strong point of convergence:

-![2D plot](images/2D.png)
+![2D plot](images/2D-wb.png)

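Conceptually, a 2D histogram divides the x-y plane into a grid and counts how many points fall in each cell. A cell with a much higher count than its neighbours shows up as a strong point of convergence. The standard-library sketch below illustrates that idea on made-up (x, y) pairs. `grid_counts` is a hypothetical helper and is not how `ax.hist2d` is implemented.

```python
from collections import Counter

def grid_counts(xs, ys, cell):
    """Count (x, y) points per square grid cell of side `cell`."""
    counts = Counter()
    for x, y in zip(xs, ys):
        # Floor division maps each coordinate to its cell index.
        counts[(int(x // cell), int(y // cell))] += 1
    return counts

# Made-up paired measurements that rise together, with a cluster near (12, 14).
xs = [11, 12, 13, 12, 25, 38, 47]
ys = [13, 14, 15, 16, 27, 33, 45]

counts = grid_counts(xs, ys, cell=10)
densest_cell, densest_count = counts.most_common(1)[0]

print(densest_cell, densest_count)
```

Here the occupied cells lie along the diagonal, which mirrors a positive correlation, and the densest cell corresponds to the cluster of similar points.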
Histograms work well by default for numeric data. What if you need to see distributions according to text data?
## Explore the dataset for distributions using text data
@@ -115,7 +126,7 @@ plt.gca().set(title='Conservation Status', ylabel='Max Body Mass')
plt.legend();
```

-![wingspan and conservation collation](images/histogram-conservation.png)
+![wingspan and conservation collation](images/histogram-conservation-wb.png)

There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?
