
Commit de894a6

Merge pull request microsoft#307 from flegaspi700/main
The x and y labels on some of the PNG images are not readable when using a dark theme setting
2 parents c63e165 + 82d987b commit de894a6
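
The commit message says the axis labels on some PNG images were unreadable under a dark theme, and the replacement files in this diff carry a `-wb` suffix. The commit does not show how the new images were produced. One plausible approach, sketched here purely as an assumption, is to save each Matplotlib figure with an opaque white background so the dark default label text stays legible on any page theme. The helper name `render_png_with_white_background` and the sample values are hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def render_png_with_white_background(values, xlabel, ylabel):
    """Hypothetical helper: draw a histogram and return PNG bytes with an opaque white canvas."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(values, bins=10)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

    buffer = io.BytesIO()
    # facecolor="white" with transparent=False keeps the area behind the
    # axis labels opaque, so a dark page theme cannot show through.
    fig.savefig(buffer, format="png", facecolor="white", transparent=False)
    plt.close(fig)
    return buffer.getvalue()


# Illustrative values only; not taken from birds.csv.
png_bytes = render_png_with_white_background(
    [1, 2, 2, 3, 3, 3, 4], "Max Body Mass", "Frequency"
)
```

Writing to a file instead of a buffer only needs a path in place of `buffer`, for example `fig.savefig("dist1-wb.png", facecolor="white")`.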

10 files changed, with 24 additions and 7 deletions.

3-Data-Visualization/10-visualization-distributions/README.md

Lines changed: 16 additions & 5 deletions
@@ -20,6 +20,15 @@ birds = pd.read_csv('../../data/birds.csv')
 birds.head()
 ```

+| | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
+| ---: | :--------------------------- | :--------------------- | :-------------------- | :----------- | :------- | :---------- | :----------------- | --------: | --------: | ----------: | ----------: | ----------: | ----------: |
+| 0 | Black-bellied whistling-duck | Dendrocygna autumnalis | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 47 | 56 | 652 | 1020 | 76 | 94 |
+| 1 | Fulvous whistling-duck | Dendrocygna bicolor | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 45 | 53 | 712 | 1050 | 85 | 93 |
+| 2 | Snow goose | Anser caerulescens | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 79 | 2050 | 4050 | 135 | 165 |
+| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64 | 1066 | 1567 | 113 | 116 |
+| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 81 | 1930 | 3310 | 130 | 165 |
+
+
 In general, you can quickly look at the way data is distributed by using a scatter plot as we did in the previous lesson:

 ```python
@@ -31,6 +40,8 @@ plt.xlabel('Max Length')

 plt.show()
 ```
+![max length per order](images/scatter-wb.png)
+
 This gives an overview of the general distribution of body length per bird Order, but it is not the optimal way to display true distributions. That task is usually handled by creating a Histogram.
 ## Working with histograms

@@ -40,15 +51,15 @@ Matplotlib offers very good ways to visualize data distribution using Histograms
 birds['MaxBodyMass'].plot(kind = 'hist', bins = 10, figsize = (12,12))
 plt.show()
 ```
-![distribution over the entire dataset](images/dist1.png)
+![distribution over the entire dataset](images/dist1-wb.png)

 As you can see, most of the 400+ birds in this dataset fall in the range of under 2000 for their Max Body Mass. Gain more insight into the data by changing the `bins` parameter to a higher number, something like 30:

 ```python
 birds['MaxBodyMass'].plot(kind = 'hist', bins = 30, figsize = (12,12))
 plt.show()
 ```
-![distribution over the entire dataset with larger bins param](images/dist2.png)
+![distribution over the entire dataset with larger bins param](images/dist2-wb.png)

 This chart shows the distribution in a bit more granular fashion. A chart less skewed to the left could be created by ensuring that you only select data within a given range:

@@ -59,7 +70,7 @@ filteredBirds = birds[(birds['MaxBodyMass'] > 1) & (birds['MaxBodyMass'] < 60)]
 filteredBirds['MaxBodyMass'].plot(kind = 'hist',bins = 40,figsize = (12,12))
 plt.show()
 ```
-![filtered histogram](images/dist3.png)
+![filtered histogram](images/dist3-wb.png)

 ✅ Try some other filters and data points. To see the full distribution of the data, remove the `['MaxBodyMass']` filter to show labeled distributions.

@@ -76,7 +87,7 @@ hist = ax.hist2d(x, y)
 ```
 There appears to be an expected correlation between these two elements along an expected axis, with one particularly strong point of convergence:

-![2D plot](images/2D.png)
+![2D plot](images/2D-wb.png)

 Histograms work well by default for numeric data. What if you need to see distributions according to text data?
 ## Explore the dataset for distributions using text data
@@ -115,7 +126,7 @@ plt.gca().set(title='Conservation Status', ylabel='Max Body Mass')
 plt.legend();
 ```

-![wingspan and conservation collation](images/histogram-conservation.png)
+![wingspan and conservation collation](images/histogram-conservation-wb.png)

 There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?
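
The README context lines in this file's diff raise `bins` from 10 to 30 and then filter `MaxBodyMass` to a range before plotting. A minimal sketch of those two steps follows, using randomly generated values rather than `birds.csv` and `numpy.histogram` in place of a plot so that it runs without a display. The series `masses` and its distribution parameters are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for birds['MaxBodyMass']: many small values and a
# long tail of large ones. Generated data, not the real dataset.
rng = np.random.default_rng(0)
masses = pd.Series(rng.lognormal(mean=4.0, sigma=1.5, size=400), name="MaxBodyMass")

# Same data, coarse versus fine binning: more bins means narrower intervals,
# so the crowded low end of the range is resolved in more detail.
counts_10, edges_10 = np.histogram(masses, bins=10)
counts_30, edges_30 = np.histogram(masses, bins=30)

# Restrict to a value range first, mirroring the README's
# (MaxBodyMass > 1) & (MaxBodyMass < 60) filter, then bin what remains.
filtered = masses[(masses > 1) & (masses < 60)]
counts_filtered, edges_filtered = np.histogram(filtered, bins=40)

print(counts_10)
print(counts_30)
print(len(filtered), "values remain after filtering")
```

Because the bin edges are spread evenly over whatever range is passed in, filtering out the long tail lets all 40 bins cover the region where most values sit.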

6 binary image files (5 KB, 11.1 KB, 10.1 KB, 8.56 KB, 12.9 KB, 39.1 KB); previews not loaded.

3-Data-Visualization/11-visualization-proportions/README.md

Lines changed: 8 additions & 2 deletions
@@ -57,6 +57,12 @@ Take this data and convert the 'class' column to a category:
 cols = mushrooms.select_dtypes(["object"]).columns
 mushrooms[cols] = mushrooms[cols].astype('category')
 ```
+
+```python
+edibleclass=mushrooms.groupby(['class']).count()
+edibleclass
+```
+
 Now, if you print out the mushrooms data, you can see that it has been grouped into categories according to the poisonous/edible class:


@@ -78,7 +84,7 @@ plt.show()
 ```
 Voila, a pie chart showing the proportions of this data according to these two classes of mushrooms. It's quite important to get the order of the labels correct, especially here, so be sure to verify the order with which the label array is built!

-![pie chart](images/pie1.png)
+![pie chart](images/pie1-wb.png)

 ## Donuts!

@@ -108,7 +114,7 @@ plt.title('Mushroom Habitats')
 plt.show()
 ```

-![donut chart](images/donut.png)
+![donut chart](images/donut-wb.png)

 This code draws a chart and a center circle, then adds that center circle in the chart. Edit the width of the center circle by changing `0.40` to another value.
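
This file's diff adds a `groupby(['class']).count()` cell, and a context line warns that the pie chart's label order must match the data. One way to avoid a mismatch, sketched here as an assumption rather than as the README's method, is to read the labels from the grouped index instead of typing them by hand. The miniature `mushrooms` frame and its `population` values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the mushrooms table; values are illustrative only.
mushrooms = pd.DataFrame({
    "class": ["Poisonous", "Edible", "Edible", "Poisonous", "Edible"],
    "population": ["a", "b", "c", "d", "e"],
})
mushrooms["class"] = mushrooms["class"].astype("category")

# Same grouping as the cell added in the diff; observed=True only makes the
# handling of unused categories explicit.
edibleclass = mushrooms.groupby(["class"], observed=True).count()

# Take labels and sizes from the same grouped frame so their order always agrees.
labels = edibleclass.index.tolist()
sizes = edibleclass["population"].tolist()

print(labels)  # ['Edible', 'Poisonous']
print(sizes)   # [3, 2]
```

The two lists can then be passed together to `plt.pie(sizes, labels=labels)`, so each wedge is labeled with the category it was counted from.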

2 binary image files (17.5 KB, 7.77 KB); previews not loaded.

0 commit comments
