This repository was archived by the owner on Jul 20, 2021. It is now read-only.
book/applications/biological-diversity.md
Lines changed: 63 additions & 35 deletions
Original file line number
Diff line number
Diff line change
@@ -144,7 +144,34 @@ We could compute this in python as follows:
144
144
145
145
Imagine that we have the same table, but some additional information about the OTUs in the table. Specifically, we've computed the following phylogenetic tree. And, for the sake of illustration, imagine that we've also assigned taxonomy to each of the OTUs and found that our samples contain representatives from the archaea, bacteria, and eukaryotes (their labels begin with `A`, `B`, and `E`, respectively).
First, let's define a phylogenetic tree using the Newick format (which is described [here](http://evolution.genetics.washington.edu/phylip/newicktree.html), and more formally defined [here](http://evolution.genetics.washington.edu/phylip/newick_doc.html)). We'll then load that up using [scikit-bio](http://scikit-bio.org)'s [TreeNode](http://scikit-bio.org/generated/skbio.core.tree.TreeNode.html#skbio.core.tree.TreeNode) object, and visualize it with [ete3](http://etetoolkit.org).
>>> t = ete3.Tree.from_skbio(tree, map_attributes=["value"])
172
+
>>> t.render("%%inline", tree_style=ts)
173
+
<IPython.core.display.Image object>
174
+
```
148
175
149
176
Pairing this with the table we defined above (displayed again in the cell below), given what you now know about these OTUs, which would you consider the most diverse? Are you happy with the $\alpha$ diversity conclusion that you obtained when computing the number of observed OTUs in each sample?
150
177
@@ -166,18 +193,6 @@ Phylogenetic Diversity (PD) is a metric that was developed by Dan Faith in the e
166
193
167
194
PD is relatively simple to calculate. It is computed simply as the sum of the branch length in a phylogenetic tree that is "covered" or represented in a given sample. Let's look at an example to see how this works.
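As an aside to the context line above, here is a minimal sketch of the PD calculation in plain Python. It is an illustration only, not code from this change: the toy tree, its branch lengths, and the helper name `faith_pd_sketch` are all assumptions, and the chapter itself works with scikit-bio's `TreeNode`. The sketch counts each branch on the path from an observed tip to the root exactly once, then sums the lengths.

```python
# Hypothetical toy tree: each node maps to (parent, branch length to parent).
# The root has parent None. Names and lengths are invented for illustration.
TOY_TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.5),
    "B1": ("n1", 0.2),
    "B2": ("n1", 0.3),
    "A1": ("root", 1.0),
}


def faith_pd_sketch(tree, observed_tips):
    """Sum the lengths of all branches on tip-to-root paths of observed tips."""
    covered = set()
    for tip in observed_tips:
        node = tip
        # Walk toward the root; stop early once we reach a node that is
        # already covered, because its ancestors are covered as well.
        while node is not None and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in covered)


# Two samples with the same number of observed OTUs (two each).
pd_close = faith_pd_sketch(TOY_TREE, ["B1", "B2"])  # two closely related tips
pd_far = faith_pd_sketch(TOY_TREE, ["B1", "A1"])    # two distantly related tips
```

Both toy samples contain two OTUs, but the second spans more of the tree, so `pd_far` is larger than `pd_close`. That is the distinction that a count of observed OTUs cannot make.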
168
195
169
-
First, let's define a phylogenetic tree using the Newick format (which is described [here](http://evolution.genetics.washington.edu/phylip/newicktree.html), and more formally defined [here](http://evolution.genetics.washington.edu/phylip/newick_doc.html)). We'll then load that up using [scikit-bio](http://scikit-bio.org)'s [TreeNode](http://scikit-bio.org/generated/skbio.core.tree.TreeNode.html#skbio.core.tree.TreeNode) object.
How does this result compare to what we observed above with the Observed OTUs metric? Based on your knowledge of biology, which do you think is a better representation of the relative diversities of these samples?
@@ -596,10 +611,7 @@ First, let's look at the analysis presented in panels E and F. Instead of genera
596
611
>>> ax.set_xticklabels(['same body habitat', 'different body habitat'])
597
612
>>> ax.set_ylabel('Unweighted UniFrac Distance')
598
613
>>> _ = ax.set_ylim(0.0, 1.0)
599
-
/Users/caporaso/miniconda3/envs/iab/lib/python3.4/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
600
-
warnings.warn(self.msg_depr % (key, alt_key))
601
-
/Users/caporaso/miniconda3/envs/iab/lib/python3.4/site-packages/matplotlib/__init__.py:892: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
602
-
warnings.warn(self.msg_depr % (key, alt_key))
614
+
<Figure size 432x288 with 1 Axes>
603
615
```
604
616
605
617
```python
@@ -610,7 +622,7 @@ test statistic name R
610
622
sample size 6
611
623
number of groups 3
612
624
test statistic 1
613
-
p-value 0.054
625
+
p-value 0.065
614
626
number of permutations 999
615
627
Name: ANOSIM results, dtype: object
616
628
```
@@ -630,8 +642,7 @@ If we run through these same steps, but base our analysis on a different metadat
/Users/caporaso/miniconda3/envs/iab/lib/python3.4/site-packages/matplotlib/__init__.py:892: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
634
-
warnings.warn(self.msg_depr % (key, alt_key))
645
+
<Figure size 432x288 with 1 Axes>
635
646
```
636
647
637
648
```python
@@ -641,7 +652,7 @@ test statistic name R
641
652
sample size 6
642
653
number of groups 2
643
654
test statistic -0.333333
644
-
p-value 0.889
655
+
p-value 0.869
645
656
number of permutations 999
646
657
Name: ANOSIM results, dtype: object
647
658
```
@@ -659,6 +670,7 @@ Next, let's look at a hierarchical clustering analysis, similar to that presente
And next we'll color the samples by the person that they're derived from. Notice that this plot and the one above are identical except for coloring. Think about how the colors (and therefore the sample metadata) help you to interpret these plots.
@@ -780,6 +796,7 @@ And next we'll color the samples by the person that they're derived from. Notice
Does this visualization help you to interpret the results? Probably not. Generally we'll need to apply some approaches that will help us with interpretation. Let's use ordination here. We'll run Principal Coordinates Analysis on our ``DistanceMatrix`` object. This gives us a matrix of coordinate values for each sample, which we can then plot. We can use ``scikit-bio``'s implementation of PCoA as follows:
@@ -964,6 +987,7 @@ What does the following ordination plot tell you about the relationship between
If the answer to the above question is that there doesn't seem to be much association, you're on the right track. We can quantify this, for example, by testing for correlation between pH and value on PC 1.
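As an illustration of the correlation test mentioned in the context line above, here is a minimal sketch in plain Python. It is not code from this change. It uses Pearson's r as one possible choice (the chapter may use a different statistic), and the pH and PC 1 values are invented placeholders rather than data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, invented for illustration only.
ph = [4.5, 5.0, 6.2, 7.1, 8.0]
pc1 = [0.10, -0.20, 0.05, -0.15, 0.12]
r = pearson_r(ph, pc1)
```

A value of `r` near zero would be consistent with little linear association between pH and position on PC 1. In practice, `scipy.stats.pearsonr` or `scipy.stats.spearmanr` would typically be used instead, since they also report a p-value.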
@@ -982,6 +1006,7 @@ In the next plot, we'll color the points by the pH of the soil sample they repre
... title="Weighted UniFrac, samples colored by pH",
1032
1059
... axis_labels=('PC1', 'PC2', 'PC3'))
1033
-
/Users/caporaso/miniconda3/envs/iab/lib/python3.4/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:102: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.010291669756329344 and the largest is 3.8374200744108204.
1060
+
/Users/gregcaporaso/miniconda3/envs/iab/lib/python3.5/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:102: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.010291669756329357 and the largest is 3.8374200744108204.
1034
1061
RuntimeWarning
1062
+
<Figure size 432x288 with 2 Axes>
1035
1063
```
1036
1064
1037
1065
Specifically, what we want to ask when comparing these results is **given a pair of ordination plots, is their shape (in two or three dimensions) the same?** The reason we care is that we want to know, **given a pair of ordination plots, would we derive the same biological conclusions regardless of which plot we look at?**