Description

It is becoming increasingly common to analyze collections of sequence reads by first assigning each read to a location on a phylogenetic tree. In parallel, quantitative methods are being developed to compare samples of reads using the information provided by such phylogenetic placements. One example is the phylogenetic Kantorovich-Rubinstein (KR) metric, which calculates a distance between pairs of samples using the evolutionary distances between the assigned positions of the reads on the phylogenetic tree. The KR distance generalizes the weighted UniFrac metric. Classical, general-purpose ordination and clustering methods can be applied to KR distances, but we argue that more interesting and interpretable results are produced by two new methods that leverage the special structure of phylogenetic placement data. Edge principal components analysis enables the detection of important differences between samples containing closely related taxa and allows the visualization of the principal component axes in terms of edges of the phylogenetic tree. Squash clustering produces informative internal edge lengths for clustering trees by incorporating distances between averages of samples, rather than the averages of distances between samples used in general-purpose procedures such as UPGMA. We present these methods and illustrate their use with data from the microbiome of the human vagina.
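The contrast between "distances between averages" and "averages of distances" can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the tree, node names, and samples are hypothetical, read mass is placed on nodes rather than along edges, the KR distance is the simple earth mover's form (each edge length times the absolute difference in mass below that edge), and the cluster average is an unweighted mean.

```python
# Toy rooted tree (hypothetical): child -> (parent, branch length).
TREE = {
    "A": ("X", 1.0),
    "B": ("X", 1.0),
    "X": ("root", 2.0),
    "C": ("root", 3.0),
}

def distal_mass(sample, node):
    """Mass of `sample` at `node` plus all mass in the subtree below it."""
    total = sample.get(node, 0.0)
    for child, (parent, _) in TREE.items():
        if parent == node:
            total += distal_mass(sample, child)
    return total

def kr_distance(p, q):
    """Earth mover's distance between two unit-mass samples on TREE.

    Each edge contributes its length times the absolute difference
    in the mass that the two samples place below that edge.
    """
    return sum(
        length * abs(distal_mass(p, child) - distal_mass(q, child))
        for child, (_, length) in TREE.items()
    )

def average(p, q):
    """Unweighted mean of two mass distributions (an assumption here)."""
    nodes = set(p) | set(q)
    return {n: (p.get(n, 0.0) + q.get(n, 0.0)) / 2 for n in nodes}

# Three hypothetical samples, each with total mass 1.
s1 = {"A": 1.0}
s2 = {"B": 1.0}
s3 = {"A": 0.5, "B": 0.5}

d12 = kr_distance(s1, s2)  # 2.0: the path from A to B

# UPGMA-style: average the distances from each cluster member to s3.
upgma_style = (kr_distance(s1, s3) + kr_distance(s2, s3)) / 2  # 1.0

# Squash-style: average the samples first, then measure one distance.
squash_style = kr_distance(average(s1, s2), s3)  # 0.0
```

In this toy case the averaged cluster {s1, s2} carries exactly the same mass distribution as s3, so the distance between averages is 0, while the average of distances is 1. That gap is the kind of information the abstract says is reflected in the internal edge lengths of a squash clustering tree.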