
Description

We employ the Kantorovich-Rubinstein (KR) metric and $L^p$ generalizations to compare probability distributions on a given phylogenetic tree. Such distributions arise in the context of metagenomics, where a sample of environmental sequences may be treated as a collection of weighted points on a reference phylogenetic tree of known sequences. In contrast to many applications of Kantorovich-Rubinstein ideas, the phylogenetic KR metric can be written in a closed form and calculated in linear time. Using Monte Carlo resampling of the data, we assign a statistical significance level to the observed distance between two distributions under a null hypothesis of no clustering. We also approximate the significance level using a functional of a suitable Gaussian process; in the $L^2$ generalized case this functional is distributed as a linear combination of $\chi_1^2$ random variables weighted by the eigenvalues of an associated matrix. We conclude with an example application using our software implementation of the KR metric and its generalizations.
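The closed form and the resampling step can be illustrated with a minimal sketch. It assumes the usual edge-wise form of the distance, in which each edge contributes its length times $|P(\tau) - Q(\tau)|^p$, where $\tau$ is the subtree below the edge, and the sum is raised to the power $1/p$. As a simplification, all mass is placed at tree nodes rather than along edges, and the null distribution is approximated by shuffling sample labels, which is only one plausible reading of "Monte Carlo resampling". The tree encoding (`parent`, `length`) and the function names `kr_distance`, `masses`, and `permutation_p_value` are hypothetical and are not taken from the authors' software.

```python
import random


def kr_distance(parent, length, p_mass, q_mass, p=1.0):
    """Edge-wise KR-style distance between two mass vectors on a rooted tree.

    Assumed encoding: node 0 is the root, parent[v] < v for every other
    node v, and length[v] is the length of the edge from v to parent[v].
    """
    # diff[v] will hold (P - Q) summed over the subtree rooted at v.
    diff = [pm - qm for pm, qm in zip(p_mass, q_mass)]
    total = 0.0
    # Visiting nodes in decreasing index order handles children before
    # parents, so the whole computation is a single linear pass.
    for v in range(len(parent) - 1, 0, -1):
        total += length[v] * abs(diff[v]) ** p
        diff[parent[v]] += diff[v]
    return total ** (1.0 / p)


def masses(sample, n_nodes):
    """Turn a list of node indices into a probability vector over nodes."""
    m = [0.0] * n_nodes
    for v in sample:
        m[v] += 1.0 / len(sample)
    return m


def permutation_p_value(parent, length, sample_a, sample_b,
                        p=1.0, n_perm=1000, seed=0):
    """Estimate significance by shuffling which sample each point belongs to."""
    rng = random.Random(seed)
    n = len(parent)
    observed = kr_distance(parent, length,
                           masses(sample_a, n), masses(sample_b, n), p)
    pooled = list(sample_a) + list(sample_b)
    k = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = kr_distance(parent, length,
                        masses(pooled[:k], n), masses(pooled[k:], n), p)
        if d >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy tree: root 0 with two leaves, edge lengths 1.0 and 2.0.
parent = [-1, 0, 0]
length = [0.0, 1.0, 2.0]
d1 = kr_distance(parent, length, [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], p=1.0)
pv = permutation_p_value(parent, length, [1, 1, 1, 1], [2, 2, 2, 2], n_perm=200)
```

On the toy tree, moving a unit of mass from one leaf to the other with $p = 1$ gives the path length between the leaves, which is the expected earth mover's cost.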
