In this work, we describe several probabilistic models designed to attack the main phylogenetic problems (tree inference, ancestral sequence reconstruction, and multiple sequence alignment). For each model, we discuss the issues of representation, inference, analysis, and empirical evaluation.
Among the contributions, we propose the first computational approach to diachronic phonology that scales to large phylogenies. Sound changes and markedness are taken into account using a flexible, feature-based, unsupervised learning framework. Using this model, we address a 50-year-old open problem in linguistics regarding the role of functional load in language change. We also introduce three novel algorithms for inferring multiple sequence alignments, and a stochastic process that allows joint, accurate, and efficient inference of phylogenetic trees and multiple sequence alignments.
Finally, many of the tools developed to perform inference in these models are applicable more broadly, transferring ideas from phylogenetics into machine learning as well. In particular, the variational framework used for multiple sequence alignment extends to a broad class of combinatorial inference problems.