The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for understanding genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. In this paper we present a novel statistical model for haplotype inference. It is a Bayesian model based on the Dirichlet process, a nonparametric prior that provides control over the size of the unknown pool of population haplotypes. The model also incorporates a likelihood that allows statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe a Markov chain Monte Carlo algorithm for posterior inference. The overall result is a flexible Bayesian model that is reminiscent of parsimony methods in its preference for small haplotype pools. We apply this new approach to the analysis of both simulated and real genotype data, and compare it with existing methods.
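As an illustration of how a Dirichlet process prior controls the size of the haplotype pool, the sketch below samples from its Chinese restaurant process representation. Each new chromosome reuses an existing haplotype with probability proportional to the number of chromosomes already carrying it, or introduces a new haplotype with probability proportional to the concentration parameter `alpha`. This is a generic textbook sampler, not the paper's full model, its error likelihood, or its MCMC algorithm; the function names `crp_assignments` and `expected_pool_size` are hypothetical.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Assign n chromosomes to haplotypes via a Chinese restaurant process.

    Returns (labels, counts): labels[i] is the haplotype index of chromosome i,
    counts[k] is how many chromosomes carry haplotype k.
    """
    rng = random.Random(seed)
    counts = []
    labels = []
    for _ in range(n):
        # Existing haplotype k has weight counts[k]; a new haplotype has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new haplotype in the pool
        else:
            counts[k] += 1  # reuse an existing haplotype
        labels.append(k)
    return labels, counts


def expected_pool_size(n, alpha):
    """Expected number of distinct haplotypes after n draws: sum_i alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for a in (0.5, 5.0):
        _, counts = crp_assignments(200, a, seed=1)
        print(f"alpha={a}: sampled pool size {len(counts)}, "
              f"expected {expected_pool_size(200, a):.2f}")
```

For fixed `alpha` the expected pool size grows only logarithmically in the number of chromosomes, and smaller `alpha` concentrates draws on fewer haplotypes; this is the sense in which such a prior favors small pools, while the likelihood decides how many are actually needed to explain the genotypes.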



