UBC Theses and Dissertations

Bayesian phylogenetic inference via Monte Carlo methods

Wang, Liangliang

Abstract

A main task in evolutionary biology is phylogenetic tree reconstruction, which determines the ancestral relationships among different species from observed molecular sequences, e.g. DNA data. When a stochastic model, typically a continuous-time Markov chain (CTMC), is used to describe evolution, phylogenetic inference depends on unknown evolutionary parameters (hyper-parameters) of that model. Bayesian inference provides a general framework for phylogenetic analysis: it can accommodate complex models of sequence evolution and gives a coherent treatment of uncertainty about the groups (clades) on the tree. Conventional computational methods in Bayesian phylogenetics, based on Markov chain Monte Carlo (MCMC), cannot efficiently explore the huge tree space, which grows super-exponentially with the number of molecular sequences, because good tree topologies are difficult to propose. Sequential Monte Carlo (SMC) is an alternative way to approximate posterior distributions. However, SMC is non-trivial to apply directly to phylogenetic tree inference because of the combinatorial intricacies of tree space. We propose the combinatorial sequential Monte Carlo (CSMC) method, which extends SMC to non-clock tree inference by relying on the existence of a flexible partially ordered set (poset) structure, and we present it at a level of generality that applies directly to many other combinatorial spaces. We show that the proposed CSMC algorithm is consistent and fast in simulations. We also investigate two ways of combining SMC and MCMC to jointly estimate phylogenetic trees and evolutionary parameters: particle Markov chain Monte Carlo (PMCMC) algorithms that run CSMC at each iteration, and an SMC sampler with MCMC moves.
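To make "super-exponential" concrete: the number of distinct unrooted binary tree topologies on n labelled leaves (one leaf per sequence) is (2n - 5)!!, because an (n + 1)-th leaf can be attached to any of the 2n - 3 edges of an n-leaf unrooted tree; rooted topologies number (2n - 3)!!. The short Python sketch below evaluates these standard counts. The function names are illustrative, and it counts topologies only, whereas non-clock trees also carry continuous branch lengths.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ... (defined as 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_topologies(n: int) -> int:
    """Number of unrooted binary tree topologies on n labelled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)


def num_rooted_topologies(n: int) -> int:
    """Number of rooted binary tree topologies on n labelled leaves: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n - 3)


if __name__ == "__main__":
    # Show how quickly the topology space grows with the number of sequences.
    for n in (4, 10, 20, 50):
        print(f"{n:>3} sequences: {num_unrooted_topologies(n):.3e} unrooted topologies")
```

For example, `num_unrooted_topologies(10)` returns 2027025, so even ten sequences already give about two million candidate topologies, which is why naive enumeration, and local MCMC moves on topology, scale poorly.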
Further, we present a novel way to estimate the transition probabilities of a general CTMC, which can be used to address the computational bottleneck of a general evolutionary model, the string-valued continuous-time Markov chain (SCTMC), which can incorporate a wide range of molecular mechanisms.
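The abstract does not describe the new estimator itself. For orientation only, the sketch below shows a standard baseline technique, not the thesis's method: for a CTMC with a finite rate matrix Q, the transition probabilities are P(t) = exp(tQ), which can be computed by uniformization, i.e. as a Poisson-weighted sum of powers of the stochastic matrix R = I + Q/λ with λ = max_i |q_ii|. This requires writing down the full rate matrix, which becomes impractical when the state space is very large or unbounded, as it can be for string-valued processes. The helper names and the Jukes-Cantor test model are illustrative assumptions.

```python
import math


def matmul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transition_matrix(Q, t, tol=1e-12, max_terms=100_000):
    """Approximate P(t) = exp(tQ) for a finite-state CTMC by uniformization.

    Q must be a rate matrix (non-negative off-diagonal entries, rows summing to 0).
    P(t) = sum_k Poisson(k; lam * t) * R^k, with R = I + Q / lam.
    Intended for moderate lam * t: exp(-lam * t) underflows once lam * t exceeds ~700.
    """
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return identity
    # R is a stochastic matrix: the jump chain of the uniformized process.
    R = [[identity[i][j] + Q[i][j] / lam for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    term = identity              # R^k, starting at k = 0
    weight = math.exp(-lam * t)  # Poisson probability of exactly k jumps, k = 0
    mass = 0.0                   # Poisson mass accumulated so far
    k = 0
    while True:
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * term[i][j]
        mass += weight
        # Entries of R^k lie in [0, 1], so the truncation error per entry is < 1 - mass.
        if 1.0 - mass < tol or k >= max_terms:
            break
        k += 1
        weight *= lam * t / k
        term = matmul(term, R)
    return P


def jukes_cantor_Q(alpha):
    """Jukes-Cantor rate matrix on {A, C, G, T}: every substitution has rate alpha."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    alpha, t = 0.1, 2.0
    P = transition_matrix(jukes_cantor_Q(alpha), t)
    # Closed form for this model: P_ii(t) = 1/4 + 3/4 * exp(-4 * alpha * t).
    print(P[0][0], 0.25 + 0.75 * math.exp(-4.0 * alpha * t))
```

The Jukes-Cantor model is used here only because its closed form makes the numerical result easy to check; the same function applies to any small finite rate matrix.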

Rights

Attribution-NonCommercial-NoDerivs 3.0 Unported