UBC Theses and Dissertations

HMMConverter: a toolbox for hidden Markov models with two novel, memory-efficient parameter training algorithms

Lam, Tin Yin

Abstract

Hidden Markov models (HMMs) are powerful statistical tools for biological sequence analysis. Many recently developed bioinformatics applications employ variants of HMMs to analyze diverse types of biological data. While it is typically fairly easy to design the states and the topological structure of an HMM, it can be difficult to estimate parameter values that yield good prediction performance. Moreover, because many HMM-based applications employ similar algorithms for generating predictions, re-implementing these algorithms for every new HMM-based application is time-consuming and error-prone. This thesis addresses these challenges by introducing a toolbox, called HMMConverter, which requires only an XML input file to define an HMM and to use it for sequence decoding and parameter training. The package not only allows rapid prototyping of HMM-based applications, but also incorporates several algorithms for sequence decoding and parameter training, including two new linear-memory algorithms for parameter training. Using this software package, even users without programming knowledge can quickly set up sophisticated HMMs and pair-HMMs and use them with efficient algorithms for parameter training and sequence analysis. We use HMMConverter to construct a new comparative gene prediction program, called ANNOTAID, which predicts pairs of orthologous genes by integrating prior information about each input sequence probabilistically into both the gene prediction process and parameter training. ANNOTAID can thus be readily adapted to predict orthologous gene pairs in newly sequenced genomes.
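To illustrate what "sequence decoding" with an HMM involves, here is a minimal sketch of the textbook Viterbi algorithm for a plain (single-sequence) HMM, computed in log space. This is not HMMConverter's implementation or its XML model format. The dictionary-based model representation, the function name `viterbi`, and the two-state "N"/"C" toy model below are all hypothetical, chosen only for illustration. Note that this sketch keeps a full traceback table, so its memory grows with sequence length times the number of states. Memory-efficient algorithms aim to reduce exactly this kind of cost.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Textbook Viterbi decoding (illustrative sketch, not HMMConverter's code).

    Assumes a non-empty observation sequence and a hypothetical dict-based model:
    start_p[s], trans_p[r][s], emit_p[s][symbol] are plain probabilities.
    Returns (most probable state path, its log probability).
    """
    def log(p):
        # Work in log space to avoid numerical underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # v[s]: best log probability of any state path ending in state s at the current position.
    v = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        ptr, new_v = {}, {}
        for s in states:
            # Choose the predecessor r that maximises v[r] + log P(r -> s).
            prev, score = max(((r, v[r] + log(trans_p[r][s])) for r in states),
                              key=lambda x: x[1])
            ptr[s] = prev
            new_v[s] = score + log(emit_p[s][symbol])
        back.append(ptr)
        v = new_v

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]

if __name__ == "__main__":
    # Hypothetical toy model: "N" emits uniformly, "C" prefers G and C.
    states = ("N", "C")
    start = {"N": 0.9, "C": 0.1}
    trans = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
    emit = {"N": {b: 0.25 for b in "ACGT"},
            "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
    print(viterbi("ATGCGCGCAT", states, start, trans, emit))
```

A pair-HMM decoder, as supported by the toolbox, would extend the same dynamic programming to a two-dimensional table over positions in both input sequences.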

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International