

Robust genotype classification using dynamic variable selection


Files in this item

File: ubc_2008_fall_podder_mohua.pdf
Size: 2.348Mb
Format: Adobe Portable Document Format

Title: Robust genotype classification using dynamic variable selection
Author: Podder, Mohua
Degree: Doctor of Philosophy - PhD
Program: Statistics
Copyright Date: 2008
Subject Keywords: Statistics; Bioinformatics; Microarray analysis
Issue Date: 2008-09-02
Publisher: University of British Columbia
Abstract: Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C or G) is altered. Arguably, SNPs account for more than 90% of human genetic variation. Dr. Tebbutt's laboratory has developed a highly redundant SNP genotyping assay, based on arrayed primer extension (APEX), consisting of multiple probes with signals from multiple channels for a single SNP. The strength of this platform is its unique redundancy: multiple probes for a single SNP. Using this microarray platform, we have developed fully automated genotype calling algorithms based on linear models for individual probe signals, with dynamic variable selection at the prediction level. The algorithms combine separate analyses based on the multiple probe sets to give a final confidence score for each candidate genotype. Our proposed classification model achieved an accuracy level of >99.4% with a 100% call rate for the SNP genotype data, which is comparable with existing genotyping technologies. We discuss the appropriateness of the proposed model relative to other existing high-throughput genotype calling algorithms. In this thesis we have explored three new ideas for classification with high-dimensional data: (1) ensembles of various sets of predictors with a built-in dynamic property; (2) robust classification at the prediction level; and (3) a proper confidence measure for dealing with failed predictor(s). We found that a mixture model for classification provides robustness against outlying values of the explanatory variables. Furthermore, the algorithm chooses among different sets of explanatory variables dynamically, prediction by prediction. We analyzed several data sets, including real and simulated samples, to illustrate these features.
Our model-based genotype calling algorithm captures the redundancy in the system by considering all the underlying probe features of a particular SNP, automatically down-weighting any ‘bad data’ corresponding to image artifacts on the microarray slide or failure of a specific chemistry. Though motivated by this genotyping application, the proposed methodology would apply to other classification problems where the explanatory variables fall naturally into groups, or where outliers in the explanatory variables require variable selection at the prediction stage for robustness.
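
The down-weighting idea in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes one scalar signal per probe set, Gaussian signal models with made-up means and standard deviations, a flat prior, independence across probe sets, and a two-component mixture in which a probe set has "failed" with probability `eps` and then follows a flat outlier density. The function name `genotype_confidence` and all parameter values are hypothetical.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def genotype_confidence(signals, means, sds, eps=0.05, outlier_density=0.1):
    """Combine probe sets into one confidence score per genotype.

    Each probe set contributes a mixture likelihood: with probability
    1 - eps it follows its genotype-specific Gaussian, and with probability
    eps it is treated as failed and follows a flat outlier density.
    All modelling choices here are illustrative assumptions.
    """
    log_post = {g: 0.0 for g in GENOTYPES}  # flat prior over genotypes
    for x, mu, sd in zip(signals, means, sds):
        for g in GENOTYPES:
            like = (1.0 - eps) * normal_pdf(x, mu[g], sd) + eps * outlier_density
            log_post[g] += math.log(like)
    # Normalise in a numerically stable way.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Three hypothetical probe sets with the same made-up signal model.
means = [{"AA": -1.0, "AB": 0.0, "BB": 1.0}] * 3
sds = [0.2, 0.2, 0.2]

clean = genotype_confidence([0.05, -0.02, 0.03], means, sds)
# Third probe set corrupted, e.g. by an image artifact on the slide.
corrupted = genotype_confidence([0.05, -0.02, 8.0], means, sds)
```

In this sketch, a wildly outlying signal yields nearly the same mixture likelihood for every genotype, so that probe set contributes almost nothing to the final scores and the call is driven by the remaining probe sets. This is one simple way to obtain variable selection that changes prediction by prediction.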
Affiliation: Science, Faculty of
URI: http://hdl.handle.net/2429/1602
Scholarly Level: Graduate

