Go to  Advanced Search

Statistical methods for high throughput genomics

Show full item record

Files in this item

Files Size Format Description   View
ubc_2009_fall_lo_chiho.pdf 9.508Mb Adobe Portable Document Format   View/Open
 
Title: Statistical methods for high throughput genomics
Author: Lo, Chi Ho
Degree Doctor of Philosophy - PhD
Program Statistics
Copyright Date: 2009
Publicly Available in cIRcle 2009-10-08
Abstract: The advancement of biotechnologies has led to indispensable high-throughput techniques for biological and medical research. Microarray is applied to monitor the expression levels of thousands of genes simultaneously, while flow cytometry (FCM) offers rapid quantification of multi-parametric properties for millions of cells. In this thesis, we develop approaches based on mixture modeling to deal with the statistical issues arising from both high-throughput biological data sources. Inference about differential expression is a typical objective in analysis of gene expression data. The use of Bayesian hierarchical gamma-gamma and lognormal-normal models is popular for this type of problem. Some unrealistic assumptions, however, have been made in these frameworks. In view of this, we propose flexible forms of mixture models based on an empirical Bayes approach to extend both frameworks so as to release the unrealistic assumptions, and develop EM-type algorithms for parameter estimation. The extended frameworks have been shown to significantly reduce the false positive rate whilst maintaining a high sensitivity, and are more robust to model misspecification. FCM analysis currently relies on the sequential application of a series of manually defined 1D or 2D data filters to identify cell populations of interest. This process is time-consuming and ignores the high-dimensionality of FCM data. We reframe this as a clustering problem, and propose a robust model-based clustering approach based on t mixture models with the Box-Cox transformation for identifying cell populations. We describe an EM algorithm to simultaneously handle parameter estimation along with transformation selection and outlier identification, issues of mutual influence. Empirical studies have shown that this approach is well adapted to FCM data, in which a high abundance of outliers and asymmetric cell populations are frequently observed. Finally, in recognition of concern for an efficient automated FCM analysis platform, we have developed an R package called flowClust to automate the gating analysis with the proposed methodology. Focus during package development has been put on the computational efficiency and convenience of use at users' end. The package offers a wealth of tools to summarize and visualize features of the clustering results, and is well integrated with other FCM packages.
URI: http://hdl.handle.net/2429/13762

This item appears in the following Collection(s)

Show full item record

All items in cIRcle are protected by copyright, with all rights reserved.

UBC Library
1961 East Mall
Vancouver, B.C.
Canada V6T 1Z1
Tel: 604-822-6375
Fax: 604-822-3893