UBC Theses and Dissertations

Computational exploratory analysis of high-dimensional Flow Cytometry data for diagnosis and biomarker discovery

Aghaeepour, Nima

Abstract

Flow Cytometry (FCM) is widely used to investigate and diagnose human disease. Although high-throughput systems allow rapid data collection from large cohorts, manual data analysis can take months. Moreover, identification of cell populations can be subjective, and analysts rarely examine the entire multidimensional dataset, focusing instead on a limited number of subsets whose biology has usually already been well described. Thus, the value of Polychromatic Flow Cytometry (PFC) as a discovery tool is largely wasted. In this thesis, I present three computational tools that, once combined, provide a complete pipeline for analysis and visualization of FCM data: (1) a clustering algorithm for identification of homogeneous groups of cells (cell populations); (2) a set of statistical tools for identifying immunophenotypes (based on the cell populations) that are correlated with an external variable (e.g., a clinical outcome); and (3) a tool for identifying the most important parent populations that can best describe a set of related immunophenotypes. In addition to technical advancements, this pipeline represents a conceptual advance that allows a more powerful, automated, and complete analysis of complex flow cytometry data than was previously possible. As a by-product, this pipeline allows complex information from PFC studies to be translated into clinical or resource-poor settings, where multiparametric analysis is less feasible. I demonstrate the utility of this approach in a large (n = 466), retrospective, 14-parameter PFC study of early HIV infection, in which we identified three T-cell subsets that strongly predicted progression to AIDS (only one of which had been identified by an initial manual analysis).

Before and during the development of this pipeline, a wide range of computational tools for analysis of FCM data were published. However, guidance for end users about the appropriate use and application of these methods is scarce. Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) is a highly collaborative project that evaluates these computational tools using real-world datasets. The FlowCAP results presented here will help both computational and biological scientists to better develop and use advanced bioinformatics pipelines.

Rights

Attribution-NonCommercial-ShareAlike 3.0 Unported