Robustification of the sparse K-means clustering algorithm

dc.contributor.author Kondo, Yumi
dc.date.accessioned 2011-09-01T20:59:55Z
dc.date.available 2011-09-01T20:59:55Z
dc.date.copyright 2011 en
dc.date.issued 2011-09-01
dc.identifier.uri http://hdl.handle.net/2429/37093
dc.description.abstract Searching a dataset for the “natural grouping / clustering” is an important explanatory technique for understanding complex multivariate datasets. One might expect that the true underlying clusters present in a dataset differ only with respect to a small fraction of the features. Furthermore, one might be concerned that the dataset contains outliers. Through simulation studies, we find that an existing sparse clustering method can be severely affected by a single outlier. In this thesis, we develop a robust clustering method that is also able to perform variable selection: we robustify sparse K-means (Witten and Tibshirani [28]), based on the idea of trimmed K-means introduced by Gordaliza [7] and Gordaliza [8]. Since high-dimensional datasets often contain quite a few missing observations, we make our proposed method capable of handling datasets with missing values. The performance of the proposed robust sparse K-means is assessed in various simulation studies and two data analyses. The simulation studies show that, when datasets are contaminated, robust sparse K-means performs better than the competing algorithms in terms of both the selection of features and the selection of a partition. The analysis of a microarray dataset shows that, among all competing algorithms, robust sparse K-means best reflects the oestrogen receptor status of the patients. We also adapt Clest (Dudoit and Fridlyand [5]) to our robust sparse K-means to provide an automatic robust procedure for selecting the number of clusters. Our proposed methods are implemented in the R package RSKC. en
dc.language.iso eng en
dc.publisher University of British Columbia en
dc.title Robustification of the sparse K-means clustering algorithm en
dc.type Electronic Thesis or Dissertation en
dc.degree.name Master of Science - MSc en
dc.degree.discipline Statistics en
dc.degree.grantor University of British Columbia en
dc.date.graduation 2011-11 en
dc.degree.campus UBCV en
dc.description.scholarlevel Graduate en
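
The abstract builds on the idea of trimmed K-means, in which the observations farthest from their cluster centers are set aside before the centers are updated. The following is a minimal sketch of that idea only, written in plain Python for illustration. It is not the thesis's robust sparse K-means (it has no feature weights and no missing-value handling) and it is not the RSKC implementation. The function name `trimmed_kmeans`, its parameters, and the choice to pass initial centers explicitly are assumptions made for this sketch.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trimmed_kmeans(points, init_centers, alpha=0.1, max_iter=100):
    """Illustrative trimmed K-means (hypothetical helper, not the RSKC API).

    points       : list of numeric tuples, one per observation
    init_centers : list of k starting centers (assumed supplied by the caller)
    alpha        : proportion of observations to trim at each iteration

    Returns (centers, labels, trimmed_indices). labels[i] is the index of the
    nearest center for every observation, including the trimmed ones.
    """
    n, k = len(points), len(init_centers)
    n_trim = int(math.floor(alpha * n))
    centers = [list(c) for c in init_centers]
    labels = [0] * n
    trimmed = set()

    for _ in range(max_iter):
        # Assignment step: nearest center and its squared distance.
        dists = []
        for i, p in enumerate(points):
            d = [sq_dist(p, c) for c in centers]
            labels[i] = min(range(k), key=d.__getitem__)
            dists.append(d[labels[i]])

        # Trimming step: set aside the n_trim observations farthest from
        # their own center.
        order = sorted(range(n), key=dists.__getitem__, reverse=True)
        new_trimmed = set(order[:n_trim])

        # Update step: recompute each center from its untrimmed members.
        new_centers = []
        for j in range(k):
            members = [points[i] for i in range(n)
                       if labels[i] == j and i not in new_trimmed]
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[t] for m in members) / len(members)
                     for t in range(dim)])
            else:
                # Keep the old center if the cluster has no untrimmed members.
                new_centers.append(centers[j])

        if new_centers == centers and new_trimmed == trimmed:
            break
        centers, trimmed = new_centers, new_trimmed

    return centers, labels, sorted(trimmed)
```

With two tight groups of four points plus one gross outlier, starting centers taken from each group, and `alpha=0.2` (one trimmed observation out of nine), the sketch sets the outlier aside and the centers settle on the two group means. Like ordinary K-means, the result depends on the starting centers, so a poor start can end in a different local solution.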

Files in this item

Files Size Format
ubc_2011_fall_kondo_yumi.pdf 4.203Mb Adobe Portable Document Format

All items in cIRcle are protected by copyright, with all rights reserved.

UBC Library
1961 East Mall
Vancouver, B.C.
Canada V6T 1Z1
Tel: 604-822-6375
Fax: 604-822-3893