Go to  Advanced Search

Robustification of the sparse K-means clustering algorithm

Show full item record

Files in this item

Files Size Format Description   View
ubc_2011_fall_kondo_yumi.pdf 4.203Mb Adobe Portable Document Format   View/Open
Title: Robustification of the sparse K-means clustering algorithm
Author: Kondo, Yumi
Degree Master of Science - MSc
Program Statistics
Copyright Date: 2011
Publicly Available in cIRcle 2011-09-01
Abstract: Searching a dataset for the ‘‘natural grouping / clustering’’ is an important explanatory technique for understanding complex multivariate datasets. One might expect that the true underlying clusters present in a dataset differ only with respect to a small fraction of the features. Furthermore, one might afraid that the dataset might contain potential outliers. Through simulation studies, we find that an existing sparse clustering method can be severely affected by a single outlier. In this thesis, we develop a robust clustering method that is also able to perform variable selection: we robustified sparse K-means (Witten and Tibshirani [28]), based on the idea of trimmed K-means introduced by Gordaliza [7] and Gordaliza [8]. Since high dimensional datasets often contain quite a few missing observations, we made our proposed method capable of handling datasets with missing values. The performance of the proposed robust sparse K-means is assessed in various simulation studies and two data analyses. The simulation studies show that robust sparse K-means performs better than other competing algorithms in terms of both the selection of features and the selection of a partition when datasets are contaminated. The analysis of a microarray dataset shows that robust sparse K-means best reflects the oestrogen receptor status of the patients among all other competing algorithms. We also adapt Clest (Duboit and Fridlyand [5]) to our robust sparse K-means to provide an automatic robust procedure of selecting the number of clusters. Our proposed methods are implemented in the R package RSKC.
URI: http://hdl.handle.net/2429/37093
Scholarly Level: Graduate

This item appears in the following Collection(s)

Show full item record

All items in cIRcle are protected by copyright, with all rights reserved.

UBC Library
1961 East Mall
Vancouver, B.C.
Canada V6T 1Z1
Tel: 604-822-6375
Fax: 604-822-3893