Robust estimation of multivariate scatter in non-affine equivariant scenarios

File: ubc_2010_spring_danilov_mikhail.pdf (1.060 MB, Adobe Portable Document Format)
 
Author: Danilov, Mikhail
Degree: Doctor of Philosophy (PhD)
Program: Statistics
Copyright Date: 2010
Publicly Available in cIRcle: 2010-02-01
Abstract: We consider the problem of robust estimation of the scatter matrix of an elliptical distribution when the observed data are corrupted in a cell-wise manner. The first half of the thesis develops a framework for dealing with data subject to independent cell-wise contamination. Each data cell (as opposed to each data case, as in traditional robustness) can be contaminated independently of the rest of its case. Instead of downweighting the whole case, we attempt to identify the affected cells, remove the offending values, and treat them as missing at random in subsequent likelihood-based processing. We explore several variations of the detection procedure that takes into account the multivariate structure of the data, and arrive at a heuristic algorithm that identifies and removes a large proportion of dangerous independent contamination. Although there are few existing methods to compare against, the proposed covariance estimate compares favorably with naive alternatives such as pairwise estimates or univariate Winsorising.
The cell-wise data corruption mechanism that we deal with in the second half of this thesis is missing data. Missing data on their own have been well studied, and likelihood methods for them are well developed. The new setting that we are interested in is when missing data occur together with traditional case-wise contamination. Both issues have been studied extensively over the last few decades, but little attention has been paid to addressing them both at the same time. We propose a modification of the S-estimate that allows robust estimation of multivariate location and scatter matrix in the presence of data missing completely at random (MCAR). The method is based on the idea of maximizing the likelihood of the observed data and extends it to the world of S-estimates. The estimate comes with a computational algorithm, which is an adjusted version of the widely used Fast-S procedure.
Simulation results and applications to real datasets confirm the superiority of our method over available alternatives. A preliminary investigation reported in the concluding chapter suggests that combining the two main ideas presented in this thesis can yield an estimate that is robust against case-wise and cell-wise contamination simultaneously.
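The cell-wise strategy described in the abstract (flag suspicious cells, treat them as missing, then estimate by likelihood from the observed data) can be illustrated with a minimal Python/NumPy sketch. It rests on two loudly stated assumptions. First, `flag_cells` is a hypothetical, purely univariate filter based on per-column median/MAD z-scores with an arbitrary cutoff `c=3.0`; it is much cruder than the detection procedure the thesis develops, which also exploits the multivariate structure. Second, `em_mvn` is the classical, non-robust EM algorithm for a multivariate normal with missing values, not the modified S-estimate proposed in the thesis. Because the filter removes cells based on their own values, the missing-at-random assumption behind the EM step holds only approximately.

```python
import numpy as np


def flag_cells(X, c=3.0):
    """Set suspicious cells to NaN using per-column robust z-scores.

    Hypothetical univariate filter (assumption): a simplified stand-in for a
    detection procedure that would also use the multivariate structure.
    """
    X = np.array(X, dtype=float)                          # work on a copy
    med = np.nanmedian(X, axis=0)                         # robust column centres
    mad = 1.4826 * np.nanmedian(np.abs(X - med), axis=0)  # MAD scaled to a normal SD
    z = np.abs(X - med) / mad
    X[z > c] = np.nan                                     # flagged cells become "missing"
    return X


def em_mvn(X, n_iter=200, tol=1e-8):
    """Gaussian maximum-likelihood mean and covariance, treating NaN as missing.

    Classical EM for the multivariate normal: assumes the missing cells are
    missing at random and gives no protection against outlying cases.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x_hat = X[i].copy()
            C = np.zeros((p, p))  # conditional covariance of the missing block
            if m.any():
                if o.any():
                    s_mo = sigma[np.ix_(m, o)]
                    # B = Sigma_mo Sigma_oo^{-1}: regression of missing on observed
                    B = np.linalg.solve(sigma[np.ix_(o, o)], s_mo.T).T
                    x_hat[m] = mu[m] + B @ (X[i, o] - mu[o])
                    C[np.ix_(m, m)] = sigma[np.ix_(m, m)] - B @ s_mo.T
                else:  # fully missing row: use the current marginal moments
                    x_hat[m] = mu[m]
                    C[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            # E-step: accumulate the expected sufficient statistics
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + C
        # M-step: update mean and covariance from the expected statistics
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        done = (np.abs(mu_new - mu).max() < tol
                and np.abs(sigma_new - sigma).max() < tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma
```

Calling `em_mvn(flag_cells(X))` on a data matrix `X` with scattered gross cell errors yields mean and covariance estimates that ignore the flagged cells instead of discarding whole rows. Keeping the filter and the estimator as separate functions mirrors the two-stage idea of removing offending cells first and handing the remaining observed data to a likelihood-based estimator.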
URI: http://hdl.handle.net/2429/19462

All items in cIRcle are protected by copyright, with all rights reserved.

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 2.5 Canada.