Open Collections
UBC Theses and Dissertations
The detection and testing of multivariate outliers
White, Richard Alan
Abstract
The classical estimators of multivariate location and scatter for the normal model are the sample mean and sample covariance. However, if outliers are present in the data, the classical estimates can be very inaccurate and robust estimates should be used in their place. Most multivariate robust estimators are very difficult, if not impossible, to compute, which limits their use. I will present some simple approximations that make these estimators computable. Robust estimation downweights or completely ignores the outliers. This is not always best, because the outliers can contain very important information about the population. If they can be identified, the outliers can be investigated further and appropriate action can be taken based on the results. To detect outliers, a sequential multivariate scale-ratio test is proposed. It is based on a non-robust estimate and a robust estimate of scatter and is applied in a forward fashion, removing the most extreme point at each step, until the test fails to indicate the presence of outliers. We will show that this procedure has level c when applied to an uncontaminated sample, is unaffected by swamping or masking, and is accurate in detecting outliers. Finally, we will apply the scale-ratio test to several data sets and compare it to the sequential Wilks' outlier test as proposed by C. Caroni and P. Prescott in 1992.
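The forward procedure described in the abstract can be pictured with the rough sketch below. It is not the test from the thesis, whose statistic, robust estimator and critical values are not given on this page. As stand-ins, the sketch assumes a crude robust scatter (the classical covariance of the half of the points closest to the coordinate-wise median), takes the ratio of the two determinants as the statistic, and compares it to a user-supplied `critical_value`. All function names and the cutoff of 10 are hypothetical.

```python
from statistics import median

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, center):
    """Scatter matrix about `center`, divided by n - 1."""
    n, p = len(rows), len(center)
    dev = [[x - c for x, c in zip(row, center)] for row in rows]
    return [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)]
            for i in range(p)]

def det_and_inverse(matrix):
    """Determinant and inverse via Gauss-Jordan elimination with pivoting."""
    p = len(matrix)
    a = [list(row) + [float(i == j) for j in range(p)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular scatter matrix")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        piv = a[col][col]
        det *= piv
        a[col] = [v / piv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return det, [row[p:] for row in a]

def mahalanobis_sq(row, center, inv):
    d = [x - c for x, c in zip(row, center)]
    p = len(d)
    return sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p))

def trimmed_estimate(rows):
    """Assumed stand-in for a robust estimator: mean and covariance
    of the roughly half of the points closest to the median."""
    p = len(rows[0])
    center = [median(col) for col in zip(*rows)]
    _, inv = det_and_inverse(covariance(rows, center))
    h = (len(rows) + p + 1) // 2
    core = sorted(rows, key=lambda r: mahalanobis_sq(r, center, inv))[:h]
    core_center = mean_vector(core)
    return core_center, covariance(core, core_center)

def sequential_scale_ratio(rows, critical_value, max_removals):
    """Remove the most extreme point while the non-robust/robust
    determinant ratio exceeds the (hypothetical) critical value."""
    rows = [list(r) for r in rows]
    removed = []
    for _ in range(max_removals):
        det_classical, _ = det_and_inverse(covariance(rows, mean_vector(rows)))
        center, robust = trimmed_estimate(rows)
        det_robust, inv_robust = det_and_inverse(robust)
        if det_classical / det_robust <= critical_value:
            break  # the test no longer indicates outliers
        worst = max(rows, key=lambda r: mahalanobis_sq(r, center, inv_robust))
        rows.remove(worst)
        removed.append(worst)
    return removed, rows

# Illustrative data: a 5 x 5 grid of clean points plus one gross outlier.
data = [(i % 5, i // 5) for i in range(25)] + [(100, 100)]
removed, kept = sequential_scale_ratio(data, critical_value=10.0, max_removals=5)
print(removed, len(kept))
```

In a real application the cutoff would have to be calibrated at each step so that the overall procedure attains the stated level on uncontaminated data; the fixed value used here is only a placeholder.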
Item Metadata
Title |
The detection and testing of multivariate outliers
|
Creator |
White, Richard Alan
|
Publisher |
University of British Columbia
|
Date Issued |
1992
|
Extent |
1158229 bytes
|
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2008-12-16
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
|
DOI |
10.14288/1.0086547
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
1992-11
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|