Comparison of data classification procedures in applied geochemistry using Monte Carlo simulation

Files in this item

File: UBC_1988_A1 S73.pdf
Size: 16.74Mb
Format: Adobe Portable Document Format
Title: Comparison of data classification procedures in applied geochemistry using Monte Carlo simulation
Author: Stanley, Clifford R.
Degree: Doctor of Philosophy - PhD
Program: Geological Science
Copyright Date: 1988
Subject Keywords: Geochemistry -- Methodology
Issue Date: 2010-10-21
Publisher: University of British Columbia
Series/Report no.: UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/]
Abstract: In geochemical applications, data classification commonly involves 'mapping' continuous variables into discrete descriptive categories. This is often achieved using thresholds that define specific ranges of data as separate groups, which can then be compared with other categorical variables. This study compares several classification methods used in applied geochemistry to select thresholds and discriminate between populations or to recognize anomalous observations. The comparisons were made using Monte Carlo simulation to evaluate how well different techniques perform on data sets with different structures. Maximum likelihood parameter estimates of a mixture of normal distributions obtained from class interval frequencies were compared with those obtained from raw data to study the quality of the corresponding results. The more time-consuming raw data approach produces optimal parameter estimates, while the more rapid class interval approach is the one in common use. Results show that, provided there are more than 50 observations per distribution and (on average) 10 observations per class interval, the maximum likelihood parameter estimates from the two methods are practically indistinguishable. Univariate classification techniques evaluated in this study include the 'mean plus 2 standard deviations', the '95th percentile', the gap statistic and probability plots. Results show that the 'mean plus 2 standard deviations' and '95th percentile' approaches are inappropriate for most geochemical data sets. The probability plot technique classifies mixtures of normal distributions better than the gap statistic; however, the gap statistic may be used as a discordancy test to reveal the presence of outliers. Multivariate classification using the background characterization approach was simulated using several different functions to describe the variation in the background distribution.
Comparisons of principal components, ordinary least squares regression and reduced major axis regression indicate that reduced major axis regression and principal components not only are consistent with assumptions about geochemical data, but also are less sensitive to varying degrees of data set truncation than is ordinary least squares regression. Furthermore, correcting the descriptive statistics of a truncated data set and calculating the background functions using these statistics produces residuals and scores that are predictable and thus can be distinguished easily from residuals and scores calculated for data from another distribution.
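
The abstract's remark about the 'mean plus 2 standard deviations' and '95th percentile' thresholds can be illustrated with a small Monte Carlo sketch. This is not the simulation design used in the thesis: the mixture parameters, sample sizes, trial count and function names below are all hypothetical, chosen only to show how such a comparison might be set up with NumPy. Each trial draws a two-component normal mixture (background plus anomalous), applies both thresholds, and records the fraction of anomalous observations missed and the fraction of background observations flagged.

```python
import numpy as np

def simulate_mixture(rng, n_background, n_anomalous,
                     bg_mean=2.0, bg_sd=0.3, an_mean=3.0, an_sd=0.3):
    """Draw one data set from a two-component normal mixture.

    All distribution parameters are hypothetical illustration values.
    """
    values = np.concatenate([
        rng.normal(bg_mean, bg_sd, n_background),
        rng.normal(an_mean, an_sd, n_anomalous),
    ])
    is_anomalous = np.concatenate([
        np.zeros(n_background, dtype=bool),
        np.ones(n_anomalous, dtype=bool),
    ])
    return values, is_anomalous

def thresholds(values):
    """Compute the two fixed-rule thresholds from the pooled data."""
    return {
        "mean+2sd": values.mean() + 2.0 * values.std(ddof=1),
        "p95": np.percentile(values, 95),
    }

def error_rates(values, is_anomalous, threshold):
    """Return (missed anomalous fraction, flagged background fraction)."""
    flagged = values > threshold
    n_an = is_anomalous.sum()
    n_bg = (~is_anomalous).sum()
    missed = (is_anomalous & ~flagged).sum() / n_an if n_an else 0.0
    false_alarm = (~is_anomalous & flagged).sum() / n_bg if n_bg else 0.0
    return missed, false_alarm

def run_trials(n_trials=200, n_background=400, n_anomalous=100, seed=0):
    """Average both error rates over repeated simulated data sets."""
    rng = np.random.default_rng(seed)
    totals = {"mean+2sd": np.zeros(2), "p95": np.zeros(2)}
    for _ in range(n_trials):
        values, labels = simulate_mixture(rng, n_background, n_anomalous)
        for name, t in thresholds(values).items():
            totals[name] += error_rates(values, labels, t)
    return {name: tuple(total / n_trials) for name, total in totals.items()}

if __name__ == "__main__":
    for name, (missed, false_alarm) in run_trials().items():
        print(f"{name}: missed={missed:.3f} false_alarm={false_alarm:.3f}")
```

Under these assumed settings the '95th percentile' rule can flag only the top 5% of observations, so when 20% of the data are anomalous it must miss most of them. When no anomalous population is present, it still flags 5% of the background. Both rules fix the proportion flagged in advance rather than estimating it from the structure of the data.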
Affiliation: Applied Science, Faculty of
URI: http://hdl.handle.net/2429/29430
Scholarly Level: Graduate
