UBC Theses and Dissertations

Using lexical knowledge and parafoveal information for the recognition of common words and suffixes

Rhone, Brock William

Abstract

Research over the past decade into the psychophysics of reading has demonstrated that information extracted from text falling on the parafoveal and peripheral regions of the retina is used by the human visual system to significantly increase reading speed. Recent results provide evidence that knowledge of word frequency is brought to bear in processing parafoveal data. Other psychological evidence indicates the type of large-scale features used by the visual system to recognize isolated characters in parafoveal vision. This thesis describes the design and implementation of a system able to recognize the most commonly occurring English words and suffixes from parafoveally available information by employing knowledge of their letter sequences and of large-scale features of lower-case characters. The Marr-Hildreth theory of edge detection provides a description of the information computed by the earliest stages of visual processing from parafoveal words. Large-scale features extracted from this description, while relatively invariant with respect to noise and font changes, are insufficient to uniquely identify most characters, but they are used to place each character into one of several classes of similar characters. The sequence of these 'confusion classes' is found to place a strong constraint on word identity: of the 1000 most common words comprising the system's vocabulary, which represent 70% of the volume of the Brown Corpus of printed English, 92% have mutually unique confusion class sequences. Word recognition is achieved by using the confusion class sequence as a key into the vocabulary, retrieving the word or words having the same sequence. Suffixes are recognized in a similar way. Results are presented demonstrating the system's ability to identify words and suffixes in text images over a range of simulated parafoveal eccentricities and in two different fonts, one with serifs and one without. Smoothing by the Marr-Hildreth operator, the simplicity and scale of the features, the size of the character classes, and the context provided by the character sequence give the system a degree of robustness.
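
The lookup step described in the abstract, in which a confusion class sequence serves as a key into the vocabulary, can be illustrated with a minimal Python sketch. The three classes used here (ascender, descender, x-height) and the six-word vocabulary are assumptions made purely for illustration; the abstract does not list the thesis's actual classes or features, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical confusion classes based on coarse letter shape.
# These are illustrative assumptions, not the classes used in the thesis.
ASSUMED_CLASSES = {
    "A": "bdfhklt",         # letters with an ascender
    "D": "gjpqy",           # letters with a descender
    "x": "aceimnorsuvwxz",  # letters confined to the x-height band
}

# Invert the table: letter -> class label.
CLASS_OF = {ch: label for label, letters in ASSUMED_CLASSES.items() for ch in letters}


def class_sequence(word):
    """Map a word to its confusion class sequence ('?' for unknown symbols)."""
    return "".join(CLASS_OF.get(ch, "?") for ch in word.lower())


def build_index(vocabulary):
    """Group vocabulary words by their confusion class sequence."""
    index = defaultdict(list)
    for word in vocabulary:
        index[class_sequence(word)].append(word)
    return dict(index)


def recognize(index, observed_sequence):
    """Return every vocabulary word sharing the observed class sequence."""
    return index.get(observed_sequence, [])


def unique_fraction(index):
    """Fraction of vocabulary words whose class sequence is shared by no other word."""
    total = sum(len(words) for words in index.values())
    unique = sum(1 for words in index.values() if len(words) == 1)
    return unique / total if total else 0.0


if __name__ == "__main__":
    toy_vocabulary = ["the", "and", "of", "to", "dog", "bag"]  # assumed toy data
    index = build_index(toy_vocabulary)
    print(recognize(index, class_sequence("the")))
    print(recognize(index, class_sequence("dog")))
    print(unique_fraction(index))
```

In this toy vocabulary "dog" and "bag" share a sequence, so the lookup returns both candidates, while "the" is retrieved unambiguously. The `unique_fraction` helper computes the same kind of statistic as the 92% figure quoted above, though only for whatever classes and vocabulary are supplied.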

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.