UBC Theses and Dissertations

Schema labelling applied to hand-printed Chinese character recognition

Bult, Timothy Paul

Abstract

Hand-printed Chinese character recognition presents an interesting problem for Artificial Intelligence research. Input data in the form of arrays of pixel values cannot be directly mapped to unique character identifications because of the complexity of the characters. Thus, intermediate data structures are necessary, which in turn lead to a need to represent knowledge of the characters' composition. Building the intermediate constructs for these hand-printed characters necessarily involves choices among ambiguities, the set of which is so large that an efficient search algorithm becomes central to the recognition process. Schema labelling is a theory of how knowledge should be organized for recognition tasks in which composition structure is inherent in the domain, the composition entails ambiguity, and the ambiguity generates large search spaces.

This thesis describes an implementation of an enhanced version of schema labelling for Chinese characters. The specific problems addressed by the enhancements, with some success, are (i) the segmentation of real images into objects usable by the schema system, (ii) the definition of schemas which adequately describe the generic composition of hand-printed Chinese characters, as well as common variations or vagaries, and (iii) the inclusion of sufficient "control knowledge" to prevent combinatorial explosion of the backtracking recognition process.

Test characters for recognition systems can be classified along several dimensions. On the spectrum from type-set, through hand-printed, to hand-written forms, our system was tested on restricted hand-print, at a level somewhat more difficult than is normally attempted. On the spectrum of input types, from grey-scale pixel input through on-line stroke representations, our system was fully tested only at the high end, with complete synthetic strokes. We obtained a success rate of 57%, 12 out of the 21 characters tested.
The principal success of the work lies in the fact that characters of the complexity tested could be recognized at all, and in the impact schema labelling techniques had on that recognition.

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.