UBC Theses and Dissertations


PSS: a phonetic search system for short text documents

Zhang, Jerry Jiaer

Abstract

Finding the right information in the growing amount of data on the Internet is not easy. Most people therefore rely on search engines, which make searching easier through a variety of techniques. In this thesis, we address one of these techniques, phonetic matching. The idea is to look for documents in a document set based not only on the spellings of words but on their pronunciations as well. This is useful when a query contains spelling mistakes or when a correctly spelled query does not return enough results. In these cases, phonetic matching can fix or tune the original query by replacing some or all of the query words with phonetically similar ones, in the hope of achieving more hits. We propose the design of such a search system for short text documents. It allows single- and multiple-word queries to be matched to similar-sounding words or phrases contained in a document set, and sorts the results by their relevance to the original queries. Our design differs from many existing systems in that, instead of relying heavily on extensive logs of prior user queries, our system makes search decisions based mostly on a relatively small dictionary of organized metadata. Our goal is to give start-up document sets phonetic search ability comparable to that of bigger databases, without having to wait until enough historical user queries have accumulated.
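
As a rough illustration of the general idea of phonetic matching described in the abstract, the sketch below replaces each query word with dictionary words that share the same phonetic code. It uses the classic American Soundex code as a stand-in; the abstract does not say which phonetic algorithm, dictionary structure, or ranking method PSS actually uses, so the function names (`soundex`, `build_index`, `expand_query`) and the sample vocabulary are assumptions made for this example only.

```python
from collections import defaultdict

# Soundex digit for each consonant group (American Soundex).
_CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate duplicates
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


def build_index(vocabulary):
    """Hypothetical dictionary: phonetic code -> set of known words."""
    index = defaultdict(set)
    for word in vocabulary:
        index[soundex(word)].add(word.lower())
    return index


def expand_query(query, index):
    """For each query word, list dictionary words with the same code.

    A word with no phonetic match is kept unchanged.
    """
    return [sorted(index.get(soundex(w), {w.lower()})) for w in query.split()]


if __name__ == "__main__":
    index = build_index(["smith", "jackson", "search"])
    print(expand_query("smyth jaxon", index))
```

Here the misspelled query "smyth jaxon" maps to the dictionary words "smith" and "jackson" because each pair shares a Soundex code. Soundex is coarse (for example, it does not match "night" with "knight"), and this sketch does no relevance ranking, which the thesis treats as part of the system.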


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International