
Improving hash join performance by exploiting intrinsic data skew


File: ubc_2009_spring_cutt_bryce.pdf (1.296 MB, Adobe Portable Document Format)

Title: Improving hash join performance by exploiting intrinsic data skew
Author: Cutt, Bryce
Degree: Master of Science (MSc)
Program: Interdisciplinary Studies
Copyright Date: 2009
Publicly Available in cIRcle: 2009-06-17
Abstract: Large relational databases are part of everyday life: governments use them, and almost every store uses them to process purchases. Real-world data sets are not uniformly distributed and often contain significant skew. Skew is present in commercial databases where, for example, some items are purchased far more often than others. A relational database must be able to efficiently find related information that it stores, and in large databases the most common method for doing so is the hash join algorithm. Although mitigating the negative effects of skew on hash joins has been studied, no prior work has examined how the statistics maintained by modern database systems can be used to exploit skew as an advantage and improve hash join performance. This thesis presents Histojoin, a join algorithm that uses statistics to identify data skew and improve the performance of hash join operations. Experimental results show that for skewed data sets, Histojoin performs significantly fewer I/O operations and runs 10 to 60% faster than standard hash join algorithms.
URI: http://hdl.handle.net/2429/9375
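The abstract does not spell out how Histojoin uses statistics, so the following is only a hypothetical illustration of the general idea of exploiting skew in a hash join, not the thesis's implementation. It assumes the statistics take the form of a most-common-values list for the probe relation's join key: build tuples matching the most frequent probe values are kept in an in-memory table, so the many probe tuples carrying those values join immediately instead of being partitioned to disk. The function name `skew_aware_hash_join`, the `mcv` and `memory_budget` parameters, and counting partitioned tuples as a stand-in for I/O are all assumptions made for this sketch. With an empty `mcv` list it degenerates to a plain partitioned (Grace-style) hash join.

```python
from collections import defaultdict


def skew_aware_hash_join(build, probe, mcv, memory_budget, num_partitions=4):
    """Equi-join two lists of (key, payload) tuples.

    Hypothetical sketch: `mcv` lists probe-side join keys ordered from most to
    least frequent (assumed to come from optimizer statistics). Build tuples for
    those keys are kept in memory, up to `memory_budget` tuples; everything else
    is hash-partitioned as in a Grace-style hash join. Returns the join results
    and the number of tuples written to partitions, used here as an I/O proxy.
    """
    build_by_key = defaultdict(list)
    for key, payload in build:
        build_by_key[key].append(payload)

    # Admit build tuples for the most frequent probe keys first, while they fit.
    in_memory, used = {}, 0
    for key in mcv:
        rows = build_by_key.get(key, [])
        if rows and used + len(rows) <= memory_budget:
            in_memory[key] = rows
            used += len(rows)

    # All other build tuples are partitioned ("spilled to disk").
    build_parts = [[] for _ in range(num_partitions)]
    for key, rows in build_by_key.items():
        if key not in in_memory:
            build_parts[hash(key) % num_partitions].extend((key, p) for p in rows)

    results = []
    probe_parts = [[] for _ in range(num_partitions)]
    for key, payload in probe:
        if key in in_memory:
            # Frequent key: joined immediately, never written to a partition.
            results.extend((key, b, payload) for b in in_memory[key])
        else:
            probe_parts[hash(key) % num_partitions].append((key, payload))

    spilled = sum(map(len, build_parts)) + sum(map(len, probe_parts))

    # Second phase: join each partition pair with an in-memory hash table.
    for build_part, probe_part in zip(build_parts, probe_parts):
        table = defaultdict(list)
        for key, payload in build_part:
            table[key].append(payload)
        for key, payload in probe_part:
            results.extend((key, b, payload) for b in table.get(key, ()))
    return results, spilled
```

For example, with 100 build tuples (keys 0 to 99) and 999 probe tuples of which 900 have key 0, calling it with `mcv=[]` partitions all 1,099 tuples, while `mcv=[0]` with `memory_budget=10` partitions only 198; both calls return the same 999 result rows. Real systems count page I/Os rather than tuples, so the tuple count here is only a rough proxy.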


All items in cIRcle are protected by copyright, with all rights reserved.