Multi-way hash join effectiveness

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Multi-way hash join effectiveness Henderson, Michael

Abstract

In database systems most join algorithms are binary and will only operate on two inputs at a time. In order to join more than two input relations a database system will use the results of a binary join of two of the inputs in a second join. This way any number of input relations can be combined into a single output. There is additional cost to having multiple joins as the results of each intermediate join must be cached and processed. Recent research into joins on more than two inputs, called multi-way joins, has shown that the intermediate partitioning steps of a traditional hash join based query plan can be avoided. This decreases the amount of disk based input and output (I/Os) that the join query will require which is desirable since disk I/O is one of the slowest parts of a join. This thesis studies the advantages and disadvantages of implementing and using different multi-way join algorithms and their relative performance compared to traditional hash joins. Specifically, this work compares dynamic hash join with three multi-way join algorithms, Hash Teams, Generalized Hash Teams and SHARP. The results of the experiments show that in some limited cases these multi-way hash joins can provide a significant advantage over the traditional hash join but in many cases they can perform worse. Since the cases where these multi-way joins have better performance is so limited and their algorithms are much more complex, it does not make sense to implement Hash Teams or Generalized Hash Teams in production database management systems. SHARP provides enough of a performance advantage that it makes sense to implement it in a database system used for data warehousing.

Item Metadata

Title	Multi-way hash join effectiveness
Creator	Henderson, Michael
Publisher	University of British Columbia
Date Issued	2013
Description	In database systems most join algorithms are binary and will only operate on two inputs at a time. In order to join more than two input relations a database system will use the results of a binary join of two of the inputs in a second join. This way any number of input relations can be combined into a single output. There is additional cost to having multiple joins as the results of each intermediate join must be cached and processed. Recent research into joins on more than two inputs, called multi-way joins, has shown that the intermediate partitioning steps of a traditional hash join based query plan can be avoided. This decreases the amount of disk based input and output (I/Os) that the join query will require which is desirable since disk I/O is one of the slowest parts of a join. This thesis studies the advantages and disadvantages of implementing and using different multi-way join algorithms and their relative performance compared to traditional hash joins. Specifically, this work compares dynamic hash join with three multi-way join algorithms, Hash Teams, Generalized Hash Teams and SHARP. The results of the experiments show that in some limited cases these multi-way hash joins can provide a significant advantage over the traditional hash join but in many cases they can perform worse. Since the cases where these multi-way joins have better performance is so limited and their algorithms are much more complex, it does not make sense to implement Hash Teams or Generalized Hash Teams in production database management systems. SHARP provides enough of a performance advantage that it makes sense to implement it in a database system used for data warehousing.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2013-07-20
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0073975
URI	http://hdl.handle.net/2429/44680
Degree	Master of Science - MSc
Program	Interdisciplinary Studies
Affiliation	Graduate Studies, College of (Okanagan)
Degree Grantor	University of British Columbia
Graduation Date	2013-11
Campus	UBCO
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Multi-way hash join effectiveness Henderson, Michael

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights