UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Multi-way hash join effectiveness Henderson, Michael

Abstract

In database systems most join algorithms are binary and will only operate on two inputs at a time. In order to join more than two input relations a database system will use the results of a binary join of two of the inputs in a second join. This way any number of input relations can be combined into a single output. There is additional cost to having multiple joins as the results of each intermediate join must be cached and processed. Recent research into joins on more than two inputs, called multi-way joins, has shown that the intermediate partitioning steps of a traditional hash join based query plan can be avoided. This decreases the amount of disk based input and output (I/Os) that the join query will require which is desirable since disk I/O is one of the slowest parts of a join. This thesis studies the advantages and disadvantages of implementing and using different multi-way join algorithms and their relative performance compared to traditional hash joins. Specifically, this work compares dynamic hash join with three multi-way join algorithms, Hash Teams, Generalized Hash Teams and SHARP. The results of the experiments show that in some limited cases these multi-way hash joins can provide a significant advantage over the traditional hash join but in many cases they can perform worse. Since the cases where these multi-way joins have better performance is so limited and their algorithms are much more complex, it does not make sense to implement Hash Teams or Generalized Hash Teams in production database management systems. SHARP provides enough of a performance advantage that it makes sense to implement it in a database system used for data warehousing.

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International