Performance of load balancing techniques for join operations in shared-nothing database management systems
Abbreviated journal title: J. Parallel Distrib. Comput.
parallel database system; join operation; performance evaluation; load balancing; sampling; RELATIONAL DATABASES; MULTIPROCESSOR; Computer Science, Theory & Methods
We investigate various load balancing approaches for the hash-based join techniques popular in multicomputer-based, shared-nothing database systems. When tuples are not uniformly distributed among the hash buckets, the buckets must be redistributed among the processors to maintain good system performance. Two recent load balancing techniques, one based on sampling and the other on incremental balancing, have been shown to be more robust than conventional methods; however, the two have not been compared with each other. In this study, we improve both schemes and implement them on an nCUBE/2 parallel computer, together with a conventional load balancing method and a standard join technique that performs no load balancing, and compare their performance. Our experimental results indicate that the sampling technique is the better approach. To further evaluate these techniques under diverse hardware conditions, we also develop a cost model and implement a simulator to perform sensitivity analyses with respect to various hardware parameters. The simulation results show that both the sampling and incremental techniques provide noticeable savings over conventional methods, with the sampling approach scaling better to very large database systems. (C) 1999 Academic Press.
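To make the idea of redistributing skewed hash buckets concrete, the following is a generic sketch, not the paper's algorithm. It assumes integer join keys, a fixed number of hash buckets, a Bernoulli sample of the keys to estimate bucket sizes, and a greedy largest-bucket-first (LPT) assignment of buckets to processors. The function names `estimate_bucket_sizes`, `assign_buckets`, and `processor_loads` are hypothetical.

```python
import heapq
import random
from collections import Counter

def estimate_bucket_sizes(keys, num_buckets, sample_rate, seed=0):
    """Hypothetical helper: estimate tuples per hash bucket from a Bernoulli sample."""
    rng = random.Random(seed)
    sample = [k for k in keys if rng.random() < sample_rate]
    counts = Counter(hash(k) % num_buckets for k in sample)
    # Scale sampled counts back up to estimates for the full relation.
    return {b: counts.get(b, 0) / sample_rate for b in range(num_buckets)}

def assign_buckets(bucket_sizes, num_procs):
    """Greedy LPT: give each bucket, largest first, to the least-loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for bucket, size in sorted(bucket_sizes.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[bucket] = proc
        heapq.heappush(heap, (load + size, proc))
    return assignment

def processor_loads(bucket_sizes, assignment, num_procs):
    """Sum the (estimated) bucket sizes assigned to each processor."""
    loads = [0.0] * num_procs
    for bucket, proc in assignment.items():
        loads[proc] += bucket_sizes[bucket]
    return loads

if __name__ == "__main__":
    # Skewed input: key 0 is very frequent, so bucket 0 is oversized.
    keys = [0] * 60 + list(range(1, 41))
    sizes = estimate_bucket_sizes(keys, num_buckets=8, sample_rate=1.0)
    naive = {b: b % 4 for b in sizes}  # static bucket-to-processor mapping
    balanced = assign_buckets(sizes, num_procs=4)
    print(max(processor_loads(sizes, naive, 4)))     # 70.0
    print(max(processor_loads(sizes, balanced, 4)))  # 65.0
```

With `sample_rate=1.0` the estimates are exact. The static mapping puts the hot bucket (65 tuples) and another bucket (5 tuples) on the same processor, for a peak load of 70. The greedy assignment isolates the hot bucket and lowers the peak to 65. That is the best any bucket-level scheme can achieve here, because a single bucket cannot be split.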
Journal of Parallel and Distributed Computing
"Performance of load balancing techniques for join operations in shared-nothing database management systems" (1999). Faculty Bibliography 1990s. 2676.