
Design and Evaluation of Small-Large Outer Joins in Cloud Computing Environments

Cheng, Long, Tachmazidis, Ilias, Kotoulas, Spyros and Antoniou, Grigoris (2017) Design and Evaluation of Small-Large Outer Joins in Cloud Computing Environments. Journal of Parallel and Distributed Computing. ISSN 0743-7315

PDF - Accepted Version (890kB)
Restricted to Repository staff only until 6 March 2019.

Abstract

Large-scale analytics is a key application area for data processing and parallel computing research. One of the most common (and challenging) operations in this domain is the join. Although inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work analysing outer joins, especially in the extremely popular cloud computing environments. A common type of outer join is the small-large outer join, where one relation is relatively small and the other is large. Conventional implementations for this case, such as those based on hash redistribution, often incur significant network communication, while duplication-based approaches are complex and inefficient. In this work, we present a new method called DDR (duplication and direct redistribution), which aims to enable efficient small-large outer joins in cloud computing environments while being easy to implement using existing predicates in data processing frameworks. We present the detailed implementation of our approach and evaluate its performance through extensive experiments on the widely used MapReduce and Spark platforms. We show that the proposed method is scalable and can achieve significant performance improvements over the conventional approaches. Compared to the state-of-the-art method, the DDR algorithm is easier to implement and achieves very similar or better performance under different outer join workloads; it can therefore be considered a new option for current data analysis applications. Moreover, our detailed experimental results provide insights into current small-large outer join implementations, allowing system developers to make a more informed choice for their data analysis applications.
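The abstract names DDR without spelling out its steps. The Python sketch below is one reading of the idea, not the authors' implementation. It assumes the small relation S is the preserved (left) side of S LEFT OUTER JOIN R. In this reading, S is duplicated to every node, and each node joins S against its local partition of the large relation R, emitting matches directly. The ids of S tuples left unmatched on a node are redistributed by hash, and a tuple is emitted with a null only if all n nodes report it unmatched. A hash-redistribution baseline is included for comparison. Nodes are simulated as Python lists in a single process. The function names, the per-tuple id scheme (the position of the tuple in S) and the use of `None` for SQL NULL are all illustrative assumptions.

```python
from collections import Counter, defaultdict

NULL = None  # assumption: stands in for SQL NULL on the non-preserved side


def build_index(part):
    """Hash index over one partition of R: join key -> list of R payloads."""
    index = defaultdict(list)
    for key, rval in part:
        index[key].append(rval)
    return index


def hash_redistribution_outer_join(small, large_parts):
    """Baseline: shuffle both S and R by hash(key), then left-outer-join locally."""
    n = len(large_parts)
    s_buckets, r_buckets = defaultdict(list), defaultdict(list)
    for key, sval in small:
        s_buckets[hash(key) % n].append((key, sval))
    for part in large_parts:  # every R tuple is shuffled to its key's node
        for key, rval in part:
            r_buckets[hash(key) % n].append((key, rval))
    out = []
    for node in range(n):
        index = build_index(r_buckets[node])
        for key, sval in s_buckets[node]:
            matches = index.get(key)
            if matches:
                out.extend((key, sval, rval) for rval in matches)
            else:
                out.append((key, sval, NULL))
    return out


def ddr_outer_join(small, large_parts):
    """Hypothetical DDR sketch: S is small and duplicated to all n nodes."""
    n = len(large_parts)
    out = []
    unmatched = defaultdict(list)  # destination node -> ids of locally unmatched S tuples
    # Phase 1 (duplication): every node joins the full S against its own R partition.
    for part in large_parts:
        index = build_index(part)
        for sid, (key, sval) in enumerate(small):
            matches = index.get(key)
            if matches:
                out.extend((key, sval, rval) for rval in matches)  # matches emitted directly
            else:
                unmatched[hash(sid) % n].append(sid)  # direct redistribution by tuple id
    # Phase 2: an S tuple has no match anywhere iff all n nodes sent its id.
    for node in range(n):
        for sid, count in Counter(unmatched[node]).items():
            if count == n:
                key, sval = small[sid]
                out.append((key, sval, NULL))
    return out


if __name__ == "__main__":
    small = [(1, "a"), (2, "b"), (3, "c")]
    large_parts = [[(1, "x"), (4, "y")], [(1, "z"), (3, "w")], [(5, "v")]]
    ddr = ddr_outer_join(small, large_parts)
    baseline = hash_redistribution_outer_join(small, large_parts)
    print(Counter(ddr) == Counter(baseline))  # True
    print(ddr)  # [(1, 'a', 'x'), (1, 'a', 'z'), (3, 'c', 'w'), (2, 'b', None)]
```

In this sketch, hash redistribution moves every R tuple across the (simulated) network. DDR instead sends S to every node plus only the ids of locally unmatched S tuples. When S is small and R is large, this illustrates the communication trade-off the abstract describes. The sketch does not model the actual MapReduce or Spark implementations evaluated in the paper.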

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Schools: School of Computing and Engineering > High-Performance Intelligent Computing > Planning, Autonomy and Representation of Knowledge

School of Computing and Engineering
Depositing User: Ilias Tachmazidis
Date Deposited: 16 Mar 2017 15:35
Last Modified: 16 Mar 2017 19:28
URI: http://eprints.hud.ac.uk/id/eprint/31274

University of Huddersfield, Queensgate, Huddersfield, HD1 3DH Copyright and Disclaimer All rights reserved ©