
VCC: A framework for building containerized reproducible cluster software environments

Higgins, Joshua, Holmes, Violeta and Venters, Colin (2017) VCC: A framework for building containerized reproducible cluster software environments. The Journal of Open Source Software, 2 (11). ISSN 2475-9066

PDF - Accepted Version
Available under License Creative Commons Attribution.

Archive (ZIP): Source code (MIT license)

Abstract

The problem of portability and reproducibility of the software used to conduct computational experiments has recently come to the fore. Container virtualisation has proved to be a powerful tool for achieving portability of code and its execution environment, through runtimes such as Docker, LXC, Singularity and others, without the performance cost of traditional virtual machines (Chamberlain, Invenshure, and Schommer 2014; Felter et al. 2014).

However, scientific software often depends on a system foundation that provides middleware, libraries, and other supporting software in order for the code to execute as intended. Typically, container virtualisation addresses only the portability of the code itself, which does not make it inherently reproducible. For example, a containerized MPI application may offer binary compatibility between different systems, but for execution as intended, it must be run on an existing cluster that provides the correct interfaces for parallel MPI execution.

As greater demand is placed on high-performance and cluster resources to accommodate a diverse range of disciplines, the ability to quickly create and tear down reproducible, transitory virtual environments tailored to an individual task or experiment will be essential.

The Virtual Container Cluster (VCC) is a framework for building containers that achieve this goal by encapsulating a parallel application along with an execution model, through a set of dependency-linked services and built-in process orchestration. This promotes a high degree of portability and offers easier reproducibility by shipping the application along with the foundation required to execute it, whether that be an MPI cluster, big data processing framework, bioinformatics pipeline, or any other execution model (Higgins, Holmes, and Venters 2017).
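
To make the idea of dependency-linked services more concrete, the following is a minimal sketch of how a start-up order could be derived from declared dependencies. It is not taken from the VCC source code: the service names (`discovery`, `sshd`, `hostfile`, `mpi_job`) and the `start_order` helper are hypothetical, chosen only to resemble the pieces an MPI-style execution model might need. It assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical services and their dependencies, for illustration only;
# these names are assumptions and do not come from the VCC code base.
# Each key maps to the set of services that must be running before it.
SERVICES = {
    "discovery": set(),               # lets containers find each other
    "sshd": {"discovery"},            # remote launch channel between nodes
    "hostfile": {"discovery"},        # list of nodes built from discovery data
    "mpi_job": {"sshd", "hostfile"},  # the parallel application itself
}


def start_order(services):
    """Return an order in which every service follows its dependencies."""
    return list(TopologicalSorter(services).static_order())


order = start_order(SERVICES)
print(order)
```

In this sketch the application is treated as just another service at the end of the dependency chain, which is one way to read the abstract's description of shipping an application together with the foundation it needs.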

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Schools: School of Computing and Engineering > High-Performance Intelligent Computing > High Performance Computing Research Group
School of Computing and Engineering
Depositing User: Joshua Higgins
Date Deposited: 23 Mar 2017 14:25
Last Modified: 23 Mar 2017 14:28
URI: http://eprints.hud.ac.uk/id/eprint/31623

