NORMA eResearch @NCI Library

Parallel patterns for heterogeneous CPU/GPU architectures: structured parallelism from cluster to cloud

Campa, Sonia, Danelutto, Marco, Goli, Mehdi, González-Vélez, Horacio, Popescu, Alina Madalina and Torquati, Massimo (2014) Parallel patterns for heterogeneous CPU/GPU architectures: structured parallelism from cluster to cloud. Future Generation Computer Systems, 37. pp. 354-366. ISSN 0167-769X

Full text not available from this repository.
Official URL:


The widespread adoption of traditional heterogeneous systems has substantially improved the computing power available and, in the meantime, raised optimisation issues related to the processing of task streams across both CPU and GPU cores in heterogeneous systems. Similar to the heterogeneous improvement gained in traditional systems, cloud computing has started to add heterogeneity support, typically through GPU instances, to the conventional CPU-based cloud resources. This optimisation of cloud resources will arguably have a real impact when running on-demand computationally-intensive applications.

In this work, we investigate the scaling of pattern-based parallel applications from physical, “local” mixed CPU/GPU clusters to a public cloud CPU/GPU infrastructure. Specifically, such parallel patterns are deployed via algorithmic skeletons to exploit a peculiar parallel behaviour while hiding implementation details.

We propose a systematic methodology to exploit approximated analytical performance/cost models, and an integrated programming framework that is suitable for targeting both local and remote resources to support the offloading of computations from structured parallel applications to heterogeneous cloud resources, such that performance values not available on local resources may be actually achieved with the remote resources. The amount of remote resources necessary to achieve a given performance target is calculated through the performance models in order to allow any user to hire the amount of cloud resources needed to achieve a given target performance value. Thus, it is therefore expected that such models can be used to devise the optimal proportion of computations to be allocated on different remote nodes for Big Data computations.

We present different experiments run with a proof of-concept implementation based on FastFlowon small departmental clusters as well as on a public cloud infrastructure with CPU and GPU using the Amazon Elastic Compute Cloud. In particular, we show how CPU-only and mixed CPU/GPU computations can be offloaded to remote cloud resources with predictable performances and how data intensive applications can be mapped to a mix of local and remote resources to guarantee optimal performances.

Item Type: Article
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Divisions: School of Computing > Staff Research and Publications
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 03 Mar 2014 14:23
Last Modified: 17 Jun 2014 09:55

Actions (login required)

View Item View Item