Languages and Compilers for Parallel Computing by Lawrence Rauchwerger
Author: Lawrence Rauchwerger
Language: eng
Format: epub, pdf
ISBN: 9783030352257
Publisher: Springer International Publishing
Keywords
Heterogeneous system-on-chip, Memory synchronization, Memory concurrency, Data sharing
Qualcomm Research is a division of Qualcomm Technologies, Inc.
1 Introduction
Heterogeneous computing systems allow programmers to match parts of an application to the strengths of the different devices available [14]. The ultimate goal of heterogeneous computing is to obtain higher performance at lower power by judiciously balancing the computation. Prior work has focused on partitioning applications across heterogeneous devices. For example, the Fast Multipole Method has been shown to work better on a CPU-GPU architecture [10]. However, heterogeneous systems are not limited to CPUs and GPUs. As we scale to a more diverse set of accelerators, moving data across devices becomes a major impediment for programmers. Mobile Systems-on-Chip (SoCs) typically share data across devices using a contiguously allocated block of memory (i.e., a buffer) that is modifiable by one device at a time. Without advanced hardware support [2], synchronization is explicit, placing a significant burden on programmers. We propose to simplify data movement across devices via intuitive abstractions.
Not all devices share a common view of system memory or even have access to it. For example, a device may not be cache coherent, and some devices may use 32-bit memory addresses while others use 64-bit addresses. Our approach leverages the familiar notion of a buffer augmented with acquire-release semantics to deal with this non-uniformity. We abstract the diverse mechanisms for data access into a uniform set of synchronization primitives that is implemented on top of hardware support. Put simply, under acquire-release semantics, either the entire buffer is made available to a kernel or none of it is. This provides programmers with a familiar technique for sharing data in a heterogeneous system.
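The all-or-nothing behavior described above can be illustrated with a minimal sketch. This is not the authors' API: the class name `SharedBuffer`, the methods `acquire` and `release`, and the device names are hypothetical, and a plain lock stands in for whatever hardware or driver mechanism a real runtime would use (cache flushes, invalidations, or copies).

```python
import threading


class SharedBuffer:
    """Hypothetical buffer that exactly one device may hold at a time."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._lock = threading.Lock()
        self.owner = None  # name of the device currently holding the buffer

    def acquire(self, device):
        """Block until the buffer is free, then hand all of it to `device`."""
        self._lock.acquire()
        self.owner = device
        # Assumption: a real runtime would make earlier writes visible here,
        # for example by invalidating caches or copying into device memory.
        return memoryview(self._data)

    def release(self, device):
        """Give the buffer back so another device can acquire it."""
        if self.owner != device:
            raise RuntimeError(f"{device} does not hold the buffer")
        # Assumption: a real runtime would flush this device's writes here.
        self.owner = None
        self._lock.release()


buf = SharedBuffer(4)

# The "cpu" device fills the buffer, then releases it as a whole.
view = buf.acquire("cpu")
view[:] = b"\x01\x02\x03\x04"
buf.release("cpu")

# The "gpu" device acquires the buffer and sees every byte written before
# the release; it never observes a partially updated buffer.
view = buf.acquire("gpu")
result = bytes(view)
buf.release("gpu")
```

In this sketch a device that has not acquired the buffer has no legitimate access to any part of it, which is the property the text calls making either the entire buffer available or none of it.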
Currently, many frameworks exist to enable offloading data to a device. For example, OpenCL, CUDA, and OpenGL are used to offload computation to the GPU and to share data between the CPU and GPU [7, 18, 22]. Android ION is another industry standard that allocates memory accessible by any ION-compliant device on an SoC. It is often used to share data between compute devices and custom components of the SoC, such as image-processing accelerators [11]. However, to use ION efficiently from OpenCL kernels (i.e., to avoid unnecessary copying of data from ION into GPU-accessible memory regions), programmers must use specialized, typically vendor-specific, extensions of OpenCL API calls. This results in multiple versions of the application code to support the range of platforms. Prior work on establishing a cache-coherent shared memory model over multiple devices does not capture this level of intricacy [6].
The challenge of managing multiple frameworks for offloading data gets more complex as more devices are added to heterogeneous systems. For example, FPGAs and ML accelerators [1, 17] will have their own mechanisms. Currently, a programmer looking to take advantage of all the compute devices available must:
1. Synchronize data across any combination of devices correctly and efficiently
