Programming Many-Core Chips by András Vajda

Author: András Vajda
Language: eng
Format: epub
Publisher: Springer US, Boston, MA


5.5.4 Parallel Execution Patterns

The lowest layer in the OPL pattern hierarchy consists of patterns that typically serve as input to the implementation of language run-time systems, middleware solutions and, occasionally, operating systems. These five patterns are recipes for using the basic concepts offered by the HW and the OS (processes, threads, synchronization and communication primitives) to support the implementation of the patterns in the higher-level layers.

Multiple Instruction, Multiple Data (MIMD) captures the widespread paradigm of executing different programs, working on different data sets, on different processor cores; these streams of computation occasionally synchronize, for example through message passing or shared memory. This pattern is the natural choice for several types of applications.
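The MIMD idea can be illustrated with a minimal Python sketch. It is not taken from the book: the names `producer`, `consumer` and `run_mimd` are hypothetical, and standard-library threads stand in for processor cores (in CPython the global interpreter lock means the two threads are not guaranteed to run on separate cores at the same time). Two different instruction streams work on different data and synchronize only through message passing:

```python
import queue
import threading


def producer(out_q, values):
    """First instruction stream: squares its own data set and sends each result."""
    for v in values:
        out_q.put(v * v)
    out_q.put(None)  # sentinel message: nothing more will be sent


def consumer(in_q, result):
    """Second instruction stream: a different program that sums what it receives."""
    total = 0
    while True:
        item = in_q.get()  # blocks until a message arrives (synchronization point)
        if item is None:
            break
        total += item
    result.append(total)


def run_mimd(values):
    """Run both streams concurrently and return the consumer's final sum."""
    channel = queue.Queue()  # message-passing channel between the two streams
    result = []
    workers = [
        threading.Thread(target=producer, args=(channel, values)),
        threading.Thread(target=consumer, args=(channel, result)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result[0]
```

Calling `run_mimd([1, 2, 3])` returns the sum of the squares. The only coupling between the two streams is the queue, which is what distinguishes this arrangement from lockstep execution.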

Single Instruction, Multiple Data (SIMD) captures a special class of processing environments, commonly found in Graphics Processing Units (GPUs): all the cores (in practice, a subset of them) execute the same instruction flow in lockstep, but act on different data sets. Such execution environments are best suited to problem domains where data parallelism is the natural choice.
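As a toy model only, the lockstep behaviour can be simulated sequentially in Python. The names `simd_step`, `run_simd` and `PROGRAM` are hypothetical, and the list comprehension stands in for what real SIMD hardware does across all lanes at once:

```python
def simd_step(op, lanes):
    """Apply one instruction to every lane; no lane moves ahead of the others."""
    return [op(x) for x in lanes]


def run_simd(program, lanes):
    """Execute a single instruction flow over many data elements."""
    for op in program:  # one shared instruction stream
        lanes = simd_step(op, lanes)  # each instruction touches all lanes
    return lanes


# A two-instruction flow: double every element, then add one.
PROGRAM = [lambda x: x * 2, lambda x: x + 1]
```

With this program, `run_simd(PROGRAM, [1, 2, 3, 4])` yields `[3, 5, 7, 9]`. Every lane sees the same sequence of operations and only the data differs, which is why the pattern fits data-parallel problems.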

Thread pool is a method for guaranteeing fast allocation of threads to fulfill the application's needs. Threads are pre-allocated and kept in a pool of idle threads; whenever the application needs a new thread of execution, one of the pre-allocated threads is woken up and given the task indicated by the application; when the task is completed, the thread is returned to the pool. This pattern is often used with dynamic functional decomposition methods and it is the basis of task-based models. The same effect can be obtained through language run-time systems that create and destroy user-space threads quickly, without actually maintaining a thread pool, and we therefore consider such systems an implementation of this pattern; the Erlang run-time system is a prime example, with thread creation times far lower than those of any OS-level primitive.
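The life cycle described above (pre-allocate, wake up, run, return to the pool) can be sketched in a few lines of Python. `TinyThreadPool` is a hypothetical class written for illustration, not an API from the book; for real code the standard library's `concurrent.futures.ThreadPoolExecutor` is the more robust choice:

```python
import queue
import threading
from concurrent.futures import Future


class TinyThreadPool:
    """Illustrative pool: threads are created once and reused for every task."""

    def __init__(self, size):
        self._tasks = queue.Queue()
        # Pre-allocate the worker threads up front.
        self._threads = [threading.Thread(target=self._worker) for _ in range(size)]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            job = self._tasks.get()  # an idle thread sleeps here until woken by a task
            if job is None:  # shutdown signal
                return
            future, func, args = job
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)
            # Looping back to get() returns the thread to the idle pool.

    def submit(self, func, *args):
        """Hand a task to the pool; returns a Future holding the eventual result."""
        future = Future()
        self._tasks.put((future, func, args))
        return future

    def shutdown(self):
        """Ask every worker to exit once the queued tasks are done, then wait."""
        for _ in self._threads:
            self._tasks.put(None)
        for t in self._threads:
            t.join()
```

A typical use is `pool = TinyThreadPool(2)`, then `pool.submit(pow, 3, 2).result()`, and finally `pool.shutdown()`. No thread is created per task, so the cost of thread creation is paid only once, in the constructor.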

The Task Graph pattern captures the mechanism through which task dependencies are expressed as a directed acyclic graph and presented to the run-time system for scheduling and execution on a machine with multiple processor cores. Such mechanisms are typical of data-flow applications, found for example in the signal processing parts of mobile communication systems.
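A minimal sketch of such a mechanism, assuming Python 3.9 or later for the standard-library `graphlib` module, is shown below. The function `run_task_graph` and its calling convention (each task receives a dict of its predecessors' results) are assumptions made for this illustration, not an interface described in the book:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, workers=4):
    """Run a DAG of tasks, starting each one as soon as its predecessors finish.

    tasks: maps a task name to a callable taking a dict of predecessor results.
    deps:  maps a task name to the set of task names it depends on.
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all satisfied.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Wait for at least one running task, then unlock its successors.
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)
    return results
```

For example, with `tasks = {"source": lambda i: [1, 2, 3, 4], "scale": lambda i: [2 * x for x in i["source"]], "total": lambda i: sum(i["scale"])}` and `deps = {"scale": {"source"}, "total": {"scale"}}`, the call `run_task_graph(tasks, deps)["total"]` evaluates the chain in dependency order. Independent branches of a wider graph would be submitted to the pool together and could run concurrently.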

Transactions are the prime mechanism for implementing speculative execution. The run-time system has to provide the mechanism for implementing units of execution (chunks of program code and the memory they access) that either complete without conflicting with other units of execution or must be rolled back and re-executed at a later time. This pattern is sometimes implemented using transactional memory as the vehicle for detecting conflicting memory operations and rolling them back.
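The execute, detect-conflict, roll-back and retry cycle can be sketched in software with optimistic version checking. This is a deliberately simplified stand-in for transactional memory, covering a single shared cell; `VersionedCell`, `atomically` and `run_transactions` are hypothetical names introduced for the illustration:

```python
import threading


class VersionedCell:
    """A shared value whose version number changes on every successful commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # protects only the short read/commit steps

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, new_value, expected_version):
        """Commit only if nobody else committed since our snapshot was taken."""
        with self._lock:
            if self.version != expected_version:
                return False  # conflict detected
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Run `update` speculatively; discard and re-execute on conflict.

    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        snapshot, version = cell.read()
        new_value = update(snapshot)  # speculative work on a private copy
        if cell.try_commit(new_value, version):
            return retries
        retries += 1  # roll back: drop new_value and try again


def run_transactions(n_threads=4, increments=1000):
    """Increment one shared cell from several threads using transactions."""
    cell = VersionedCell(0)

    def worker():
        for _ in range(increments):
            atomically(cell, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value
```

Because a conflicting transaction never writes its speculative result, rolling back here amounts to discarding a local value. The final count from `run_transactions` equals `n_threads * increments` regardless of how the threads interleave.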


