The Art of High Performance Computing for Computational Science, Vol. 1
Author: Masaaki Geshi
Language: English
Format: EPUB
ISBN: 9789811361944
Publisher: Springer Singapore
7.2.2 Hardware Characteristics of FLOPS-oriented Supercomputers
In the following, we outline the projected hardware specifications of exaFLOPS machines, assuming the FLOPS-oriented type, and point out their important features. A typical example of an exaFLOPS machine assumed in this chapter is shown in Fig. 7.1.
Fig. 7.1 Architecture of an exaFLOPS machine presupposed in this chapter
Parallelism of order 10^8–10^9
At present, the clock frequency of CPU cores is at most a few GHz. This situation will not change drastically in the near future, mainly because power consumption must be kept at an affordable level. This means that parallelism on the order of 10^8–10^9 concurrent operations is needed to achieve exaFLOPS: 10^18 operations per second divided by a clock of a few ×10^9 cycles per second leaves hundreds of millions of operations that must be in flight simultaneously. This parallelism will be realized hierarchically, combining instruction-level, core-level, chip-level, and node-level parallelism.
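The required degree of parallelism follows from simple arithmetic. The sketch below is not from the book; the clock frequency and per-lane throughput are assumed, round-number values chosen only to illustrate the order of magnitude:

```python
# Back-of-envelope estimate of the concurrency an exaFLOPS machine needs.
TARGET_FLOPS = 1.0e18             # 1 exaFLOPS
CLOCK_HZ = 2.0e9                  # assumed ~2 GHz core clock
FLOPS_PER_CYCLE_PER_LANE = 2      # assumed: one fused multiply-add per lane

# Operations that must execute concurrently, every cycle, machine-wide.
parallelism = TARGET_FLOPS / (CLOCK_HZ * FLOPS_PER_CYCLE_PER_LANE)
print(f"required concurrent operations: {parallelism:.1e}")  # ~2.5e8
```

Varying the assumptions within realistic bounds moves the result between roughly 10^8 and 10^9, which is why the parallelism must be spread across all levels of the hierarchy rather than supplied by any single one.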
Deep memory hierarchy
Today’s supercomputers already have a fairly deep memory hierarchy, consisting of on-chip registers, several levels of on-chip and off-chip cache, main memory within a node, and main memory in other nodes. This hierarchy will become even deeper and more complicated in exaFLOPS machines, corresponding to the hierarchical parallelism stated above.
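The performance cliff between levels of this hierarchy can be probed experimentally. The following crude sketch (an illustration of the experimental idea, not a rigorous benchmark; the working-set sizes are assumptions chosen to straddle typical L1/L2/L3 cache capacities, and in an interpreted language the absolute numbers mainly reflect interpreter overhead) streams over successively larger working sets, where effective bandwidth typically drops as each cache level is exceeded:

```python
import time
from array import array

def bytes_per_second(n_floats, repeats=10):
    """Crude effective-bandwidth probe: sequentially read n_floats doubles."""
    data = array('d', [0.0]) * n_floats   # contiguous 8-byte floats
    t0 = time.perf_counter()
    s = 0.0
    for _ in range(repeats):
        s += sum(data)                    # stream over the whole working set
    dt = time.perf_counter() - t0
    return n_floats * 8 * repeats / dt

# Working sets of ~8 KiB, ~512 KiB, and ~32 MiB straddle typical cache sizes.
for n in (1 << 10, 1 << 16, 1 << 22):
    print(f"{n * 8 // 1024:6d} KiB: {bytes_per_second(n) / 1e9:6.2f} GB/s")
```

A compiled language is needed to see the cache transitions sharply, but the structure of the experiment carries over directly.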
Increase in the data transfer cost
Up to now, the floating-point performance of supercomputers has been increasing more rapidly than memory access performance or internode communication performance. This has resulted in a severe discrepancy between the computation speed and the data transfer speed, which is expected to grow even larger in the future. Let us divide the data transfer performance into throughput and latency and consider them separately.
According to the prediction in [26], a FLOPS-oriented machine with a total performance of 1,000–2,000 PFLOPS will have a total memory bandwidth of 5–10 PBytes/s. Hence, the ratio of data transfer throughput to floating-point performance is 0.005 Byte/FLOP. This means that we need to perform at least 1600 operations on each double-precision value (8 bytes) fetched from memory in order to fully exploit the machine’s floating-point performance. In contrast, for the K computer, the total performance is 10 PFLOPS and the total memory bandwidth is 5 PBytes/s, so the ratio is 0.5 Byte/FLOP. Thus, the relative memory access cost of a FLOPS-oriented machine is 100 times higher than that of the K computer.
As for the latency, [26, Table 2-3] estimates the inter-core synchronization/communication latency at 100 ns (100 cycles) and the internode communication latency at 80–200 ns. This means that virtually no improvement can be expected in latency. Considering that exaFLOPS machines have much higher floating-point performance and a larger number of nodes than today’s supercomputers, we can conclude that the effect of latency on execution time will be far more serious on exaFLOPS machines. The effect of latency will be most salient in AllReduce-type communication, such as arises in the inner product of two vectors. For example, an AllReduce operation among nodes using a binary tree will require several thousand cycles (Fig. 7.2).
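The Byte/FLOP arithmetic above can be checked directly. The values below are taken from the text (lower bounds of the projected ranges from [26]):

```python
# Byte/FLOP ratios quoted in the text, computed from the projected figures.
exa_flops = 1000e15   # 1,000 PFLOPS (lower bound of the exaFLOPS projection)
exa_bw = 5e15         # 5 PBytes/s total memory bandwidth
k_flops = 10e15       # K computer: 10 PFLOPS
k_bw = 5e15           # K computer: 5 PBytes/s

bf_exa = exa_bw / exa_flops        # 0.005 Byte/FLOP
bf_k = k_bw / k_flops              # 0.5 Byte/FLOP
flops_per_double = 8 / bf_exa      # operations needed per 8-byte fetch: 1600
relative_cost = bf_k / bf_exa      # K computer is 100x richer in bandwidth

print(bf_exa, bf_k, flops_per_double, relative_cost)
```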
Fig. 7.2 AllReduce operation using a binary tree
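A rough cost model for the binary-tree AllReduce reproduces the "several thousand cycles" figure. This is a sketch under stated assumptions: each hop costs the quoted ~100 ns latency, the clock is ~1 GHz so 1 ns ≈ 1 cycle, and the node count is a hypothetical example value:

```python
import math

def allreduce_cycles(n_nodes, hop_latency_cycles=100):
    """Latency of a binary-tree AllReduce: reduce up the tree, then
    broadcast back down, so ~2*log2(N) sequential hops."""
    rounds = 2 * math.ceil(math.log2(n_nodes))
    return rounds * hop_latency_cycles

# e.g. 100,000 nodes: 2 * ceil(16.6) = 34 hops of 100 cycles each
print(allreduce_cycles(100_000))   # 3400 cycles
```

Because the hop count grows only logarithmically while the per-hop latency stays flat, no restructuring of the tree can hide this cost; it must instead be amortized, e.g. by reducing the number of inner products an algorithm performs.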