Chapter 1: System Models and Enabling Technologies
1.1 Scalable Computing Towards Massive Parallelism
Over the past 60 years, the state of computing has gone through a series of platform and
environmental changes. We review below the evolutionary changes in machine architecture, operating system
platform, network connectivity, and application workloads. Instead of using a centralized computer to solve
computational problems, a parallel and distributed computing system uses multiple computers to solve
large-scale problems over the Internet. Distributed computing has become data-intensive and network-centric.
We will identify the killer applications of modern systems that practice parallel and distributed computing.
These large-scale applications have significantly upgraded the quality of life in all aspects of our society.
1.1.1 High-Performance versus High-Throughput Computing
For a long time, high-performance computing (HPC) systems have emphasized raw speed performance.
The speed of HPC systems has increased from Gflops in the early 1990s to Pflops in 2010. This
improvement was driven mainly by demands from the scientific, engineering, and manufacturing communities.
The emphasis on speed performance, in terms of floating-point computing capability on a single system, is
now being challenged by business computing users. The flops rating measures the time to complete the
execution of a single large computing task, such as the Linpack benchmark used in the Top 500 ranking. In
reality, the users of Top-500-class HPC computers are rather limited, amounting to only 10% of all computer
users. Today, the majority of computer users still rely on desktop computers and servers, either locally or in
huge datacenters, to conduct Internet searches and market-driven computing tasks.
The development of market-oriented high-end computing systems is undergoing a strategic change from the
HPC paradigm to a high-throughput computing (HTC) paradigm. The HTC paradigm pays more attention
to high-flux computing. The main applications of high-flux computing systems are Internet searches and
web services accessed by millions or more users simultaneously. The performance goal thus shifts to high
throughput, measured as the number of tasks completed per unit of time. HTC technology must not only
improve the speed of batch processing, but also address the acute problems of cost, energy savings, security,
and reliability at many datacenters and enterprise computing centers. This book is designed to address both
HPC and HTC systems, to meet the demands of all computer users.
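To make the distinction concrete, the following sketch contrasts the two metrics in a minimal way: an HPC-style rating divides the floating-point operation count of one large job by its execution time, while an HTC-style rating divides the number of completed tasks by the elapsed wall-clock time. The numbers in the sketch are illustrative placeholders, not measurements of any real system.

    #include <stdio.h>

    /* Illustrative comparison of the two performance metrics discussed above.
     * All values below are made-up placeholders, not real benchmark results. */
    int main(void) {
        /* HPC view: one large job, rated by floating-point speed (flops). */
        double flop_count   = 2.0e15;  /* floating-point operations in the job */
        double job_seconds  = 2.0e3;   /* time to finish that single job       */
        double flops_rate   = flop_count / job_seconds;

        /* HTC view: many small tasks, rated by tasks completed per unit time. */
        double tasks_done   = 3.6e6;   /* e.g., web-service requests served    */
        double wall_seconds = 3.6e3;   /* one hour of wall-clock time          */
        double throughput   = tasks_done / wall_seconds;

        printf("HPC metric: %.2e flops (speed of one large task)\n", flops_rate);
        printf("HTC metric: %.0f tasks/second (aggregate throughput)\n", throughput);
        return 0;
    }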
Electronic computers have gone through five generations of development. Each generation
lasted 10 to 20 years, and adjacent generations overlapped by about 10 years. During 1950–1970, a handful of
mainframes, such as the IBM 360 and CDC 6400, were built to satisfy the demands of large businesses and
government organizations. During 1960–1980, lower-cost minicomputers, like DEC's PDP-11 and VAX
series, became popular in small businesses and on college campuses. During 1970–1990, personal computers
built with VLSI microprocessors came into widespread use by the mass population. During 1980–2000,
massive numbers of portable computers and pervasive devices appeared in both wired and wireless
applications. Since 1990, we have increasingly relied on both HPC and HTC systems that are hidden in
Internet clouds. They offer web-scale services to the general public in a digital society.
Levels of Parallelism: Let us first review the types of parallelism before we proceed further with the
computing trends. When hardware was bulky and expensive 50 years ago, most computers were designed in
a bit-serial fashion. Bit-level parallelism (BLP) gradually converted bit-serial processing into word-level
processing. Over the years, we moved from 4-bit microprocessors to 8-, 16-, 32-, and 64-bit CPUs. The next
wave of improvement was instruction-level parallelism (ILP). When we shifted from processors that execute
a single instruction at a time to those that execute multiple instructions simultaneously, we practiced ILP
through pipelining, superscalar execution, VLIW (very long instruction word) architectures, and
multithreading over the past 30 years. ILP demands branch prediction, dynamic scheduling, speculation,
and a higher degree of compiler support.
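As a small illustration of bit-level parallelism, the sketch below (written in C purely for illustration; it is not taken from any particular machine design) adds two 64-bit integers twice: once bit-serially, the way an early bit-serial machine would process a word one bit at a time, and once with a single word-level addition that modern hardware carries out across all 64 bits at once.

    #include <stdint.h>
    #include <stdio.h>

    /* Bit-serial addition: process one bit per step, propagating a carry,
     * as an early bit-serial machine would. */
    static uint64_t add_bit_serial(uint64_t a, uint64_t b) {
        uint64_t sum = 0, carry = 0;
        for (int i = 0; i < 64; i++) {
            uint64_t ai = (a >> i) & 1, bi = (b >> i) & 1;
            uint64_t s  = ai ^ bi ^ carry;                 /* sum bit   */
            carry       = (ai & bi) | (carry & (ai ^ bi)); /* carry out */
            sum        |= s << i;
        }
        return sum;
    }

    int main(void) {
        uint64_t a = 123456789ULL, b = 987654321ULL;

        uint64_t serial = add_bit_serial(a, b); /* 64 sequential bit steps        */
        uint64_t word   = a + b;                /* one word-level (BLP) operation */

        printf("bit-serial: %llu\n", (unsigned long long)serial);
        printf("word-level: %llu\n", (unsigned long long)word);
        return 0;
    }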