Characterizing I/O: A Primer in the Research and Challenges Surrounding High-Performance I/O

A Summary of the Evolution of Today's High-Performance Computing Environment

The processors found in today's most powerful computers achieve speeds exceeding a billion cycles per second, or one gigahertz. Compared with the most advanced technology of only a decade ago, today's high-performance systems are roughly an order of magnitude faster. This growth has opened high-end problem-solving resources to a much broader range of researchers.

Throughout the 1990s, developers of resource-intensive simulations and data-intensive computations had to queue for access to ultra-high-bandwidth, low-latency systems, typically supercomputers, to run the tightly coupled parallel computations their applications required. These tightly integrated systems are customized deep into their design and built from highly specialized components. Such adaptations let supercomputers deliver high performance, but they have also left them scarce, expensive, and under-supported by programming tools and software relative to conventional computers. Poor accessibility and affordability have not only plagued potential developers of cutting-edge applications; the research, resources, and customer support required to commercialize such systems have also undermined the solvency of a series of supercomputer manufacturers.

In contrast, the mass market for personal computers and workstations grew explosively during the 1990s. This growth not only brought the prices of commodity computers down to a widely affordable level but also led manufacturers to concentrate research and development on the more lucrative mass market, often at the expense of the high-end market. In response, an alternative approach to high-performance computing emerged, one that exploits the value of conventional processors by distributing a single application's work across a group of processors. The processors work in parallel, each executing a separate chunk of the application and communicating with the others as necessary.
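The divide-and-combine pattern can be sketched in Python as a minimal example, assuming a simple job (summing squares over a range) that splits cleanly into independent chunks; the function names are hypothetical. Threads stand in for processors here only to keep the sketch self-contained: in CPython they do not execute Python code in parallel, so a real speedup would call for separate processes (for example `concurrent.futures.ProcessPoolExecutor`) or, on a cluster, a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n, parts):
    """Divide range(n) into `parts` contiguous chunks of nearly equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append(range(start, end))
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one worker: it touches only its own chunk of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(n, workers=4):
    chunks = split(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # The only communication needed: each worker reports a single partial result.
    return sum(partials)


print(split(10, 3))                   # [range(0, 4), range(4, 7), range(7, 10)]
print(parallel_sum_of_squares(1000))  # 332833500
```

Because the chunks are independent, the workers never need to talk to one another mid-computation; applications with tighter coupling would exchange intermediate values as well.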

Parallel processing divides a program's instructions among multiple processors with the objective of running the program in less time. Early computers ran one program at a time: two programs that each took one hour to run would take a total of two hours. An early form of parallel processing allowed both programs to run together with their execution interleaved, so that, for instance, while the processor waited for one program's I/O operation to complete, it executed part of the other program. The total execution time for the two jobs would then be less than two hours.
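To make the arithmetic concrete, here is a toy Python simulation, an illustration rather than a model of any real historical system: time advances in abstract ticks, one hypothetical job alternates short computations with long I/O waits, and the other computes continuously. Run one after the other, the jobs take 20 ticks; interleaved, they finish in 12, because the processor is never left idle.

```python
from collections import deque


def sequential_time(jobs):
    """One program at a time: the processor sits idle during every I/O wait."""
    return sum(duration for job in jobs for _, duration in job)


def interleaved_time(jobs):
    """Tick-based simulation of one processor shared by several jobs.

    Each job is a list of ("cpu", ticks) or ("io", ticks) phases with ticks >= 1.
    I/O proceeds without the processor, so while one job waits on I/O another
    job may compute. Earlier jobs in the list get the processor first.
    """
    queues = [deque([kind, ticks] for kind, ticks in job) for job in jobs]
    t = 0
    while any(queues):
        cpu_busy = False
        for q in queues:
            if not q:
                continue
            phase = q[0]
            if phase[0] == "io":
                phase[1] -= 1          # the I/O device makes progress on its own
            elif not cpu_busy:
                phase[1] -= 1          # this job holds the single processor this tick
                cpu_busy = True
            if phase[1] == 0:
                q.popleft()            # phase finished; the next one starts next tick
        t += 1
    return t


io_bound = [("cpu", 1), ("io", 4), ("cpu", 1), ("io", 4)]  # mostly waiting on I/O
compute_bound = [("cpu", 10)]                              # never waits on I/O
jobs = [io_bound, compute_bound]
print(sequential_time(jobs))   # 20
print(interleaved_time(jobs))  # 12
```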

Then multiprogramming was introduced: the operating system cycled short periods of processor usage among multiple programs. But contention for resources emerged in these systems and, in the absence of tie-breaking rules for resolving such conflicts, led to deadlocks, for example when one program held a tape drive while waiting for a disk that another program held while waiting for the tape drive. Programming machines to manage system resources in a unified way proved very challenging. Reflecting this difficulty, initial attempts at multiprocessing used two or more processors cooperating in a master/slave configuration to share the work: the master was programmed to be responsible for all of the work in the system, and the slaves performed tasks assigned by the master.
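Lock ordering is one classic tie-breaking rule for avoiding this kind of deadlock; the text does not say which rules early systems used, so the following Python sketch is only an illustration, with hypothetical resource names. Two threads stand in for programs that each need two shared devices at once and name them in opposite orders.

```python
import threading


def acquire_both(a, b):
    """Tie-breaking rule: always lock the resource with the smaller id() first.

    Because every program requests the resources in the same global order,
    no two programs can each hold one resource while waiting for the other.
    """
    first, second = (a, b) if id(a) < id(b) else (b, a)
    first.acquire()
    second.acquire()
    return first, second


def program(a, b, counter, rounds):
    for _ in range(rounds):
        first, second = acquire_both(a, b)
        try:
            counter[0] += 1  # work that needs both resources at once
        finally:
            second.release()
            first.release()


disk, tape = threading.Lock(), threading.Lock()  # hypothetical shared devices
counter = [0]
# Without the ordering rule in acquire_both, each program could grab one device
# and wait forever for the other.
p1 = threading.Thread(target=program, args=(disk, tape, counter, 10_000), daemon=True)
p2 = threading.Thread(target=program, args=(tape, disk, counter, 10_000), daemon=True)
p1.start()
p2.start()
p1.join(timeout=10)
p2.join(timeout=10)
print(counter[0])  # 20000
```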

Symmetric multiprocessing (SMP) systems soon emerged, unifying system resource management. In an SMP system, multiple processors share a common operating system and memory, and each processor is equally capable of, and responsible for, managing the flow of work through the system. By pushing SMP machines to their limits, developers soon realized that while these machines perform well on many types of problems where the volume of data is not too large, they don't scale well. As the number of processors in an SMP system increases, so does the time it takes to distribute data among the parts of the system. Eventually, the performance benefit of adding more processors becomes too small to justify the additional expense.
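The diminishing returns can be made concrete with a deliberately simple cost model, an assumption for illustration rather than a measurement of any real SMP machine: run time T(p) = W/p + c(p - 1), where W is the single-processor compute time and c is a hypothetical per-processor data-distribution cost. With W = 100 seconds and c = 0.5 seconds, each processor past the tenth saves less than half a second, and past the fourteenth (near the square root of W/c, where the two terms balance) adding processors makes the run slower.

```python
def run_time(p, work=100.0, dist_cost=0.5):
    """Toy model of an SMP run on p processors (hypothetical parameters).

    Computation shrinks as work / p, while the time spent distributing data
    among the processors grows linearly with each processor added.
    """
    return work / p + dist_cost * (p - 1)


def last_worthwhile(min_gain, max_p=1000, **model):
    """Smallest p at which adding one more processor saves less than min_gain."""
    for p in range(1, max_p):
        if run_time(p, **model) - run_time(p + 1, **model) < min_gain:
            return p
    return max_p


print(run_time(1), run_time(10))         # 100.0 14.5
print(min(range(1, 100), key=run_time))  # 14
print(last_worthwhile(min_gain=0.5))     # 10
```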

So message-passing systems were created, allowing programs that share data to replace general broadcasts of changes in the values of particular operands with targeted messages sent only to the programs concerned with the new value. Message-passing systems don't use a shared memory to transfer messages between programs; instead, messages travel over a network. Because they enable hundreds, even thousands, of processors to cooperate efficiently in one system, such "scalable" systems are called massively parallel processing (MPP) systems. MPP systems are often structured as clusters of processors: within each cluster, the processors interact as in an SMP system, and messages are passed only between clusters.
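The contrast between broadcasts and targeted messages, and the clusters-of-processors layout, can be sketched in Python by simply counting messages. This is a toy model with hypothetical node and variable names; a real MPP program would send its messages over the interconnect through a message-passing library such as MPI rather than appending to Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    cluster: int
    inbox: list = field(default_factory=list)  # messages received over the network


def broadcast(nodes, sender, name, value):
    """Broadcast style: every other node hears about the change."""
    sent = 0
    for n in nodes:
        if n is not sender:
            n.inbox.append((name, value))
            sent += 1
    return sent


def targeted(nodes, sender, name, value, interested, shared_memory):
    """Targeted style on clusters of processors.

    Nodes in the sender's own cluster read the new value from that cluster's
    shared memory; only interested nodes in other clusters get a message.
    """
    shared_memory[sender.cluster][name] = value
    sent = 0
    for n in nodes:
        if n.node_id in interested and n.cluster != sender.cluster:
            n.inbox.append((name, value))
            sent += 1
    return sent


nodes = [Node(i, cluster=i // 4) for i in range(16)]  # 4 clusters of 4 processors
memory = [{} for _ in range(4)]                       # one shared memory per cluster
print(broadcast(nodes, nodes[0], "x", 42))            # 15
for n in nodes:
    n.inbox.clear()
print(targeted(nodes, nodes[0], "x", 42, interested={1, 5, 9}, shared_memory=memory))  # 2
print(memory[0])                                      # {'x': 42}
```

Node 1 shares the sender's cluster, so it reads the value from shared memory; only nodes 5 and 9 need a message, versus 15 for the broadcast.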

Recognizing the significance of advances in component and commodity technologies, the high-performance computing community has expanded on the concept of cluster processing, developing an approach to resource- and data-intensive computing that further exploits the value of mass-market technologies. Increasingly high-bandwidth, low-latency interconnect technologies have made it feasible for even modestly funded researchers to configure groups of inexpensive personal computers for high-performance parallel computing. Early benchmarks have indicated promising scalability and have documented peak performance levels approaching those of state-of-the-art supercomputers.
