Characterizing I/O: A Primer in the Research and Challenges Surrounding High-Performance I/O

Applying the Performance Benefits of Parallelism to the I/O Bottleneck

The technological leaps reflected in the performance of mass-market computational and networking technologies have not been matched by comparable advances in the I/O subsystem technologies required to support high-performance computing. For instance, current research in molecular biology, which attempts to reveal the subtle characteristics of complex systems such as the human genome, must manipulate terabytes of information, tens of trillions of bytes of data. Likewise, applications in environmental hydrology, aimed at protecting the delicate balance of entire, interacting ecosystems, involve vast datasets grouped into adaptive meshes to simulate the impact of various environmental influences. Today the most advanced research and development efforts in many fields, including chemical engineering, nanomaterials, and scientific instrumentation, rely upon simulations and computations that are resource- and data-intensive.

Many of these applications are so large in scale that they generate intermediate data too large to retain in primary memory. During execution, applications must read and write the requisite data to secondary, or disk, storage. These out-of-core applications must use the I/O subsystem to manipulate overflow data housed outside main memory. Operating at only a fraction of the speed achieved by parallel processors, the I/O subsystem creates a bottleneck, degrading the system's overall performance by stalling processors while I/O operations plod along. To grasp the impact of this dilemma, realize that state-of-the-art disk drives can transfer approximately 10 million bytes per second. At that rate, they need 100 seconds, a little over a minute and a half, to move a billion bytes. Multiply that billion by 1,000, to move a terabyte, and more than 27 hours are required for disk access. Today’s high-performance computing models generate this much information every second.
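The arithmetic behind these figures can be checked with a short Python sketch. The transfer rate of 10 million bytes per second comes from the text; the function name `transfer_seconds` and the decimal definitions of a billion and a trillion bytes are illustrative assumptions, not part of the original primer.

```python
# Sustained single-disk transfer rate quoted in the text (bytes per second).
DISK_RATE = 10_000_000

# Decimal sizes, assumed here for illustration.
BILLION_BYTES = 10**9
TERABYTE = 10**12


def transfer_seconds(num_bytes, rate=DISK_RATE):
    """Return the time in seconds needed to move num_bytes at a fixed rate."""
    return num_bytes / rate


print(f"Billion bytes: {transfer_seconds(BILLION_BYTES):.0f} seconds")
print(f"Terabyte: {transfer_seconds(TERABYTE) / 3600:.1f} hours")
```

The sketch ignores seek time, rotational latency, and contention, so real transfers would take longer than these idealized figures.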

The result is an I/O performance barrier rivaling or exceeding the computational barrier faced by high-end applications. A number of factors that influence the performance of current I/O subsystems have combined to create this situation. The evolution of the commodity storage market, the surprising complexity of application access patterns, and the paucity of knowledge about effective I/O system design have rendered the speed of conventional I/O operations insufficient to support terascale computing. Fundamentally, the input/output bottleneck is caused by the differential rate of technological evolution between electronic devices, such as processors and memory, and input/output devices that incorporate moving parts. The bottleneck is most evident on parallel computation platforms where multiple processors must share input/output resources. Increasing disparities between disk capacities and transfer rates have exacerbated the problem. With few exceptions, the commercial data storage market rewards high storage capacity, small form factors, and low power consumption rather than high bandwidth.

I/O parallelism can offset this trend. Involving tens to thousands of cooperating storage devices, I/O parallelism is key to enabling the level of performance promised by new scalable, parallel distributed environments. It combines the storage space of multiple disks into a single, large, logical disk by striping data across several storage devices and accessing those devices in parallel to achieve higher input/output rates. Instead of moving 10 million bytes per second to a single I/O device, an ideal parallel I/O subsystem comprises multiple, equally loaded devices, and its performance scales with the number of devices across which data is striped: every second, I/O operations move 10 million bytes times the number of devices employed.

[Figure: striping.jpg]
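One common way to stripe data, assumed here purely for illustration, is a round-robin layout in which fixed-size stripe units are dealt out to the devices in turn. The sketch below maps a logical byte offset to a device and an offset on that device, and computes the ideal aggregate rate described above. The names `locate` and `ideal_bandwidth`, and the 64 KB stripe unit, are hypothetical choices rather than details from the primer.

```python
def locate(offset, stripe_unit, num_devices):
    """Map a logical byte offset to (device index, byte offset on that device).

    Assumes a round-robin layout: stripe unit 0 goes to device 0,
    unit 1 to device 1, and so on, wrapping around after the last device.
    """
    unit = offset // stripe_unit          # which stripe unit holds the byte
    device = unit % num_devices           # device that stores that unit
    local_unit = unit // num_devices      # units already placed on that device
    return device, local_unit * stripe_unit + offset % stripe_unit


def ideal_bandwidth(per_device_rate, num_devices):
    """Aggregate rate when every device is equally loaded and always busy."""
    return per_device_rate * num_devices


STRIPE_UNIT = 64 * 1024  # hypothetical stripe unit size in bytes

# The first four stripe units land on four different devices.
for unit in range(4):
    print(unit, locate(unit * STRIPE_UNIT, STRIPE_UNIT, 4))

# Eight devices at 10 million bytes per second each, in the ideal case.
print(ideal_bandwidth(10_000_000, 8))
```

Because consecutive stripe units sit on different devices, a large sequential request touches every device and can, in the ideal case, proceed on all of them at once.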

However, the effectiveness of disk striping techniques depends on the configuration of the storage system and the characteristics of the workloads it attempts to accommodate. A parallel I/O subsystem can increase the potential I/O bandwidth, but workload balancing is required to enable the system to work continuously at capacity and increase the I/O bandwidth actually achieved. Enabling file systems to maintain an equal distribution of the workload across input/output devices is an elusive goal. Potentially, the file system can exploit established access pattern information to identify and apply policies governing data distribution, caching, prefetching, and write-back procedures that provide high performance. It can reserve available input/output resources to service known input/output requirements, yielding an optimal I/O schedule. But when large numbers of disks and disk arrays are coupled with a multi-level storage management system and a wide variety of possible parallel file access patterns, the range of potential data management strategies is immense. Identifying optimal, or even acceptable, operating points becomes problematic.
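The gap between potential and achieved bandwidth can be illustrated with a simple model, offered as a sketch under stated assumptions rather than a description of any real file system. It assumes that all devices transfer at the same fixed rate, that they work in parallel, and that a request completes only when the most heavily loaded device finishes. The function name `achieved_bandwidth` and the example workloads are hypothetical.

```python
def achieved_bandwidth(bytes_per_device, per_device_rate):
    """Effective rate of a parallel request under a simple model.

    bytes_per_device: non-empty list with the bytes each device must move.
    The request finishes when the busiest device finishes, so that device
    determines the elapsed time for the whole transfer.
    """
    total = sum(bytes_per_device)
    elapsed = max(bytes_per_device) / per_device_rate
    return total / elapsed


RATE = 10_000_000  # bytes per second per device, the figure used in the text

# 100 million bytes spread over four devices, first evenly, then skewed.
balanced = [25_000_000] * 4
skewed = [70_000_000, 10_000_000, 10_000_000, 10_000_000]

print(f"balanced: {achieved_bandwidth(balanced, RATE):,.0f} bytes/s")
print(f"skewed:   {achieved_bandwidth(skewed, RATE):,.0f} bytes/s")
```

In this model the balanced workload reaches the full four-device rate, while the skewed workload achieves only a little more than a single device could deliver, which is why maintaining an even distribution matters so much.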
