To increase the effective input/output data rates
of I/O-intensive codes, developers of both applications and parallel file
systems must be able to differentiate between the input/output patterns intrinsic to
applications versus those reflecting the limitations of current systems. Analysis of
the file access patterns generated by new computational science and engineering
applications suggests that I/O libraries for the storage systems of both single-processor and
multiprocessor machines can deliver high performance, but only when file
access patterns and data management policies are carefully matched.
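To make "file access pattern" concrete, the C sketch below classifies a stream of application requests, each an (offset, size) pair, as sequential, strided, or irregular. The three categories, the type names, and `classify_requests` are hypothetical illustrations chosen for this example, not Pablo's own classification scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pattern categories, for illustration only. */
typedef enum { PATTERN_SEQUENTIAL, PATTERN_STRIDED, PATTERN_IRREGULAR } access_pattern;

/* One application-level request: starting byte offset and length in bytes. */
typedef struct {
    long offset;
    long size;
} io_request;

/*
 * Classify a request stream:
 *  - sequential: each request starts exactly where the previous one ended;
 *  - strided:    the distance between consecutive start offsets is constant
 *                (checked after sequential, so a fixed-size sequential
 *                stream is reported as sequential);
 *  - irregular:  anything else.
 */
access_pattern classify_requests(const io_request *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    if (n < 2)
        return PATTERN_SEQUENTIAL; /* too few requests to tell; treat as sequential */

    int sequential = 1;
    int strided = 1;
    long stride = reqs[1].offset - reqs[0].offset;

    for (size_t i = 1; i < n; i++) {
        if (reqs[i].offset != reqs[i - 1].offset + reqs[i - 1].size)
            sequential = 0;
        if (reqs[i].offset - reqs[i - 1].offset != stride)
            strided = 0;
    }
    if (sequential)
        return PATTERN_SEQUENTIAL;
    if (strided)
        return PATTERN_STRIDED;
    return PATTERN_IRREGULAR;
}
```

A strided stream, such as one process reading a single column of a row-major matrix, is the kind of pattern that a file system tuned for large sequential transfers may serve poorly.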
When linked with user applications, the libraries included in the Pablo performance characterization toolkit profile multi-level performance behavior by producing trace files in the Self-Defining Data Format (SDDF). Explicit event traces can be generated for all combinations of UNIX, MPI-IO, and HDF library use on a variety of platforms, including IBM SP, SGI Origin, HP Exemplar, and Sun Solaris systems.
There are three concepts fundamental to the Pablo Suite of Performance Analysis Tools: Instrumentation, Tracing, and Analysis.
1. Instrumentation and the Base Pablo Trace Library
Instrumentation means selecting specific program instructions for analysis (e.g.,
opens, reads, and writes) and inserting calls to the Pablo Trace Library wherever these
instructions appear in the application source code. These library calls act as software
"probes" that capture data reflecting the performance of the targeted instructions
(called trace events in Pablo terminology) and encapsulate it as a series of SDDF records.
The routines in the Trace Library contain the performance-capture code that writes
timestamped event records. The library is not limited to I/O; it can be used for general
application instrumentation. When linked to a user's application code, it captures the
performance metrics of targeted program instructions and stores them in SDDF records.
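A minimal sketch of the probe idea, assuming a simple in-memory buffer: one call before the targeted instruction takes a timestamp, and one call after it stores a timestamped record with the elapsed time. The names `trace_begin`, `trace_end`, and `trace_record` are hypothetical and are not the Pablo Trace Library's actual interface; a real library would also write the records out as SDDF.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One timestamped event record; this layout is hypothetical, not Pablo's. */
typedef struct {
    int    event_id;  /* identifies the targeted instruction (hypothetical ids) */
    double start;     /* monotonic timestamp at entry, in seconds */
    double duration;  /* seconds spent in the instruction */
} trace_record;

#define TRACE_CAPACITY 1024
/* In-memory trace; a real library would flush records to a trace file. */
static trace_record trace_buffer[TRACE_CAPACITY];
static size_t trace_count = 0;

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Probe inserted just before a targeted instruction: returns the entry time. */
double trace_begin(void)
{
    return now_seconds();
}

/* Probe inserted just after it: appends one record, dropping it if the buffer is full. */
void trace_end(int event_id, double start)
{
    assert(event_id >= 0); /* event ids are non-negative in this sketch */
    if (trace_count < TRACE_CAPACITY) {
        trace_record *r = &trace_buffer[trace_count++];
        r->event_id = event_id;
        r->start    = start;
        r->duration = now_seconds() - start;
    }
}
```

Instrumenting an `fopen` call would then look like `double t = trace_begin(); FILE *f = fopen(path, "r"); trace_end(EVENT_OPEN, t);`, where `EVENT_OPEN` is a hypothetical event id.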
Pablo Trace Library Extensions - Designed specifically for I/O analysis and built on top of the base library, the Trace Library Extensions simplify the process of instrumenting the application constructs of particular interest to I/O performance analysts. Three extensions are currently being deployed. A new, thread-safe version that incorporates the functions of both the Base Pablo Trace Library and the three extensions, the Pablo Performance Capture Facility, is under development:
UNIX I/O Extension - The UNIX I/O extension to the Pablo Trace Library is a set of routines that can be used as replacements for the standard UNIX, C, and FORTRAN I/O calls. The trace library calls are identical to the standard I/O calls, except that they are augmented by instrumentation software, attached before and after the call itself, that captures and records internal data.
MPI I/O Extension - The MPI I/O extension to the Pablo Trace Library is a set of routines that can be used as replacements for the standard MPI I/O calls. The trace library calls are identical to the standard MPI I/O calls, except that they are augmented by instrumentation software, attached before and after the call itself, that captures and records internal data.
HDF Extension - Rather than being deployed as a distinct library, the Pablo HDF extension has been built into an instrumented version of HDF itself. The HDF extension was designed for analysts interested in characterizing the performance, interaction, and overall behavior of both the HDF data format and applications built on top of that format.
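The replacement-call approach of the UNIX I/O extension described above can be sketched as a wrapper with the same signature and return value as POSIX `read()`, plus bookkeeping before and after the real call. `traced_read` and `io_event` are hypothetical names; Pablo's actual wrappers, their names, and the data they record may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* One captured I/O event (hypothetical layout). */
typedef struct {
    int     fd;        /* file descriptor passed by the application */
    size_t  requested; /* bytes the application asked for */
    ssize_t result;    /* what the real read() returned: bytes, 0 at EOF, or -1 */
    double  start;     /* monotonic timestamp at entry, in seconds */
    double  duration;  /* seconds spent inside read() */
} io_event;

#define IO_EVENT_CAPACITY 1024
static io_event io_events[IO_EVENT_CAPACITY];
static size_t io_event_count = 0;

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Same signature and return value as read(); only the bookkeeping is added. */
ssize_t traced_read(int fd, void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    double start = now_seconds();          /* instrumentation before the call */
    ssize_t result = read(fd, buf, count); /* the unmodified UNIX call */
    int saved_errno = errno;               /* preserve errno for the caller */
    double end = now_seconds();

    if (io_event_count < IO_EVENT_CAPACITY) { /* instrumentation after the call */
        io_event *e = &io_events[io_event_count++];
        e->fd        = fd;
        e->requested = count;
        e->result    = result;
        e->start     = start;
        e->duration  = end - start;
    }
    errno = saved_errno;
    return result;
}
```

Because the wrapper restores `errno` and returns exactly what `read()` returned, the application's own error handling is unaffected; only the call site changes.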
Performance Capture Facility - The Base Pablo Trace Library and its extensions support systems that achieve parallelism using MPI. New research has centered on extending the Pablo Trace Library to support systems that use forms of parallelism other than MPI, such as OpenMP, by replacing the Base Library and its extensions with a facility that supports the instrumentation of threaded code. The Pablo Performance Capture Facility (PCF), currently in development, is a thread-safe tool designed to analyze the UNIX I/O, MPI-IO, MPI, HDF4, and HDF5 activity of application codes. Depending on the build option, the PCF can provide an interface to Autopilot clients through sensors, or to other interfaces.
Physical I/O Facility - The Physical I/O Facility is a software toolkit that, when coupled with application I/O instrumentation, helps reveal the correlation between application I/O requests and physical I/O operations. This is significant because physical I/O patterns are strongly affected by data striping mechanisms, file system policies, and disk hardware attributes. Understanding how the operating system translates application I/O requests into physical disk operations can aid in optimizing file system policies and data distributions for higher performance.
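Why striping shapes physical I/O can be seen from a small calculation. The sketch assumes simple round-robin striping with a fixed stripe unit across `ndisks` disks, which is an assumption rather than the layout of any particular file system; `map_offset` and `stripe_units_touched` are hypothetical names and are not part of the Physical I/O Facility.

```c
#include <assert.h>
#include <stddef.h>

/* Where one logical file byte lands under round-robin striping. */
typedef struct {
    int  disk;        /* index of the disk holding the byte */
    long disk_offset; /* byte offset on that disk */
} physical_location;

/* Map a logical file offset to (disk, offset on disk). */
physical_location map_offset(long offset, long stripe_size, int ndisks)
{
    assert(offset >= 0 && stripe_size > 0 && ndisks > 0);
    long stripe = offset / stripe_size; /* which stripe unit, counting from 0 */
    physical_location loc;
    loc.disk = (int)(stripe % ndisks);
    loc.disk_offset = (stripe / ndisks) * stripe_size + offset % stripe_size;
    return loc;
}

/* Number of stripe units (pieces served by possibly different disks)
   that the logical request [offset, offset + size) spans. */
long stripe_units_touched(long offset, long size, long stripe_size)
{
    assert(offset >= 0 && size > 0 && stripe_size > 0);
    long first = offset / stripe_size;
    long last = (offset + size - 1) / stripe_size;
    return last - first + 1;
}
```

With a 64 KiB stripe unit and four disks, a 10,000-byte application read starting at offset 60,000 spans two stripe units on two different disks, even though the application issued a single request.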
2. Tracing and the Self-Defining Data Format (SDDF)
Tracing means generating files that record the performance of an instrumented trace event
every time it executes. SDDF records are the fundamental elements of a trace file. They store data
reflecting the occurrence and duration of each trace event (targeted program instruction).
The user specifies which instructions are targeted, the specific data collected about
targeted instructions, and the way that data is organized within individual SDDF records.
Extensions to the Pablo trace library provide sets of pre-defined formats for recording
data about individual trace events.
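The self-defining idea can be sketched as follows: a descriptor naming each field is written once, and later data records carry only values in that order. The text layout here is loosely modeled on SDDF's ASCII form but is an assumption, not a faithful SDDF writer, and `record_desc`, `write_descriptor`, and `write_record` are hypothetical names. For simplicity, every field is a `double`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* User-chosen layout for one kind of trace event. */
typedef struct {
    int                tag;         /* numeric id linking records to this descriptor */
    const char        *name;        /* event name, e.g. "Read" */
    const char *const *field_names; /* field names, in record order */
    size_t             nfields;
} record_desc;

/* Append formatted text at *len without writing past cap; returns 0 on overflow. */
static int append(char *out, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *len)
        return 0; /* truncated or encoding error */
    *len += (size_t)n;
    return 1;
}

/* Descriptor: tells any reader how to parse later records with the same name. */
int write_descriptor(char *out, size_t cap, const record_desc *d)
{
    assert(out != NULL && d != NULL);
    size_t len = 0;
    if (!append(out, cap, &len, "#%d: \"%s\" {", d->tag, d->name))
        return 0;
    for (size_t i = 0; i < d->nfields; i++)
        if (!append(out, cap, &len, " double \"%s\";", d->field_names[i]))
            return 0;
    return append(out, cap, &len, " };;");
}

/* Data record: only the values, in the order the descriptor declared. */
int write_record(char *out, size_t cap, const record_desc *d, const double *values)
{
    assert(out != NULL && d != NULL && values != NULL);
    size_t len = 0;
    if (!append(out, cap, &len, "\"%s\" {", d->name))
        return 0;
    for (size_t i = 0; i < d->nfields; i++)
        if (!append(out, cap, &len, "%s %g", i == 0 ? "" : ",", values[i]))
            return 0;
    return append(out, cap, &len, " };;");
}
```

Because the descriptor travels with the data, an analysis tool can parse records whose layout it has never seen before, which is the point of a self-defining format.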
3. Analysis
Analysis, as facilitated by Pablo tools, means reviewing any of a number of
graphical and statistical reports that Pablo analysis tools generate from SDDF trace
files. The reports take three basic forms: static graphs, dynamic graphs, and
statistical tables.
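A statistical table of this kind can be produced by aggregating trace records per operation. This sketch assumes the records have already been parsed into a hypothetical `trace_event` array holding an operation code, a duration, and a byte count; the operation set and all names are illustrative, and this is not how Pablo's analysis tools are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical operation codes for parsed trace events. */
enum { OP_OPEN, OP_READ, OP_WRITE, OP_CLOSE, OP_COUNT };

typedef struct {
    int    op;       /* one of the OP_* codes */
    double duration; /* seconds */
    long   bytes;    /* bytes transferred (0 for open/close) */
} trace_event;

/* Per-operation totals: one row of the statistical table. */
typedef struct {
    long   count;
    double total_time;
    double max_time;
    long   total_bytes;
} op_summary;

void summarize(const trace_event *ev, size_t n, op_summary out[OP_COUNT])
{
    for (int k = 0; k < OP_COUNT; k++)
        out[k] = (op_summary){0, 0.0, 0.0, 0};
    for (size_t i = 0; i < n; i++) {
        assert(ev[i].op >= 0 && ev[i].op < OP_COUNT);
        op_summary *s = &out[ev[i].op];
        s->count++;
        s->total_time += ev[i].duration;
        if (ev[i].duration > s->max_time)
            s->max_time = ev[i].duration;
        s->total_bytes += ev[i].bytes;
    }
}

/* Mean duration per call; 0 when the operation never occurred. */
double mean_time(const op_summary *s)
{
    return s->count ? s->total_time / (double)s->count : 0.0;
}

void print_table(const op_summary s[OP_COUNT])
{
    static const char *names[OP_COUNT] = {"open", "read", "write", "close"};
    printf("%-6s %6s %10s %10s %10s\n", "op", "count", "total(s)", "mean(s)", "bytes");
    for (int k = 0; k < OP_COUNT; k++)
        printf("%-6s %6ld %10.4f %10.4f %10ld\n", names[k], s[k].count,
               s[k].total_time, mean_time(&s[k]), s[k].total_bytes);
}
```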
Pablo instrumentation, tracing, and analysis tools work together to provide a detailed characterization of the I/O behavior of scalable applications and existing parallel file systems. They were designed to help application developers achieve a higher fraction of peak I/O performance on existing parallel systems, and to help system software developers design better parallel file system policies for future generation systems. As part of the CADRE initiative, the Pablo Research Group is working to extend these and other libraries to characterize (a) physical I/O patterns, via instrumented SCSI disk drivers, and (b) I/O behavior on other platforms, including Linux and Windows NT.