Hierarchical Data Format (HDF) is a multi-object file
format, developed by the National Center for Supercomputing Applications, that facilitates
the transfer of various types of data between high-performance computers and their file
systems. Providing a common, machine-independent way of storing complex, multidimensional
data in files that include information about the characteristics and metrics of data as
well as the data itself, HDF is widely used by researchers working with huge volumes
of data, such as meshes, images, and satellite swaths. Often such data are not readily, or
efficiently, described by conventional datatypes. Machine independence, together with
flexible facilities for data modeling and the incorporation of contextual metadata, enables
HDF datasets, generated by unrelated scientists and engineers, to be shared readily by
autonomous applications.
The HDF extension to the Pablo Trace Library has been incorporated into a version of HDF to capture and record performance metrics that reflect HDF behavior during application execution. When an application is built on the instrumented version of HDF, the standard HDF routines are replaced with corresponding Pablo routines, which wrap the actual HDF calls in Pablo trace function calls. The Pablo HDF extension routines capture the information passed in their arguments, together with timestamp and processor details, and write these metrics to Pablo Self-Defining Data Format (SDDF) files for later analysis. One SDDF file is produced for each process in the HDF application. HDF developers can use the version of HDF equipped with the Pablo instrumentation libraries, along with Pablo's Unix I/O extension, to study the I/O behavior underlying an HDF implementation and to analyze the efficiency of specific HDF libraries. For guidance on using HDF itself, refer to the standard HDF documentation.
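The wrapping approach described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Pablo implementation: the names `traced`, `fake_open`, `trace_file_name`, and `TRACE_RECORDS` are hypothetical, the record layout is an assumption, and the records are kept in memory rather than written in SDDF. It shows only the general idea of a routine that records its arguments, a timestamp, and the process identity around the real call.

```python
import functools
import os
import time

# Hypothetical in-memory trace buffer; the real extension writes SDDF records.
TRACE_RECORDS = []


def traced(func):
    """Wrap func so each call is timed and logged, then its result returned."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            # Forward the call unchanged to the wrapped routine.
            return func(*args, **kwargs)
        finally:
            # Record the call even if the wrapped routine raised.
            TRACE_RECORDS.append({
                "routine": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "pid": os.getpid(),
                "start": start,
                "duration": time.perf_counter() - start,
            })
    return wrapper


@traced
def fake_open(path, mode):
    """Stand-in for an I/O library routine; it performs no real I/O."""
    return len(path)


def trace_file_name(prefix="trace"):
    """Build a per-process file name, assuming one trace file per process."""
    return f"{prefix}.{os.getpid()}.txt"


handle = fake_open("data.hdf", "r")
```

Because the wrapper keeps the name and signature of the routine it replaces, application code calls `fake_open` exactly as before, which is the property that lets an instrumented library stand in for the standard one.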