Software Tools for Analyzing I/O Behavior in Data-Intensive Applications
One of the main goals of performance analysis is to identify sources of poor performance. I/O performance analysis proceeds in a cycle: instrumentation, then analysis (and, in some cases, visualization), then I/O optimization of the code. As software and parallel systems grow in complexity, precisely identifying bottlenecks becomes more difficult. Statistical summaries and graphical visualizations of I/O data, which reflect the execution behavior of data-intensive applications, can improve analysts' understanding of how the I/O system accesses voluminous datasets. Such reports allow analysts to grasp quickly the common characteristics and features of data collections. To yield valuable insight, analysis tools must be scalable: capable not only of accommodating data from a very large number of processors, but also of presenting the data in ways that are intuitive and instructive.
Used in conjunction with the SDDF trace files generated by the Pablo Trace Libraries during program execution, the Pablo analysis tools can distill and communicate information that exposes the interdependencies and interactions among an application's access patterns, I/O APIs, library implementations, file system features and policies, and storage hardware configurations. A variety of programs and tools exist to help analyze the SDDF trace event files generated by Pablo-instrumented codes. These tools produce statistical summaries of the activity of UNIX, MPI-I/O, and Hierarchical Data Format (HDF versions 4 and 5) routines, enabling application developers to assess the performance of large, long-running applications. The Pablo analysis tools also provide features for tailoring reports to the particular needs of researchers. Typically, I/O analysts use the Pablo I/O characterization toolkit to gather data from a large number of nodes, and it is the collective activity of those nodes that is the focus of interest.
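The kind of per-operation statistical summary described above can be illustrated with a minimal Python sketch. It is not the Pablo tools' implementation and does not read SDDF files; the event tuples, field names, and the summarize function are hypothetical stand-ins chosen only to show the idea of aggregating counts, time, and bytes per I/O operation across nodes.

```python
from collections import defaultdict

# Hypothetical trace events: (node, operation, duration_seconds, bytes).
# Real SDDF records carry different fields; these are illustrative only.
events = [
    (0, "read", 0.020, 4096),
    (0, "write", 0.050, 8192),
    (1, "read", 0.030, 4096),
    (1, "seek", 0.001, 0),
]


def summarize(events):
    """Aggregate count, total time, and total bytes per operation."""
    summary = defaultdict(lambda: {"count": 0, "seconds": 0.0, "bytes": 0})
    for _node, op, seconds, nbytes in events:
        entry = summary[op]
        entry["count"] += 1
        entry["seconds"] += seconds
        entry["bytes"] += nbytes
    return dict(summary)


# Print one summary line per operation, in alphabetical order.
for op, stats in sorted(summarize(events).items()):
    print(f"{op:6s} count={stats['count']} "
          f"time={stats['seconds']:.3f}s bytes={stats['bytes']}")
```

Grouping by operation rather than by node reflects the point that the collective activity of all nodes is usually what matters; grouping by the node field instead would give a per-node view.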
The Pablo toolkit provides a merge utility, MergePabloTraces, that combines separate traces into a single file. This is particularly useful when a large number of nodes or files are involved and the application itself generates many files. The toolkit can also analyze trace information from individual nodes and files, rather than from all nodes and files. The Pablo I/O Analysis Tools can be used in either the Unix or the Windows environment. The primary difference is the user interface: the Windows version uses a GUI, and the Unix version uses a command-line interface.
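To make the idea of merging per-node traces concrete, here is a small Python sketch. It does not reproduce MergePabloTraces or the SDDF format; the record layout (timestamp, node, operation) and the function names are assumptions made for illustration. It assumes each per-node trace is already sorted by timestamp and combines them into one time-ordered sequence, then shows how a single node's records can still be selected afterwards.

```python
import heapq

# Hypothetical per-node traces, each already sorted by timestamp.
# Each record is (timestamp_seconds, node, operation).
node0 = [(0.10, 0, "open"), (0.40, 0, "read"), (0.90, 0, "close")]
node1 = [(0.20, 1, "open"), (0.30, 1, "write"), (0.95, 1, "close")]


def merge_traces(*traces):
    """Merge time-sorted traces into a single time-ordered list."""
    return list(heapq.merge(*traces, key=lambda rec: rec[0]))


def records_for_node(records, node):
    """Select the records of one node from a merged trace."""
    return [rec for rec in records if rec[1] == node]


merged = merge_traces(node0, node1)
for timestamp, node, op in merged:
    print(f"{timestamp:.2f} node={node} {op}")
```

A streaming merge such as heapq.merge avoids loading and re-sorting every record at once, which matters when many nodes each contribute a large trace.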