PapiEx - Command line/library utility to measure hardware performance counters with PAPI
This version of PapiEx is no longer supported directly by its
authors, the University of Tennessee or former employees of
SiCortex. An advanced and fully supported version is available
from Samara Technology
Group as part of their Performance Technology Platform. Samara
Technology Group is a performance technology and services company
staffed by the leading performance optimization experts in the
industry and the authors of this software. Support and enhancements
for this Open Source version are also available via contract. Please
contact them at
sales at samaratechnologygroup.com
PapiEx is a performance analysis tool designed to transparently and passively measure
the hardware performance counters of an application using PAPI. It uses Monitor to
effortlessly intercept process/thread creation/destruction. It measures the entire run
of an application. By default this includes all subprocesses. PapiEx's goal is to be a
Linux substitute for the perfex command found in SGI's Speedshop. PapiEx is fairly
simple to build, install and use. The most up-to-date documentation for monitor is always found in the man page.
- Differences between the commercial version and this version.
- No external dependencies other than Monitor and PAPI.
- Supports papiex_start()/papiex_stop() calipers in user code (see the man page).
- Can report all sorts of memory usage.
- Supports PAPI multiplexing.
- Supports automatic counting of useful available events for the architecture with a single flag (-a).
- Automatically detects threaded executables.
- Works for MPI and threaded-MPI executables.
- Has special support for MPICH, which avoids the need to link papiex to the MPICH library.
- Dumps aggregate statistics, such as mean/max/avg across threads and tasks.
- Works across variants of fork/exec and handles SIGINT/asserts/aborts properly.
- Can dump out shell arguments for those not wanting to use the papiex driver program.
- Supports counting native events (non-PAPI) and different counting domains.
- Architecture-independent build and papiex-config driver (see the man page).
Download and Installation
- CVS is the best way to get the code.
- Those without CVS access (?!) can find the most recent release (0.99): papiex-0.99.tar.gz
- To build/install PapiEx, please see the INSTALL file.
- To use PapiEx, please see the man page.
- To see the ChangeLog, please see the file.
The best documentation is in the form of examples. This example ASSUMES you
have successfully built AND installed PapiEx AND that you're in the platform specific build directory.
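One way to check the first assumption before trying the examples is to confirm that the papiex driver is on your PATH. This is only a minimal sketch: require_cmd is a hypothetical helper name, not something shipped with PapiEx.

```shell
# require_cmd NAME: succeed if NAME resolves to a command on PATH.
# (Hypothetical helper for illustration; not part of PapiEx.)
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

if require_cmd papiex; then
  echo "papiex found at $(command -v papiex)"
else
  echo "install PapiEx or adjust PATH before running the examples"
fi
```

The same helper can be reused for mpirun before trying the MPI examples further down.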
First we run emacs and count Total Cycles and Total Instructions, redirecting the output from stderr (the default)
to a file. Next we run the pthreads test case and tell PapiEx to create files.
[mucci@localhost]$ papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS emacs 2> sample.emacs
[mucci@localhost]$ tests/papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS tests/pthreads
Here's the output: sample.emacs, sample.pthreads.1, sample.pthreads.2, sample.pthreads.3 and sample.memory.
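Because the report goes to stderr by default, the 2> redirection captures it without disturbing the program's own stdout. The sketch below illustrates only the shell mechanics, using a hypothetical stand-in function (fake_tool) instead of papiex so that it runs anywhere.

```shell
# Hypothetical stand-in for a tool that, like papiex by default,
# writes its report to stderr while the program's output goes to stdout.
fake_tool() {
  echo "program output"
  echo "counter report" >&2
}

report_file=$(mktemp)

# 2> captures only stderr; stdout still reaches the caller.
program_out=$(fake_tool 2> "$report_file")
report=$(cat "$report_file")
rm -f "$report_file"

echo "stdout: $program_out"
echo "stderr file: $report"
```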
PapiEx can automatically multiplex and count useful events available on your architecture. This
is similar in intent to perfex -a and hpmstat -a.
[mucci@localhost]$ papiex -a find /usr 2> sample.find
For statistical relevance, you should make sure that the run is reasonably long.
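The reason run length matters is that multiplexing time-shares the hardware counters, so each event is counted for only part of the run and the reported value is an estimate. The following is a minimal sketch of that scaling idea, assuming a simple linear extrapolation; the function name and the numbers are illustrative, and the estimator PAPI actually uses may differ.

```shell
# scale_count RAW ENABLED TOTAL
# Extrapolate a raw count observed during ENABLED time units
# to an estimate for the full TOTAL time units (integer arithmetic).
# Illustrative only; not PapiEx or PAPI code.
scale_count() {
  raw=$1
  enabled=$2
  total=$3
  echo $(( raw * total / enabled ))
}

# An event counted for 25 of 100 time units with a raw count of 250000.
scale_count 250000 25 100
```

The shorter the run, the fewer time slices each event receives, and the less trustworthy this kind of extrapolation becomes.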
Multithreaded executables are handled seamlessly. PapiEx creates an output file the
The user can prefix the output file name with the -p<prefix> flag. As an example:
[mucci@localhost]$ papiex -pmystats_ ./thrspecific 2>sample.thrspecific
The stderr output contains the aggregate statistics across
all five threads of the executable. Individual per-thread statistics are placed in a directory
Here are the files:
Now let's consider a more involved example with a threaded-MPI run.
[mucci@localhost]$ mpirun -np 4 papiex -f /tmp bin/mpich2-mpi-thrspecific 2>sample.mpich2-mpi-thrspecific
The -f flag instructs PapiEx to create all output files under /tmp.
The aggregate statistics across all tasks (which in turn are aggregated across all the threads
for the task) are written to stderr, and can be seen here.
The per-task and per-thread statistics are placed in:
Per-task summaries, which are averaged across all the threads of a task, can be seen under this directory:
The directory also contains per-task directories, which contain per-thread numbers as shown in the
Finally, let's consider how PapiEx simplifies the use of mpiP, a lightweight library for scalable
profiling of MPI calls. Normally, mpiP needs to be linked into the target executable.
The PapiEx driver allows seamless deployment of mpiP on dynamically-linked executables.
Let's see this with an example:
[mucci@localhost]$ mpirun -np 4 papiex -e PAPI_L1_DCM -M bin/mpich2-simple-mpi 2> sample.mpich2-simple-mpi
In the example we instruct PapiEx to measure L1 data cache misses, and also do
MPI profiling with mpiP. The stderr output can be viewed in
sample.mpich2-simple-mpi. The mpiP is stored in
The PAPI task statistics are stored in:
Currently, the best way to get PapiEx is to get it directly from CVS. You can access the CVS repository with your browser or use the anonymous CVS pserver. Just hit enter when asked for the password.
% setenv CVSROOT :pserver:firstname.lastname@example.org:/cvs/homes/ospat
% cvs login
% cvs co papiex
The distribution includes a 'make test' phase. The current release has been tested on:
- MIPS64, MIPS32
- PPC64, PPC32
Bugs should be submitted to the PAPI Mailing List.
PapiEx was written by Philip J. Mucci of the Innovative Computing Laboratory and SiCortex Inc. Major contributions and enhancements were made by Tushar Mohan, also of SiCortex Inc.
This software is COMPLETELY OPEN SOURCE with an LGPL license. If you incorporate any portion of this software, I would appreciate an acknowledgement in the appropriate places. Should you find PapiEx useful, please consider making a contribution in the form of hardware, software or plain old cash.