Introduction to PAPI

PAPI, the Performance Application Programming Interface, is a machine-independent set of callable routines that provide access to the performance counters on most modern processors. It is installed on a variety of machines available through cs.utk. The illustration in this exercise should work on any of the Intel Pentium Linux machines (msc01 - 08 or torc0 - 8) or on any node of the boba or frodo SinRG clusters.
For further information about PAPI, check the PAPI website.

PAPI High Level Calls

PAPI is implemented in layers. The top layer consists of eight calls that provide a simple interface to PAPI functionality for many applications. An overview of these eight functions can be found on the PAPI manual pages. A single high-level call, PAPI_flops, is all that is needed for this illustration.

Using PAPI to Measure Execution Time

As described in the PAPI_flops documentation, a call to PAPI_flops returns four parameters, discussed below:

*rtime -- total real time in seconds since the first PAPI_flops() call

*ptime -- total process time in seconds since the first PAPI_flops() call

*flpops -- total floating point operations since the first PAPI_flops() call

*mflops -- Mflop/s achieved since the latest PAPI_flops() call

The values of rtime and ptime are derived from the cycle counter on the Pentium chip: the cycle count is divided by a computed clock speed for the given processor, which is determined by measuring against a system real-time clock. You can stop the counters used by PAPI_flops with a call to PAPI_stop_counters. The next call to PAPI_flops will then start over with fresh values for all returned parameters.
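
Since a cycle count divided by the clock frequency gives elapsed time, the conversion from cycles to seconds can be sketched as below. This is only an illustration of the arithmetic, not PAPI's internal code; the function name cycles_to_seconds and the choice of MHz as the unit are assumptions.

```c
/* Convert a raw cycle count to seconds.
   clock_mhz is the processor clock speed in MHz (an assumed unit;
   this helper is hypothetical and not part of the PAPI library). */
double cycles_to_seconds(long long cycles, double clock_mhz)
{
    if (clock_mhz <= 0.0)
        return 0.0;                      /* guard against division by zero */
    return (double)cycles / (clock_mhz * 1.0e6);
}
```

With made-up numbers, 38660500 cycles on a 500 MHz processor correspond to 38660500 / 500000000 = 0.077321 seconds.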

The Source Code

To illustrate the use of PAPI_flops for performance measurement, we provide a simple C routine to multiply two matrices. The source code can be found in PAPI_flops.c. Note that all programs that use PAPI must #include papi.h. You can open each of these files in your browser and save them to your home area.
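
The overall shape of such a program can be sketched as follows. This is not the contents of PAPI_flops.c: the matrix size N, the function names, and the USE_PAPI macro are assumptions made for illustration. The PAPI calls are compiled only when USE_PAPI is defined, so the multiply itself can be built on a machine without PAPI. The sketch also assumes a PAPI release that provides PAPI_flops with the four pointer arguments described above.

```c
#include <stdio.h>
#ifdef USE_PAPI              /* hypothetical macro: define it where PAPI is installed */
#include "papi.h"
#endif

#define N 100                /* assumed matrix size, not taken from PAPI_flops.c */

static float a[N][N], b[N][N], c[N][N];

/* Naive multiply, c = a * b: N^3 multiplies plus N^3 adds. */
void matmul(void)
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            float sum = 0.0f;
            for (k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Fill the inputs, multiply once, and report PAPI_flops values if available.
   Returns 0 on success, -1 if a PAPI call fails. */
int measure_matmul(void)
{
    int i, j;
#ifdef USE_PAPI
    float rtime, ptime, mflops;
    long long flpops;
#endif

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = (float)(i == j);   /* identity, so c should equal b */
            b[i][j] = (float)(i + j);
        }

#ifdef USE_PAPI
    /* First call sets up and starts the counters. */
    if (PAPI_flops(&rtime, &ptime, &flpops, &mflops) < PAPI_OK)
        return -1;
#endif

    matmul();

#ifdef USE_PAPI
    /* Second call reads the totals accumulated since the first call. */
    if (PAPI_flops(&rtime, &ptime, &flpops, &mflops) < PAPI_OK)
        return -1;
    printf("Real_time: %f\nProc_time: %f\nTotal flpops: %lld\nMFLOPS: %f\n",
           rtime, ptime, flpops, mflops);
#endif
    return 0;
}
```

A main function would simply call measure_matmul() and check its return value. Bracketing only the multiply with the two PAPI_flops calls keeps the initialization loops out of the measurement.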

Running this example

To try out this example, log on to a machine on which PAPI is installed. This could be any of msc01 - 08, torc0 - 8, or the boba or frodo SinRG clusters. Save the files PAPI_flops.c and papi.h into your area. Execute the following command line to compile and link this test:

UNIX> gcc -I/usr/local/include -O0 PAPI_flops.c  /usr/local/lib/libpapi.a -o PAPI_flops

When you run the program, you should get output similar to the following (your mileage may vary):

UNIX> PAPI_flops
Real_time: 0.077321
Proc_time: 0.077193
Total flpins: 2000000
MFLOPS: 25.909208
PAPI_flops.c PASSED
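
The numbers in this sample output can be checked by hand. The reported rate appears consistent with the operation count divided by the process time: 2000000 / 0.077193 / 1000000 is roughly 25.91. The count of 2000000 is also what a naive multiply of two 100 x 100 matrices would produce (2 * 100^3), although the matrix size is an assumption here rather than something stated above. A minimal sketch of both calculations, with hypothetical helper names:

```c
/* Rate in Mflop/s: floating point operations per second, divided by 10^6.
   Hypothetical helper, not part of the PAPI library. */
double mflops_rate(long long flpops, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;                      /* avoid dividing by zero */
    return (double)flpops / seconds / 1.0e6;
}

/* Operation count of a naive n x n matrix multiply:
   n^3 multiplications plus n^3 additions. */
long long matmul_flops(long long n)
{
    return 2 * n * n * n;
}
```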

Programming on your own

To use PAPI_flops in your own code, you can either modify this file to suit your needs, or copy the relevant pieces into code you have already written. Make sure to #include "papi.h", and remember that passing a -1 value in flpins (the floating point count argument) will reset the counters. Experiment with the compile command line to suit your needs.

Notes for Fortran

You can refer to the PAPI_flops documentation to get the exact calling syntax for Fortran. You can also refer to the PAPI Fortran page for more general information on calling PAPI routines from Fortran. Remember that the Fortran calls have an extra check parameter at the end to pass back error status. Also keep in mind that a long long value in C (64-bit integer) is an INTEGER*8 in Fortran, and a float in C is a REAL in Fortran.
A sample command line to compile and link the Fortran program foo might look like this:

UNIX> f77 foo.f /usr/local/lib/libpapi.a -o foo.out




Innovative Computing Laboratory
Contact PAPI: papi@cs.utk.edu
Computer Science Department
University of Tennessee