Wednesday, May 15, 2013

Building IPM 0.983 for lightweight MPI profiling

IPM, or Integrated Performance Monitoring, is a lightweight library that takes advantage of the MPI standard's built-in profiling interface, which positions the standard user-facing MPI routines (e.g., MPI_Send) as wrappers around the real MPI routines (e.g., PMPI_Send). By providing a thin, instrumented MPI_Send in place of the MPI_Send that passes directly through to PMPI_Send, IPM can track all of the MPI communication in a binary without any changes to the application's source code. Thus, profiling an MPI application with IPM is a simple matter of re-linking the application against libipm or, even better, setting LD_PRELOAD so that IPM intercepts the MPI calls of an already-linked (dynamically linked) executable.
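As a rough sketch of the LD_PRELOAD route, something like the following should do. The library path, application name, and rank count are placeholders of mine rather than anything IPM ships, and I am assuming a launcher called mpirun that accepts -np; launchers differ in how they start ranks and pass environment variables, so treat the launch line as something to adapt.

```shell
#!/bin/sh
# Placeholder locations: point these at your own libipm.so and MPI binary.
IPM_LIB="${IPM_LIB:-$HOME/ipm/lib/libipm.so}"
APP="${APP:-./my_mpi_app}"
NP="${NP:-16}"

# Build the launch line. env(1) is what each rank executes, so LD_PRELOAD
# is set for the application itself and not for mpirun or the login shell.
LAUNCH="mpirun -np $NP env LD_PRELOAD=$IPM_LIB $APP"

# Print the command for inspection; set RUN=1 to actually execute it.
echo "$LAUNCH"
if [ "${RUN:-0}" = "1" ]; then
    $LAUNCH
fi
```

Putting env between the launcher and the application keeps the preload scoped to the profiled job, which is less error-prone than exporting LD_PRELOAD in an interactive shell and forgetting to unset it.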

As far as I can tell, the two most popular lightweight MPI profiling libraries that can be slipped into binaries like this are IPM from Lawrence Berkeley/SDSC and mpiP out of Oak Ridge. Unfortunately the build process and guts of IPM are very rough around the edges, and the software does not appear to be maintained at all (update!  See "Outlook" at the end of this post). mpiP, by comparison, is still being developed and is quite a lot nicer to deal with on the backend.

With that being said, IPM is appealing to me for one major reason: it supports PAPI integration and, as such, can provide a very comprehensive picture of both an application's compute intensity (ratio of flops:memory ops) and communication profile.  Of course, it also helps that NERSC has published a few really good technical reports on workload analysis and benchmark selection that prominently feature IPM.

As I said before, building the latest version of IPM (0.983) is not exactly straightforward for two reasons:
  1. Its configure script requires a number of parameters be explicitly set at configure-time or else all sorts of problems silently creep into the Makefile
  2. It ships with a bug that prevents one component from building correctly out-of-the-box
Neither is a big issue, but it still took me the better part of an evening to figure out what was going on with them.

Step 1. Defining system parameters

The installation instructions just say ./configure should work, but this is not the case because the script which autodetects the system settings, bin/hpcname, is too old. It appears to silently do something and allow the configure process to proceed, but the resulting IPM library does not work very well.

The configure line I had to use on SDSC Gordon was

./configure --with-arch=X86 \
--with-os=LINUX \
--with-cpu=NEHALEM \
--switch=INFINIBAND \
--with-compiler=INTEL \
--with-papiroot=/opt/papi/intel \
--with-hpm=PAPI \
--with-io-mpiio

If you edit the bin/hpcname script, you can also add your own HPCNAME entry and the appropriate hostname. Curiously, the predefined HPCNAMEs specify MPI=, which doesn't appear to be used anywhere in IPM...legacy cruft, maybe? One of the key parameters is the --with-cpu option, which defines how many hardware counters and PAPI event sets can be used. The NEHALEM setting reflects the newest processor available before IPM was abandoned, and it seems to be good enough (7 counters, 6 event sets), but the ambitious user can edit include/ipm_hpm.h and add an entirely new CPU type (e.g., CPU_SANDYBRIDGE) with the appropriate number of hardware counters (8 programmable for Sandy Bridge) and event sets.

Following this, issuing make and make shared should work. If this is all you want, great! You're done. If you try to make install though, you will have problems.

Step 2. Building the ipm standalone binary

The IPM source distribution actually builds two things: a library that can be linked into an application for profiling, and a standalone binary that is used to launch other applications (in the same way as nice, time, taskset, numactl, etc.). The way in which the IPM source distribution does this is nasty though: it is a tangled web of #included .c (not .h!) files, and the two targets were apparently not both considered when updates were made. The end result is that the libraries compile, but the standalone binary (which is a dependency for make install) does not.

The issue is that both libipm (the library) and ipm (the standalone executable) call the same ipm_init() function, which initializes a variable (region_wtime_init) that is only declared for libipm. Because ipm_init.c does not declare region_wtime_init itself (not even as an extern), the identifier is undefined when ipm.c is compiled. If you try to build the standalone executable (make ipm), you will see:

$ make ipm
cd src; make ipm
make[1]: Entering directory `/home/glock/src/ipm-0.983/src'
/home/glock/src/ipm-0.983/bin/make_wrappers   -funderscore_post  ../ipm_key 
Generating ../include/ipm_calls.h
Generating ../include/ipm_fproto.h
Generating libtiny.c
Generating libipm.c
Generating libipm_io.c
icc  -DIPM_DISABLE_EXECINFO -I/home/glock/src/ipm-0.983/include  -I/opt/papi/intel/include  -DWRAP_FORTRAN -I../include -o ../bin/ipm ipm.c    -L/opt/papi/intel/lib -lpapi
ipm_init.c(39): error: identifier "region_wtime_init" is undefined
   region_wtime_init = task.ipm_trc_time_init;
   ^

compilation aborted for ipm.c (code 2)
make[1]: *** [ipm] Error 2
make[1]: Leaving directory `/home/glock/src/ipm-0.983/src'
make: *** [ipm] Error 2

If you have the patience to figure out how all the .c files are included into each other, it's relatively straightforward to nail down why this error is coming up. region_wtime_init is only declared within libipm.c:

$ grep region_wtime_init *.c
ipm_api.c: region_wtime_final - region_wtime_init;
ipm_api.c: region_wtime_init = IPM_TIME_SEC(T1);
ipm_init.c: region_wtime_init = task.ipm_trc_time_init;
libipm.c:static double  region_wtime_init;

What is including ipm_api.c?

$ grep ipm_api.c *.[ch]
libipm.c:#include "ipm_api.c"

Since region_wtime_init isn't actually needed by the ipm binary itself and is preventing its successful compilation simply because of the nasty code-recycling of ipm_init.c, it is safe to just add a declaration to ipm.c above the #include "ipm_init.c":

unsigned long long int flags;
struct ipm_taskdata task;
struct ipm_jobdata job;
static double region_wtime_init;

#include "ipm_env.c"
#include "ipm_init.c"
#include "ipm_finalize.c"
#include "ipm_trace.c"

Thus, whenever the IPM binary calls ipm_init, it is initializing a variable that just gets thrown away.
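If you would rather script the edit than make it by hand, here is a minimal sketch. The function name patch_ipm_c is my own invention, and it assumes the include line appears in ipm.c exactly as shown above (starting in column one); it drops the declaration directly above that include, which is slightly lower than in the listing above but equivalent, since only ipm_init.c needs it.

```shell
#!/bin/sh
# Insert the missing declaration immediately above the ipm_init.c include.
# Usage: patch_ipm_c path/to/src/ipm.c   (patch_ipm_c is a made-up name)
patch_ipm_c() {
    src="$1"
    # Skip files that already carry the declaration, so reruns are harmless.
    if grep -q 'static double region_wtime_init;' "$src"; then
        return 0
    fi
    # Keep a backup, then write the patched copy over the original.
    cp "$src" "$src.orig" || return 1
    awk '
        /^#include "ipm_init.c"/ { print "static double region_wtime_init;" }
        { print }
    ' "$src.orig" > "$src"
}
```

From the top of the source tree this would be run as patch_ipm_c src/ipm.c before make ipm; the untouched file is left behind as src/ipm.c.orig.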

Following this, you should be able to build all of the components to IPM and make install.

Words of Caution

  • In older versions of IPM (at least up through 0.980), make install will delete EVERYTHING in your --prefix path. I guess it assumes you are installing IPM into its own directory instead of somewhere like /usr/local. This "feature" was commented out in the last version (0.983), but be careful!
  • IPM only works with the PAPI 4 API. Trying to build against PAPI 5 will fail because the signature of PAPI_perror changed. Updating IPM to use the PAPI 5 API is not difficult and perhaps worthwhile; PAPI 4 does not have explicit knowledge of newer instructions like AVX.
  • Recent changes to the MVAPICH API (ca. 1.9) have caused a bunch of gnarly errors to appear because the first argument of MPI_Send changed from void * to const void *. These appear to be harmless though.
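For the first caution above, a small guard along these lines can be run before make install. It is only a sketch: prefix_is_safe is a hypothetical helper of mine, not part of IPM, and it simply refuses any prefix that already contains files.

```shell
#!/bin/sh
# Succeed only if the given prefix is absent or an empty directory, i.e.
# a location where a destructive "make install" has nothing to delete.
# (prefix_is_safe is a made-up helper name.)
prefix_is_safe() {
    prefix="$1"
    [ -n "$prefix" ] || return 1    # refuse an empty argument
    [ -e "$prefix" ] || return 0    # nothing there yet: safe
    [ -d "$prefix" ] || return 1    # exists but is not a directory
    # A directory is safe only if it has no entries (dotfiles included).
    [ -z "$(ls -A "$prefix")" ]
}
```

With a dedicated directory passed to --prefix at configure time, the install step then becomes prefix_is_safe "$PREFIX" && make install, where PREFIX holds the same path.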

Outlook

It's a bit unfortunate that IPM's development has fallen by the wayside because its ability to do both MPI and hardware-counter-level profiling with no modification to application source is really powerful. Profiling libraries with hardware counters have traditionally been very hardware-dependent (e.g., Blue Gene has an IPM-analogue called libmpitrace/libmpihpm that works on Blue Gene's universal performance counters but nothing else), but PAPI was supposed to address that problem. I'm hoping that more up-to-date MPI profiling libraries will start including PAPI support in the future.

UPDATE: After looking around through the literature, it appears that a major overhaul of IPM, dubbed version 2.0.0, has been created and does a variety of neat things such as OpenMP profiling and a completely modular build structure. This IPM 2 has been presented and published, but I have yet to figure out where I can actually download it. However, it is installed on our machines at SDSC, so I need to figure out where we keep the source code and whether it remains freely licensed.