
CHPC Software: MPI

Message Passing Interface (MPI) is the principal method of performing parallel computations on all CHPC clusters. Its main component is a standardized library that enables communication between processes in distributed-memory environments. Numerous MPI distributions are available, so CHPC supports only those we believe are best suited to each particular system.

More information: MPI standard page.

The CHPC clusters utilize two types of network interconnects, Ethernet and InfiniBand. Users should use InfiniBand since it is much faster than Ethernet.

MVAPICH2 is an InfiniBand MPI distribution derived from MPICH2. We also offer OpenMPI built with InfiniBand support. The two distributions are fairly equivalent in single-threaded performance; however, MVAPICH2 seems to be more stable and faster with multi-threaded programs. There are cluster-specific builds of OpenMPI and MVAPICH2, tuned for superior performance on each respective cluster. More information on each can be found on the MVAPICH2 and OpenMPI pages.

Ethernet may be used for applications that do little communication, or for debugging on the interactive nodes. The MPI distribution of choice for Ethernet is MPICH2. There are no cluster-specific builds of MPICH2; instead, a general build is offered for all clusters.


Sourcing MPI and Compiling

Before performing any work with MPI, users need to source the appropriate MPI distribution for their needs. By default, certain distributions are automatically sourced in the login scripts (.bashrc, .tcshrc), but for any serious work it is better to explicitly define the running/compiling environment.

To source an MPI package, issue a command with the following format:

source /uufs/<cluster>/sys/pkg/<MPI distro>/<compiler>/etc/<MPI distro>.<shell>

This format is consistent across all CHPC clusters. The different options for each highlighted field are shown below:

<cluster>  = ember.arches, updraft.arches, sanddunearch.arches, kingspeak.peaks, (General builds)
<MPI distro> = mvapich2, openmpi
<compiler> = std (GNU), std_intel (Intel), std_pgi (Portland Group)
<shell> = .sh (bash), .csh (tcsh) 

Make sure all the fields are correct! Incorrect fields will result in jobs that do not run, compilation errors, and many headaches.

Example 1. If you were running a program on ember that was compiled with the Intel compilers and uses mvapich2, and your user shell was bash:

source /uufs/ember.arches/sys/pkg/mvapich2/std_intel/etc/mvapich2.sh

Example 2. If you were running a program on updraft that was compiled with the PGI compilers and uses OpenMPI, and your user shell was tcsh:

source /uufs/updraft.arches/sys/pkg/openmpi/std_pgi/etc/openmpi.csh

The CHPC keeps older versions of each MPI distribution for backwards compatibility. These versions can be found in the respective directory for each distribution:

/uufs/<cluster>/sys/pkg/<MPI distro>/<version><g,i,p>

Different versions are indicated by version numbers and a compiler tag (g, i, or p for GNU, Intel, and PGI, respectively). If no compiler tag is given, assume the build is GNU.
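For instance, sourcing a hypothetical versioned Intel build of MVAPICH2 on Ember would follow the same pattern as the std builds (the version number 1.6i below is a placeholder; list the directory to see what is actually installed):

```shell
# See which versioned builds exist for this distribution
ls /uufs/ember.arches/sys/pkg/mvapich2/

# Source a specific (hypothetical) versioned Intel build instead of std_intel
source /uufs/ember.arches/sys/pkg/mvapich2/1.6i/etc/mvapich2.sh
```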


Compiling with MPI is quite straightforward. Below is a list of MPI compiler commands with their equivalent standard version:

Language        MPI Command        Standard Commands
C               mpicc              icc, pgcc
C++             mpicxx             icpc, pgCC
Fortran 77/90   mpif90, mpif77     ifort, pgf90, pgf77
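As an illustration, compiling works the same as with the underlying serial compilers; the wrapper invokes whichever compiler's build you sourced. The source file names here are placeholders:

```shell
# C program, compiled with the wrapper of the sourced MPI build
mpicc -O2 -o program program.c

# Fortran 90 program
mpif90 -O2 -o program program.f90
```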

When you compile, make sure you record what version of MVAPICH2 or OpenMPI you used. The std builds are periodically updated, and programs will sometimes break if they depend on the std builds.

Running with MPI

IMPORTANT: Before running a job, ALWAYS source your MPI package explicitly in your PBS scripts. 
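A minimal PBS script sketch following this rule (the cluster, resource requests, and program name are placeholders; adjust them for your job):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -l walltime=1:00:00

# Explicitly source the same MPI build the program was compiled with --
# do not rely on whatever the login scripts source by default.
source /uufs/ember.arches/sys/pkg/mvapich2/std_intel/etc/mvapich2.sh

cd $PBS_O_WORKDIR
mpirun -np 24 -machinefile $PBS_NODEFILE ./program
```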
Running MVAPICH2 programs

MPICH2 and MVAPICH2 have transitioned to a different process manager called Hydra, which, unlike the previous MPD process manager, does not use daemons on each node. MPI process startup is now fairly simple, requiring a single command. However, on Sanddunearch, which uses RSH for internode connectivity, the RSH variant of the mpirun command must be used. For general help with mpirun, please consult the manpages (man mpirun). Note that in all of these examples, you will need to set your own paths and process counts ($WORKDIR, $PROCS).

To run on Sanddunearch:

mpirun -launcher rsh -np $PROCS -hostfile $PBS_NODEFILE $WORKDIR/program

To run on Kingspeak, Ember or Updraft:

mpirun -np $PROCS -machinefile $PBS_NODEFILE ./program

For multi-threaded parallel programs, there are additional arguments that must be passed to the program, as well as additional manipulations to gain better performance. Specifically, the variables OMP_NUM_THREADS (the number of threads to parallelize over) and MV2_ENABLE_AFFINITY=0 (disables process affinity to a physical processor) must be set. Additionally, on Updraft, process affinity must be disabled with IPATH_NO_CPUAFFINITY=1. Also note that some math libraries (particularly the Intel MKL) can run multi-threaded, which can impact the performance of some programs.

In the following example, it is presumed that the user launches one MPI process per socket on multiple nodes. For example, on Ember, one would run six threads per socket, with two processes on each node. This is reflected in how the nodefile is split:

cat $PBS_NODEFILE | uniq > nodefile1
cat nodefile1 nodefile1 > nodefile

The nodefile is first reduced to the unique nodes, then duplicated using cat so that there is one entry per socket on each node. The number of threads per process must be adjusted to match the number of cores per socket on the given CHPC hardware.
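To illustrate the transformation, assume a hypothetical two-node allocation on a cluster with two sockets per node (the node names em001 and em002 are made up for this sketch):

```shell
# Simulate a PBS nodefile: PBS lists each node once per allocated core slot
printf 'em001\nem001\nem002\nem002\n' > PBS_NODEFILE_example

uniq PBS_NODEFILE_example > nodefile1   # one line per unique node
cat nodefile1 nodefile1 > nodefile      # duplicate: one line per socket

cat nodefile   # em001, em002, em001, em002 -- four MPI processes, two per node
```

mpirun then starts one process per line of the resulting nodefile, and OMP_NUM_THREADS controls how many threads each of those processes spawns.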

To run multithreaded on Ember:

mpirun -genv MV2_ENABLE_AFFINITY 0 -genv OMP_NUM_THREADS 6 -n $NODES -machinefile nodefile $WORKDIR/program.exe

To run multithreaded on Updraft:

mpirun -genv MV2_ENABLE_AFFINITY 0 -genv IPATH_NO_CPUAFFINITY 1 -genv OMP_NUM_THREADS 4 -n $NODES -machinefile nodefile $WORKDIR/program.exe

To run multithreaded on Sanddunearch:

mpirun -launcher rsh -genv MV2_ENABLE_AFFINITY 0 -genv OMP_NUM_THREADS 2 -n $NODES -machinefile nodefile $WORKDIR/program.exe

Running OpenMPI programs

Generally, our tests show that over InfiniBand, OpenMPI performance is slightly below that of MVAPICH2. We have also found problems when running multi-threaded programs in MPI_THREAD_MULTIPLE mode. Nevertheless, OpenMPI has a number of appealing features (including MPI-2 compliance) that have led us to provide it to CHPC users. Again, see the OpenMPI manpages for details.

Running OpenMPI programs is straightforward, and the same on all clusters:

mpirun -np $PROCS -machinefile $PBS_NODEFILE $WORKDIR/program.exe

OpenMPI is capable of running multi-threaded MPI programs, as long as they do not use the MPI_THREAD_MULTIPLE mode (i.e., they communicate from a single thread only). For MPI_THREAD_MULTIPLE mode, we recommend MVAPICH2 instead. We have found it advantageous to distribute the processes across the CPU sockets and bind them to the sockets, which is achieved with the flags -bysocket -bind-to-socket.

To run an OpenMPI program multithreaded:

mpirun -np $PROCS -machinefile $PBS_NODEFILE -bysocket -bind-to-socket $WORKDIR/program.exe

Running and Debugging with MPICH2

MPICH2 is an open-source implementation of the MPI 2.0 standard developed at Argonne National Laboratory. It runs only over Ethernet, so MVAPICH2 or OpenMPI will provide superior performance. MPICH2 should only be used for debugging on interactive nodes and possibly for embarrassingly parallel problems, although in general MVAPICH2 will be a better choice anyway.

Sourcing MPICH2 and Compiling

MPICH2 is only located on the general cluster filesystem:

source /uufs/<compiler>/etc/mpich2.<shell>

Replace <compiler> with std, std_intel, or std_pgi depending on your needs, and <shell> with .sh or .csh.

Compiling with MPICH2 behaves the same way as the other distributions. See the table above. 

Running MPICH2 programs

MPICH2 executes in the same fashion as MVAPICH2. Again, Sanddunearch requires the use of RSH for launching processes, whereas Ember and Updraft use SSH.

To run on Sanddunearch:

mpirun -launcher rsh -np $PROCS -hostfile $PBS_NODEFILE $WORKDIR/program.exe

To run on Ember or Updraft:

mpirun -np $PROCS -machinefile $PBS_NODEFILE ./program.exe


1 Comment

  1. Compiling and running OpenMPI on Updraft with MPI/OpenMP:

    /usr/mpi/intel/openmpi-1.4-qlc/bin/mpicc -openmp pi_ompi.c ctimerlinux.c -o pi_ompi_upompi -limf

    [u0101881@up211 progs]$ /usr/mpi/intel/openmpi-1.4-qlc/bin/mpirun -np 1 -machinefile nodefile -x IPATH_NO_CPUAFFINITY=1 -x OMP_NUM_THREADS=1 ./pi_ompi_upompi
    proc    0 time = 7.4927811622619629 sec
    [u0101881@up211 progs]$ /usr/mpi/intel/openmpi-1.4-qlc/bin/mpirun -np 1 -machinefile nodefile -x IPATH_NO_CPUAFFINITY=1 -x OMP_NUM_THREADS=2 ./pi_ompi_upompi
    proc    0 time = 3.7524480819702148 sec
    [u0101881@up211 progs]$ /usr/mpi/intel/openmpi-1.4-qlc/bin/mpirun -np 1 -machinefile nodefile -x IPATH_NO_CPUAFFINITY=1 -x OMP_NUM_THREADS=4 ./pi_ompi_upompi
    proc    0 time = 1.8768770694732666 sec