
CHPC Software: Parallel Application Development Tools


TotalView is a full-featured, source-level, multi-process, multi-thread graphical debugger for C, C++, Fortran (77 and 90), PGI, HPF, assembler, and mixed source/assembler codes. TotalView provides industry standard support for parallel and distributed computing, e.g. MPI, PVM, and IBM's Parallel Operating Environment (POE).


Notes on installation on the CHPC machines

Arches Metacluster

Totalview on the Arches is installed in /uufs/arches/sys/pkg/totalview/std. The current version is 8.6. Both serial and parallel debugging (MPI, OpenMP) are supported.

Totalview supports most available compilers, including GNU, PGI, Pathscale and Intel.

In order to compile your code and run it in Totalview, follow these steps:

Serial code:

Serial code can be debugged on the interactive nodes, provided it does not generate high load that can affect responsiveness to other users.

  1. Compile with -g:
    • For GNU compilers:
      • g77 -g source_name.f -o executable_name
    • For PGI compilers:
      • pgf77 -g source_name.f -o executable_name
    • For Pathscale compilers:
      • pathf90 -g source_name.f -o executable_name
    • For Intel compilers:
      • ifort -g source_name.f -o executable_name
  2. Run the program in Totalview:
    • totalview ./executable_name
Parallel MPI/OpenMP code:

Parallel code can be debugged on the interactive nodes, or on the compute nodes. For smaller problems, we recommend using interactive nodes, as debugging on compute nodes has several limitations.

Parallel debugging on interactive nodes:

In order to debug in parallel on interactive nodes, one has to use the MPICH2 MPI implementation. This is because MPICH2 supports single-node parallel program launch without the need for the rsh command, which is disabled on interactive nodes for security reasons.
Also, since interactive nodes have a limited number of processors (2 or 4) and limited memory (2 to 4 GB), only parallel programs with modest processor and memory needs should be debugged on interactive nodes. Note that you can debug a program with more than 4 processes on 4 physical CPUs, since debugging generally puts less load on the CPUs, so more than one process can execute on a single CPU at a time. Still, debugging with, say, 4 processors should be sufficient for most problems, as most parallel programming bugs show up on any number of processors.

The debugging process on interactive nodes is as follows:

  1. Check that MPICH2 is the default MPI implementation:
    • which mpicc
      This should return one of /uufs/arches/sys/pkg/mpich2/1.0.7/bin/mpicc, /uufs/arches/sys/pkg/mpich2/1.0.7p/bin/mpicc or /uufs/arches/sys/pkg/mpich2/1.0.7i/bin/mpicc. See CHPC's MPI Documentation for details on setting up MPICH2, including the ~/.mpd.conf file.
  2. Compile with -g:
    • mpif77 -g source_name.f -o executable_name
  3. Start MPICH2's MPD daemon on the given node:
    • mpdboot -n 1
  4. Start Totalview:
    • totalview &
  5. Totalview opens the New Program window. In this window, fill in the following:
    • In tab Program, field Program, put in the executable program name
    • In tab Parallel, field Parallel System, choose MPICH2, and for Tasks set the number of processors you want to run on
    • Click OK to load the program into Totalview. After a little while, you will get a dialog box saying: Process executable.exe is a parallel job. Do you want to stop now? Click Yes, set some breakpoints and start debugging.
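The interactive-node session above can be sketched as follows. The check_prefix helper is our own illustration, not a CHPC tool, and the MPICH2/Totalview commands are commented out since they only run on an Arches node with an X display:

```shell
# Illustrative helper: confirm that a command on PATH resolves under the
# expected installation tree before starting a debug session.
check_prefix() {
  cmd_path=$(command -v "$1") || { echo "missing"; return 1; }
  case "$cmd_path" in
    "$2"*) echo "ok: $cmd_path" ;;
    *)     echo "wrong build: $cmd_path" ;;
  esac
}

# On an Arches interactive node the session would be (steps 1-4 above):
#   check_prefix mpicc /uufs/arches/sys/pkg/mpich2
#   mpif77 -g source_name.f -o executable_name
#   mpdboot -n 1
#   totalview &

check_prefix sh /            # harmless self-check with a command that always exists
```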
Parallel debugging on compute nodes:

In order to debug on compute nodes, one has to run a job through the queue. This complicates the process slightly, so debug on compute nodes only if debugging on an interactive node does not reproduce the error you are looking for.

  1. Compile with -g:
    • /uufs/arches/sys/pkg/mpich/std/bin/mpicc -g source_name.c -o executable_name
    • /uufs/arches/sys/pkg/mpich/std/bin/mpif90 -g source_name.f -o executable_name
    • For a parallel OpenMP application, also include -mp to invoke OpenMP directives support with the PGI compilers, -omp with the Pathscale compilers, or -openmp with the Intel compilers.
    • For debugging parallel codes running over Myrinet, compile with MPICH for Myrinet:
      • /uufs/$UUFSCELL/sys/pkg/mpich-mx/std/bin/mpicc -g source_name.f -o executable_name
      • /uufs/$UUFSCELL/sys/pkg/mpich-mx/std/bin/mpif90 -g source_name.f -o executable_name
    • For debugging parallel codes running over InfiniBand, compile with MVAPICH:
      • /uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpicc -g source_name.f -o executable_name
      • /uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpif90 -g source_name.f -o executable_name
  2. ssh to the interactive node with X forwarding (flag -Y):
    • ssh -Y
  3. Start an interactive PBS session specifying the X forwarding flag (-X), e.g.:
    • qsub -I -X -l nodes=2,walltime=2:00:00
  4. In case of debugging a parallel code over Myrinet, also set the TOTALVIEW environment variable:
    • setenv TOTALVIEW "/uufs/arches/sys/totalview/bin/totalview"
  5. Run mpirun with the -tv flag to invoke Totalview. Use $PBS_NODEFILE for your machinefile the same way as you would in a PBS script:
    • /uufs/arches/sys/pkg/mpich/std/bin/mpirun -np 2 -tv -machinefile $PBS_NODEFILE executable_name

Totalview will start with the source code of your executable in a new window. MPICH-MX for Myrinet uses a slightly different startup mechanism; use -totalview instead:

    • /uufs/delicatearch.arches/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx -np 2 -totalview -machinefile $PBS_NODEFILE executable_name

      MVAPICH for InfiniBand uses the -tv flag, but some other parameters are different:
    • /uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpirun_rsh -rsh -np 2 -tv -hostfile $PBS_NODEFILE executable_name

      InfiniPath MPI, which is installed on Updraft, does not support Totalview. If you need to debug on Updraft, please build your code with MPICH2. We are trying to persuade the vendor to include Totalview support. InfiniPath MPI supports the text-based gdb and pathdb debuggers; see mpirun --help for details on how to use them on Updraft.
  6. If you run Totalview for the first time, you also have to specify a full path to the Totalview Server, tvdsvr. To do that, modify the Launch Strings dialog in Menu - Preferences to:
    • /uufs/arches/sys/totalview/bin/tvdsvr

instead of just plain

    • tvdsvr
  7. Click Go to run the code. After a little while, you will get a dialog box saying: Process executable.exe is a parallel job. Do you want to stop now? Click Yes, set some breakpoints, and start debugging.
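Putting the compute-node steps together, here is a dry-run sketch: each command is echoed instead of executed, since qsub, the cluster mpirun, and Totalview exist only on the CHPC machines, and the login host name below is hypothetical:

```shell
MPIRUN=/uufs/arches/sys/pkg/mpich/std/bin/mpirun

run() { echo "would run: $*"; }              # dry run; swap for "$@" on a real node

run ssh -Y user@arches-login                 # hypothetical login host name
run qsub -I -X -l nodes=2,walltime=2:00:00
run "$MPIRUN" -np 2 -tv -machinefile '$PBS_NODEFILE' executable_name
```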


Intel Trace Analyzer (ITA, formerly known as Vampir) is a graphical profiling tool that enables the user to analyze the time a program spends in calculation, I/O, communication, and so on. It enables the user to quickly focus at the appropriate level of detail by zooming into an arbitrary part of the trace and by selecting interesting processes, events, and communication operations.

The analyzed program must generate a trace file (.vtf). This can be done with the open source package TAU (Tuning and Analysis Utilities). The user has to link the code with TAU, run it to produce TAU trace files, and then convert them to the Vampir format. These files are then read by ITA for performance analysis.

Please note that the trace files can become quite large (gigabytes). Therefore, we recommend running only a small section of the code with tracing enabled, e.g. a single iteration, time step, etc. If that is not possible in your code, please contact CHPC for options on how to enable/disable tracing via code instrumentation.

The location of TAU and the Intel Trace Analyzer on CHPC machines and instructions on how to use them are below.


MPI profiling involves three steps. First, one must produce an instrumented binary that allows timing information collection; on our systems, we use the TAU package for this purpose. Then one runs the instrumented executable to produce trace files that contain the timing information. Finally, these trace files are viewed in a program that lets the user analyze the timing information; at CHPC, we use the Intel Trace Analyzer for this purpose.

The binary is instrumented by using special TAU compiler wrappers instead of the standard MPI compilers. In order to use these wrappers, one has to first source the TAU environment.

  • source /uufs/arches/sys/pkg/tau/std/etc/tau.csh for csh/tcsh
  • source /uufs/arches/sys/pkg/tau/std/etc/ for sh/ksh/bash

Then one can modify the program's Makefile and replace the default compilers with the TAU compiler wrappers. Note that we include TAU's Makefile to define all the TAU make variables:

  • TAUROOTDIR = /uufs/arches/sys/pkg/tau/2.15
  • include $(TAUROOTDIR)/include/Makefile
  • F90 = $(TAU_COMPILER) pathf90
  • CC = $(TAU_COMPILER) gcc

Alternatively, one can compile directly using the TAU compiler wrapper scripts.

Once the executable is compiled, run it to produce the trace files. Note that since this is an MPI program, it must be run with the mpirun command, e.g.:

/uufs/arches/sys/pkg/mpich/std/bin/mpirun -np 4 -machinefile $PBS_NODEFILE ./executable

Upon finishing, there should be numerous files named tautrace.* and events.* in the run directory. These are the trace files in TAU format.

We have to convert these files to the Vampir trace file (vtf) format. This is done in two steps.

  1. tau_merge tautrace.*.trc myprogram.trc (optionally add -n to break a stuck session)
  2. tau2vtf myprogram.trc tau.edf myprogram.vtf
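The two conversion steps can be sketched as a dry run (the commands are echoed rather than executed, since tau_merge and tau2vtf exist only where TAU is installed):

```shell
run() { echo "would run: $*"; }              # dry run; swap for "$@" where TAU is installed

run tau_merge 'tautrace.*.trc' myprogram.trc         # merge per-process traces
run tau2vtf myprogram.trc tau.edf myprogram.vtf      # convert to Vampir format
```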

Trace files are viewed with the Trace Analyzer. In order to use ITA, source a script in the .cshrc, .tcshrc or .bashrc that sets paths and license information:

  • source /uufs/arches/sys/pkg/ita/std/etc/ita.csh (for csh/tcsh)
  • source /uufs/arches/sys/pkg/ita/std/etc/ (for ksh/bash)

Then open the Trace Analyzer with the trace file:

traceanalyzer executable.vtf

Finally, here is an example of a Makefile entry for compiling the MPEVB extension to the DLPOLY molecular dynamics package with TAU. Initial definitions remain unchanged:

TAUROOTDIR = /uufs/arches/sys/pkg/tau/std
include $(TAUROOTDIR)/include/Makefile
F90 = $(TAU_COMPILER) pathf90

# ... lots of other stuff ...

arches-pa-tau : dpp
	cp $(MPI_DIR)/include/mpif.h mpif.h
	$(MAKE) \
	LDFLAGS="-O3 -OPT:Ofast -OPT:Olimit=0 -L$(FFTW_LIBRARY)/lib -lfftw3 " \
	FFLAGS="-c -O3 -OPT:Ofast -OPT:Olimit=0 " \
	CPFLAGS="-D$(STRESS) -DMPI -DFFTW -P -D'pointer=integer*8' " \
	TIMER="" EX=$(EX) BINROOT=$(BINROOT) $(TYPE)
Last Modified: October 06, 2008 @ 21:07:47
