CHPC Software: Totalview Debugger

TotalView is a full-featured, source-level, multi-process, multi-thread graphical debugger for C, C++, Fortran (77 and 90), PGI, HPF, assembler, and mixed source/assembler codes. Totalview provides industry-standard support for parallel and distributed computing, e.g. MPI, PVM, and IBM's Parallel Operating Environment (POE).

More information:

  * Rogue Wave Software (formerly TotalView Technologies), the manufacturer of Totalview
  * TotalView online documentation
  * Video tutorial on Totalview (requires a video player such as QuickTime; the download may take some time)

Totalview at CHPC

Totalview is installed in /uufs/chpc.utah.edu/sys/pkg/totalview/std. This version runs on any Linux workstation with a 64-bit OS. Both serial and parallel (MPI, OpenMP) debugging are supported.
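To launch it, run the totalview binary from this installation (the bin/ subdirectory below is an assumption based on the usual layout; adjust if your installation differs):

    # Launch Totalview directly from the CHPC installation tree
    /uufs/chpc.utah.edu/sys/pkg/totalview/std/bin/totalview &

    # Or add it to your PATH for the session
    export PATH=/uufs/chpc.utah.edu/sys/pkg/totalview/std/bin:$PATH
    totalview &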

Totalview supports most available compilers, including GNU, PGI, Pathscale and Intel.

To compile your code and run it in Totalview, follow the steps for your case below.

Serial code:

Serial code can be debugged on personal desktops or on the interactive nodes, provided it does not generate a load high enough to affect responsiveness for other users.

Follow these steps:

  1. Compile with -g:
    * For GNU compilers: g77 -g source_name.f -o executable_name
    * For PGI compilers: pgf77 -g source_name.f -o executable_name
    * For Pathscale compilers: pathf90 -g source_name.f -o executable_name
    * For Intel compilers: ifort -g source_name.f -o executable_name
  2. Run the program in Totalview:
    * totalview ./executable_name
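
For example, a complete serial session could look like the following (file and argument names are placeholders; the -a option, which passes command-line arguments through to your program, comes from standard Totalview usage and can be verified with totalview -help):

    # Compile a Fortran source with debug symbols using the GNU compiler
    g77 -g mycode.f -o mycode

    # Debug it in Totalview
    totalview ./mycode

    # If the program takes command-line arguments, list them after -a
    totalview ./mycode -a input.dat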

Parallel MPI/OpenMP code:

Parallel code can be debugged on personal desktops or on the interactive nodes using MPICH2. On the interactive nodes, be considerate of other users and don't run too many processes at the same time. If you need more CPUs in a debugging session (our Totalview license has a 32-CPU limit), debug on the compute nodes.

Parallel debugging on personal desktops or on cluster interactive nodes:

To debug in parallel on the interactive nodes, you must use the MPICH2 MPI implementation. This is because MPICH2 supports single-node parallel program launch without the rsh command, which is disabled on the interactive nodes for security reasons.
Also, since the interactive nodes have a limited number of processors (2 to 8) and limited memory (2 to 16 GB), only parallel programs with modest processor and memory needs should be debugged there. Still, most bugs show up at any processor count, so we recommend debugging on the interactive nodes whenever possible, since it is much easier than debugging on the compute nodes. Note that you can debug a program with more than 4 processes on 4 physical CPUs: debugging generally puts a light load on the CPUs, so more than one process can run on a single CPU at a time.

The debugging process on interactive nodes is as follows:
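In outline, it mirrors the compute-node workflow described later on this page, minus the batch job; a minimal sketch, assuming the MPICH2 compiler wrappers are on your path:

    # Compile with the MPICH2 wrapper, adding -g for debug symbols
    mpicc -g source_name.c -o executable_name

    # Start Totalview on the interactive node
    totalview &

    # In the New Program dialog: enter the executable under the Program tab,
    # choose MPICH2 and set the task count under the Parallel tab, then click OK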

All CHPC Software Documentation has been moved to https://www.chpc.utah.edu/documentation/software/, which includes a searchable database.

Parallel debugging on compute nodes:

To debug on the compute nodes, you must run a job through the queue. Do this only when the bug does not show up during a parallel MPI run on a single node, or when it takes a very long time to reach the point of the bug. If the queue wait time is too long (no nodes are available), please contact CHPC at issues@chpc.utah.edu to set up a reservation so that you can get the nodes you need right away.

  1. Compile with -g:
    * mpicc -g source_name.c -o executable_name
    * mpif90 -g source_name.f -o executable_name
    * For a parallel OpenMP application, also include -mp (PGI), -omp (Pathscale), or -openmp (Intel) to enable OpenMP directives support.
    * For debugging parallel codes running over InfiniBand, compile with MVAPICH2:
      * /uufs/sanddunearch.arches/sys/pkg/mvapich2/std/bin/mpicc -g source_name.c -o executable_name
      * /uufs/sanddunearch.arches/sys/pkg/mvapich2/std/bin/mpif90 -g source_name.f -o executable_name
  2. Start an interactive PBS session, e.g.:
    * qsub -I -X -l nodes=4:ppn=4,walltime=2:00:00
  3. Start Totalview:
    * totalview &
    * Totalview opens the New Program window. In this window, fill in the following:
      * In tab Program, field Program, enter the executable name.
      * In tab Parallel, field Parallel System, choose MPICH2, and for Tasks set the number of processes to run.
    * Click OK to load the program into Totalview. After a little while, you will get a dialog box saying: "Process executable.exe is a parallel job. Do you want to stop now?" Click Yes, set some breakpoints and start debugging.
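
Put together, a typical compute-node session combines the commands above (node counts, walltime, and file names are just the examples from this page):

    # 1. Compile with debug symbols using the MPI wrapper
    mpif90 -g source_name.f -o executable_name

    # 2. Request an interactive job with X forwarding (4 nodes, 4 processors each, 2 hours)
    qsub -I -X -l nodes=4:ppn=4,walltime=2:00:00

    # 3. On the allocated node, start Totalview and set up the run in the
    #    New Program dialog (Program and Parallel tabs) as described above
    totalview &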

If you run Totalview for the first time, you may also have to specify the full path to the Totalview debugger server, tvdsvr. To do that, modify the Launch Strings dialog in Menu - Preferences to use
* /uufs/chpc.utah.edu/sys/pkg/totalview/std/bin/tvdsvr
instead of just plain
* tvdsvr
Then click Go to run the code and proceed as described above.

Enabling ReplayEngine:

CHPC also owns a 16-CPU license for the ReplayEngine plugin for Totalview, which enables stepping backwards in a debugging session. To use ReplayEngine, check the "Enable ReplayEngine" checkbox in the New Program dialog after starting Totalview.
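
ReplayEngine can also be enabled from the command line with the -replay option (this flag comes from Rogue Wave's Totalview documentation rather than this page; verify it with totalview -help on your installation):

    # Start Totalview with reverse debugging (ReplayEngine) enabled
    totalview -replay ./executable_name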

Enabling memory debugging:

Totalview has built-in memory debugging options. To enable memory debugging, check the "Enable memory debugging" checkbox in the New Program dialog after starting Totalview. This option checks for array index mismatches, allocation/deallocation problems, and many other common memory bugs, and is useful when the program crashes for no apparent reason or exits with a segmentation fault.
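
A typical bug it catches is a write past the end of an allocated array; a minimal session, with a placeholder file name, looks like this:

    # Compile with -g so memory events map back to source lines
    gcc -g myprog.c -o myprog

    # Start Totalview, then check "Enable memory debugging" in the
    # New Program dialog before clicking OK
    totalview ./myprog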