The roundRobin program appeared to run correctly. In general, however, parallel programs are complex enough that core dumps or simply incorrect results are common. TotalView is the premier parallel debugging tool available today; it lets you examine your parallel program in detail and identify any flaws.
In this section, we will explore the basic functionality of TotalView (TV for short) using the roundRobin program. In the next section, we'll use TV to debug the erroneous Parallel Diffusion program.
TotalView is a tool for debugging parallel programs interactively. To use TotalView, the parallel program must be run interactively rather than in batch mode, i.e. on nodes obtained through qsub -I.
Related Info:
The TotalView license is restricted to 2 concurrent users at any given time. This does not affect workshop attendees, because we have an unlimited-user demo license for the workshop.
Let us see how to use TotalView to debug a parallel program while working on interactive nodes provided by PBS. When you invoke mpirun, you need to add the -tv option before your program's name. For example, to debug the roundRobin program, type:
Exercise: Do this!
[agopu@bc81 RoundRobin]$ cd ~/MPI_Tutorial/RoundRobin/
[agopu@bc81 RoundRobin]$ lamboot $PBS_NODEFILE
[agopu@bc81 RoundRobin]$ mpirun C -tv roundRobin
[agopu@bc81 RoundRobin]$ lamhalt
Note to MPICH Users: If you are an MPICH user, please see the important note about using TV with MPICH code on AVIDD in the Compiling and Running Parallel jobs using MPICH on AVIDD section of this workshop.
As a start, you should see something like this on the console:
Linux x86 TotalView 6.6.0-2
Copyright 1999-2004 by Etnus, LLC. ALL RIGHTS RESERVED.
. . .
Reading symbols for process 1, executing "roundRobin"
Library /N/u/agopu/MPI_Tutorial/RoundRobin/roundRobin, with 1 asects, was linked at 0x08048000, and initially loaded at 0x90000000
. . .
Skimming 50430 bytes of DWARF '.debug_info' symbols from '/N/u/agopu/MPI_Tutorial/RoundRobin/roundRobin'...done
. . . . . .
Also, two windows should open up:
The TotalView root [main] window where process information and the like are shown.
The TotalView process [debug] window showing a stack trace, program source code, action points, and such. Important features of the Process Window are:
Just below the row of buttons is an identifier bar. It identifies the current process, its process number and the like. For example, in the sample window shown below, the bar tells us that the information currently shown in the process window pertains to process 0, that the program name is roundRobin, and so forth. You should be able to cross-reference the process number with the Root Window.
Among the buttons, near the upper left-hand corner is a pull-down labeled Group (Control) . You can toggle this to go between Group and Process control. Now what is that supposed to mean?
If Group (Control) is selected, then any operation you perform in the process window (Next, Step, Halt, etc.) applies to all the processes running under mpirun's purview (or mpiexec's purview if you are using MPICH).
If Process is chosen instead, then any operation you perform applies only to the current process, i.e. the process indicated by the identifier bar described above.
To the right are buttons such as Go and Step, which run or step through the program (the usual debugger functionality).
Below the identifier bar, on the left, is the Stack Trace, which shows all procedures (functions and the like) in the stack. To its right is the Stack Frame, which displays the local variables and registers of the selected frame for the current process. There is nothing interesting to see at this time because the main program has not started for this processor or for any of the others.
Below the stack trace and stack frame is the Source Pane displaying the roundRobin.c source code.
OK, all those details said, the two TotalView windows should look similar to what's shown below
Note: break points are called action points by TotalView's manuals


Exercise: Do this!
You should pull down File -> Preferences... -> Action Points; where it says "When breakpoint hit, stop:" select Process. This will prevent one or more processes from stopping prior to a breakpoint.
Within the process window, make sure Group (Control) is selected in the drop down menu on the top left corner; left-click on the Go button. You'll trigger parallel processes to be started on your compute nodes and TV will ask if you'd like to stop the parallel code.

Click Yes. After a few moments the root window's contents will change: the 4 processes to which TV has attached itself will be shown with status code T, meaning stopped. The process window will also change, and TV will stop at its default break point as shown below:


Exercise: Do this; and try to follow all the steps below!
Recall that the parallel code execution stopped at a default break point. Click on main, inside the Stack Trace pane; the roundRobin.c source should be shown in the source window once again. Scroll down to the line shown in red below (line 77 if code is unchanged):
. . .
/* If I'm processor proc, then I want to send */
/* to processor proc + 1: */
if (myrank == proc)
. . .
... and left-click directly on the line number. A break point, indicated by a STOP symbol, should show up; this should be reflected in the Action Point pane as well.

Now watch the Root Window carefully and hit the Go button (on the Process window). All four roundRobin processes will momentarily show status R, meaning running, and then status B, meaning they have halted at the breakpoint we created. All four processes ran to the breakpoint because we are still in Group (Control) mode.

Now look at the local variables in the Stack Frame: myrank is 0 and numProcs equals 4. A number beginning with 0x is in hexadecimal (base 16); the number to its right, in parentheses, is the decimal (base 10) equivalent.
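To make the hexadecimal/decimal pairing concrete, here is a minimal C sketch that formats a value in the same spirit: hexadecimal first, decimal in parentheses. The helper name format_hex_dec and the 8-digit field width are assumptions made for illustration; TotalView's exact display format may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "0x<hex> (<decimal>)" for value into buf.
 * The 8-digit, zero-padded hex width is an assumption for illustration,
 * not something taken from TotalView itself.
 * Returns what snprintf returns: the length the full string needs. */
int format_hex_dec(int value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "0x%08x (%d)", (unsigned int)value, value);
}
```

For example, calling format_hex_dec(4, buf, sizeof buf) on a 32-byte buffer fills it with 0x00000004 (4), which is how a numProcs value of 4 would read in both bases.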
Now, in the Root Window, dive into process 2, i.e. right-click on process 2 and select Dive. Then look at the Process Window: the identifier bar should display details for process 2, and among the local variables in the Stack Frame myrank should be 1. You may dive into the other two processes and watch the myrank variable (among others) change.
Another way of looking at individual process details (instead of diving into them from the Root window) is to use the P+ and P- buttons in the top right corner of the process window. Click on one of those two buttons and see how the details change within the process window, just as they did when you dived into a particular process.
A few more things you should try:
Stepping through code using Next and also flow of parallel code based on process rank
Put in two more break-points at the lines shown in red below (lines 80 and 93 on unchanged code):
dest = proc + 1;
MPI_Send( msg,
. . .
. . .
src = proc;
MPI_Recv( msg,
Toggle to Process mode using the drop-down menu on the top-left corner.
Dive into process 0 and hit Go. Notice where the program stops (it should be on the MPI_Send( msg, line) and also check out the variables myrank, src and dest.
Then dive into process 1 (from the root window or by hitting P+ until you see process 1 details), hit Next once and then two more times. Notice where the program stops at each point (after 3 Nexts, it should be on the MPI_Recv( msg, line); once again check out the variables myrank, src and dest.
You could repeat this exercise for processes 2 and 3. The flow of your parallel code based on process rank should become obvious by now.
OK, remove all three breakpoints by clicking on each STOP symbol.
Set a new breakpoint on the line shown in red below:
/* Close the MPI API: */
MPI_Finalize();
Toggle back to Group (Control) mode and hit Go.
Using Laminate to view variables of multiple processes:
You should be on the breakpoint we defined in the last part of the previous bullet. Now dive into the myrank variable (double click on myrank within the Stack Frame pane or within the source code).
A new window should be opened up and you should see the value of myrank for the current process. But what if you want to view the value of a variable in all processes?
To accomplish that, select View -> Laminate -> Process on the variable window; now you should be able to see each process' rank at the same time.
If you are ready to proceed to the next section, then close the Variable window and read on...
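The rank-dependent flow observed with the breakpoints on the MPI_Send and MPI_Recv lines can be mimicked in plain C, without MPI, as a rough sketch. The names role_for, next_rank and the enum values are hypothetical. The wrap-around from the last rank back to rank 0 is an assumption inferred from the console output (Processor 3 sends, Processor 0 receives), and the condition guarding the receive branch is likewise assumed, since the excerpts of roundRobin.c shown here do not include it.

```c
#include <assert.h>

/* What a given rank does during one iteration of the loop over proc. */
enum role { ROLE_IDLE, ROLE_SEND, ROLE_RECV };

/* Rank that receives the message sent by proc.
 * Assumption: the last rank wraps around to rank 0. */
int next_rank(int proc, int num_procs)
{
    assert(num_procs > 0 && proc >= 0 && proc < num_procs);
    return (proc + 1) % num_procs;
}

/* Decide the role of myrank while proc is the current sender. */
enum role role_for(int myrank, int proc, int num_procs)
{
    if (myrank == proc)
        return ROLE_SEND;  /* real program: dest = proc + 1; MPI_Send(...) */
    if (myrank == next_rank(proc, num_procs))
        return ROLE_RECV;  /* real program: src = proc; MPI_Recv(...) */
    return ROLE_IDLE;      /* every other rank skips both branches */
}
```

With four processes and proc equal to 0, role_for(0, 0, 4) yields ROLE_SEND and role_for(1, 0, 4) yields ROLE_RECV, while ranks 2 and 3 are idle. That is consistent with rank 0 stopping on the MPI_Send line and rank 1 on the MPI_Recv line in the exercise.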
Finally, make sure you are still in Group (Control) mode and hit Go. On the Root window, all four processes should momentarily go into R (run) status and then switch to X status, meaning they have terminated. You should see something similar to this on the console where you started the TotalView mpirun (mpiexec) run:
. . . . . .
Reading symbols for process 4, executing "roundRobin"
The number of processors in this run is 4.
Processor 0 has sent the message.
Processor 1 has received the message.
. . .
Processor 0 has received and checked the message.
Program terminating normally.
Processor 3 has sent the message.
[0] Intel Trace Collector INFO: Writing tracefile roundRobin.stf in /N/u/agopu/MPI_Tutorial/RoundRobin
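As a recap, the one-letter status codes seen in the Root Window during this walkthrough (T, R, B and X) can be summarized in a small lookup function. This is only a sketch: status_meaning is a hypothetical helper and not part of TotalView, and it covers just the four codes mentioned in this section. TotalView may well display other codes.

```c
#include <assert.h>

/* Map a Root Window status letter to the meaning given in this section.
 * Only the four codes met in the walkthrough are covered; any other
 * letter is reported as "unknown" rather than guessed at. */
const char *status_meaning(char code)
{
    switch (code) {
    case 'T': return "stopped";
    case 'R': return "running";
    case 'B': return "at breakpoint";
    case 'X': return "terminated";
    default:  return "unknown";
    }
}
```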