In your RoundRobin directory, typing ls will show you a file named roundRobin.stf.
Exercise: Do this!
[agopu@bc81 agopu]$ cd ~/MPI_Tutorial/RoundRobin/
[agopu@bc81 RoundRobin]$ ls -l roundRobin.s*
-rw------- 1 agopu hpc 10117 Feb 7 12:03 roundRobin.stf
-rw------- 1 agopu hpc 1335 Feb 7 12:03 roundRobin.stf.dcl
-rw------- 1 agopu hpc 32728 Feb 7 12:03 roundRobin.stf.frm
. . .
-rw------- 1 agopu hpc 32728 Feb 7 12:03 roundRobin.stf.sts
These trace files were created automatically by Intel Trace Collector when the roundRobin program was executed as part of an exercise in the RoundRobin - Simple Message Passing Illustrated section.
No modification of the source code roundRobin.c was required. Linking the program against the Intel Trace Collector library libVT.a, through the linker flag -lVT in the Makefile, was all that was needed.
An important tool that lets the parallel programmer view the lines of communication among processors is the trace display tool Intel Trace Analyzer (ITA), formerly called Vampir. ITA can be used to examine roundRobin.stf.
Exercise: Do this!
At the shell prompt (on the interactive nodes you obtained through qsub -I), type:
[agopu@bc81 RoundRobin]$ traceanalyzer roundRobin.stf &
The basic elements of the initial timeline are:
The horizontal axis is time.
Each horizontal bar corresponds to a process or a set of processes. In our case, since we were using two nodes and two processors in each of those nodes, ITA merges the timelines on a per node basis. At the left margin they are clearly listed - notice how processes 0 and 2, which ran on node bc43, are represented on the second bar. Soon, we'll show how it's possible to look at each process in finer detail.
Red segments show execution of MPI functions. Green segments show execution of the routines of the program. Neither the overhead imposed by the ITC/A tracing application itself nor the messages exchanged between processors are shown in this initial timeline.
Exercise: Do this!
Next, right-click on the initial timeline frame and select Load -> Whole Trace . This will open up another summary chart - you can see how much time MPI calls took and so forth for the entire application. Shown below is an example of what you could expect to see for the roundRobin program:
Exercise: Do this!
Close the initial timeline and the summary chart windows. Then pull down the Global Displays menu and select Timeline. This will open up a more detailed timeline than what you saw initially.
The horizontal axis is time. Each horizontal bar corresponds to an individual process. Once again, red segments show execution of MPI functions and green segments show execution of the routines of the program. The overhead imposed by the ITC/A tracing application itself is not shown in this timeline because it is negligible. The MPI messages exchanged between processors are shown as solid black lines.
Right-click on the detailed timeline window and select Component -> Parallelism Display. An additional section is added to the timeline showing how effectively parallel our program was at various times. Until the very end, the program's parallelism was poor because the processes were waiting for blocking calls (Send/Receive) to complete.
Let's take a closer look at processor 1. The initial green segment is when processor 1 is preparing the message. Then processor 1 calls the MPI_Send function to transmit this message to processor 2. The MPI_Send function is a "blocking" function; this means that processor 1 cannot proceed until the MPI_Send is completed. Once the MPI_Send completes, processor 1 executes a few more lines of the program, as can be seen by the thin green line following the MPI_Send. Then processor 1 calls the MPI_Recv function to receive the message from processor 0. Note that MPI_Recv is also a blocking function. Once processor 1 receives the message, it is free to complete the program (final green stripe) and terminate. The MPI API does make available non-blocking send and receive functions, but we will not consider these in this beginner's tutorial.
Exercise: Do this!
To analyze the time spent on various activities by roundRobin in a different, slightly more detailed format, pull down the Global Displays menu and select Activity Chart. You now see pie charts of the time spent in each activity by each processor. If you prefer histograms instead of the pie charts, right-click on this activity display and select Mode -> Histogram. The same data is now displayed as a set of histograms.
Next, right-click on the activity display and click on Select -> All Symbols. Now the histograms show finer detail by comparing all states of the program (some of the states are negligible or nonexistent). You will learn later how to use Intel Trace Collector/Analyzer's instrumentation to define additional program states in order to increase your resolution of program activity. A sample ITA activity chart associated with the trace file roundRobin.stf is shown below:
The histogram view (Activity Chart) provides an overall profile of how each processor spends its time. The bars appear from left to right in the same order as the function legend from top to bottom. Because there is very little computing in this elementary roundRobin example, the processors spend most of their time executing MPI function calls. Only a small proportion of time is spent executing the application itself (User_Code).
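To make the idea of a per-processor profile concrete, here is a minimal Python sketch of the kind of bookkeeping an activity chart summarizes: for each process, the share of traced time spent in each state. It does not read roundRobin.stf or use any ITA interface. The function name activity_profile, the segment layout and the sample timings are all assumptions made up for illustration; only the state name User_Code comes from the chart described above.

```python
from collections import defaultdict


def activity_profile(segments):
    """Return {process: {state: fraction of that process's traced time}}.

    `segments` is a hypothetical list of (process, state, start, end)
    tuples; it is not the STF trace format.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for process, state, start, end in segments:
        totals[process][state] += end - start

    profile = {}
    for process, per_state in totals.items():
        elapsed = sum(per_state.values())
        if elapsed == 0:
            # Avoid dividing by zero for a process with no measurable time.
            profile[process] = {state: 0.0 for state in per_state}
        else:
            profile[process] = {
                state: spent / elapsed for state, spent in per_state.items()
            }
    return profile


if __name__ == "__main__":
    # Invented timings (arbitrary units), chosen so that MPI dominates,
    # as it does in the roundRobin example.
    sample = [
        (0, "User_Code", 0, 2),
        (0, "MPI", 2, 8),
        (0, "User_Code", 8, 10),
        (1, "User_Code", 0, 1),
        (1, "MPI", 1, 10),
    ]
    for process, shares in sorted(activity_profile(sample).items()):
        print(process, shares)
```

With the invented sample, process 0 spends 40% of its time in User_Code and 60% in MPI calls, which is the kind of imbalance the histogram bars make visible at a glance.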
Detailed ITA timeline
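The Parallelism Display added to the detailed timeline can be thought of as a count of how many processes are doing useful work at each instant. The Python sketch below illustrates that idea under stated assumptions: the function name parallelism, the interval representation and the sample numbers are hypothetical, and this is not how ITA itself computes the display.

```python
def parallelism(intervals):
    """Return a step function [(time, n_active), ...].

    `intervals` is a hypothetical list of (start, end) pairs, one per
    green (user code) segment, pooled over all processes. Each entry in
    the result means: from `time` until the next entry, `n_active`
    processes are executing user code.
    """
    events = []
    for start, end in intervals:
        events.append((start, +1))  # a process enters user code
        events.append((end, -1))    # a process leaves user code
    events.sort()

    steps = []
    active = 0
    for time, delta in events:
        active += delta
        if steps and steps[-1][0] == time:
            # Several events at the same instant: keep only the final count.
            steps[-1] = (time, active)
        else:
            steps.append((time, active))
    return steps


if __name__ == "__main__":
    # Invented user-code segments for two processes (arbitrary units):
    # both compute at the start, both block in MPI calls in the middle,
    # and both finish with a short burst of user code.
    user_code = [(0, 2), (8, 10), (0, 3), (9, 10)]
    print(parallelism(user_code))
    # prints [(0, 2), (2, 1), (3, 0), (8, 1), (9, 2), (10, 0)]
```

The stretch from time 3 to time 8 in the invented data, where the count is 0, corresponds to the situation described for roundRobin: every process is sitting inside a blocking call, so no user code runs in parallel.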
Individual Processor activity - ITA Activity Chart