Parallel (MPI) Jobs on Big Red

What MPI libraries are available to me through softenv?

To display the MPI libraries available on Big Red, run the softenv command and grep its output for "mpi". In particular, look for keys of the form +openmpi-XYZ or +mpich-XYZ (for example, +openmpi-1.2-mx-ibm-64).

Two MPI libraries are currently installed on Big Red: Open MPI and MPICH. Among the keys listed, you should see package names such as +openmpi-1.2-mx-ibm-64, +openmpi-1.1.1-xlc-8.0-64, +mpich-mx-ibm-32, and +mpich-mx-ibm-64.

ag@BigRed:~/> softenv | grep mpi
    . . .
    +mpich-mx-ibm-32               MPICH MX xlc/xlf 32bit
    +mpich-mx-ibm-64               MPICH MX xlc/xlf 64bit
    . . . 
    +openmpi-1.1.1-xlc-8.0-32      Open MPI 1.1.1 STATIC, XL-8.0 32-bit
    +openmpi-1.1.1-xlc-8.0-64      Open MPI 1.1.1 STATIC, XL-8.0 64-bit
    +openmpi-1.2-mx-ibm-64         Open MPI 1.2 MX STATIC IBM C/F 8/10.1 64-bit
    . . .
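
If the full listing is too noisy, it can be narrowed to just the key names. The following is a minimal sketch rather than a site-provided tool: list_mpi_keys is a hypothetical helper name, and it assumes the key is the first whitespace-separated field on each line, as in the listing above.

```shell
# list_mpi_keys (hypothetical helper): read softenv output on stdin and
# print only the keys that start with +openmpi- or +mpich-.
# Assumption: the key is the first field on each line.
list_mpi_keys() {
    awk '$1 ~ /^[+](openmpi|mpich)-/ { print $1 }'
}

# Usage on Big Red:
#   softenv | list_mpi_keys
```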

The key +openmpi-1.2-mx-ibm-64, for example, identifies an Open MPI library (version 1.2) that was compiled with the IBM XL compilers in 64-bit mode and communicates over the mx protocol. Similarly, +mpich-mx-ibm-32 identifies an MPICH library that was compiled with the IBM XL compilers in 32-bit mode and communicates over the mx protocol. For more information on softenv keys, refer to the section named Understanding MPI package names.


Submitting a parallel (either Open MPI or MPICH-MX) job on Big Red

Follow the three steps below to submit a parallel Open MPI or MPICH-MX job on Big Red.

Add the appropriate Open MPI or MPICH-MX key to softenv

Pick the MPI library you wish to use and the matching softenv key (for example, 64-bit or 32-bit). The example below uses the 64-bit IBM XL-compiled Open MPI library. If you want to use +openmpi-1.2-mx-ibm-64, your ~/.soft file should look similar to this:

IU users:

    +openmpi-1.2-mx-ibm-64
    @bigred

TeraGrid users:

    +openmpi-1.2-mx-ibm-64
    @teragrid-basic
    @globus-4.0
    @teragrid-dev

If you are not familiar with the softenv system, please refer to the Softenv section in the Big Red primer, or type man softenv-intro on Big Red.
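
Before compiling, it can help to confirm that the key really is in your ~/.soft file. This is a minimal sketch, assuming each key sits on its own line with no leading or trailing whitespace; has_soft_key is a hypothetical helper and not part of softenv.

```shell
# has_soft_key KEY [FILE] (hypothetical helper): succeed if FILE
# (default: ~/.soft) contains a line that is exactly KEY.
# -x matches whole lines only, -F treats KEY as a literal string.
has_soft_key() {
    grep -qxF -- "$1" "${2:-$HOME/.soft}"
}

# Usage:
#   has_soft_key +openmpi-1.2-mx-ibm-64 || echo "key missing from ~/.soft"
```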

Compile your MPI code

For example, to compile a 32-bit parallel C program using Open MPI 1.1.1 (assuming you have +openmpi-1.1.1-xlc-8.0-32 in your ~/.soft file):

ag@BigRed:~/MPI_source_code> mpicc -q32 -o myprog myprog.c

Or, to compile a 64-bit parallel Fortran77 program using MPICH-MX (assuming you have +mpich-mx-ibm-64 in your ~/.soft file):

ag@BigRed:~/MPI_source_code> mpif77 -q64 -o myprog myprog.f

Important Note: We strongly recommend using either the -q32 or the -q64 switch, depending on whether your code is 32-bit or 64-bit, when you compile and link your program. We also recommend that you read the warning below about the OBJECT_MODE environment variable.
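
One way to keep the switch consistent with the library is to derive it from the softenv key. The sketch below assumes the key ends in -32 or -64, as the MPI keys listed in this document do; qflag_for_key is a hypothetical helper name.

```shell
# qflag_for_key KEY (hypothetical helper): print the XL compiler switch
# that matches the bit mode encoded at the end of a softenv MPI key.
# Assumption: the key ends in -32 or -64.
qflag_for_key() {
    case "$1" in
        *-64) printf '%s\n' '-q64' ;;
        *-32) printf '%s\n' '-q32' ;;
        *)    echo "cannot infer bit mode from key: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   mpicc "$(qflag_for_key +openmpi-1.1.1-xlc-8.0-32)" -o myprog myprog.c
```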

Submit your parallel job

There are several ways to submit a parallel job, all of which ultimately use the mpirun program. Choose one of the options listed below, depending on what suits your application.

The paralleljob option (#1 below) is for simple MPI applications; option #2 is for testing purposes ONLY; and option #3, which uses a LoadLeveler (LL) script, is the method to use if you need the advanced options of the mpirun program (for advanced production runs).

  1. Submitting parallel jobs using the paralleljob script (for simple applications)

    On Big Red, a script named paralleljob provides a convenient method for submitting some parallel (multiple-processor) programs to the LoadLeveler batching and queuing system. Suitable programs must consist of just one executable file (in contrast to some master/worker programs in which the master and workers are different executable files). Also, the script is designed so that programs can be submitted as jobs by prefixing the command-line with the word paralleljob. The general form of the command is:

    pw@BigRed:~/MPI_source_code>  paralleljob ./hello

    For more information on the paralleljob script, check out the detailed Using the paralleljob command to submit jobs page.

  2. Submitting parallel jobs interactively (for testing purposes ONLY)

    If you are testing your code, submitting jobs interactively can be more useful than submitting batch jobs and waiting for their output files.

    Big Red has 4 nodes (b509 through b512) that are available outside of the LoadLeveler submission system. These nodes can be used to run test jobs interactively without submitting a batch job. To use them, create a machine file manually with your favorite editor (vi, pico, emacs, and so on):

    ag@BigRed:~/MPI_source_code> cat mfile
    b509
    b510
    b509
    b510
    b509
    b510
    b509
    b510

    Then use the machine file to run your parallel job:

    ag@BigRed:~/MPI_source_code> mpirun -np 8 -machinefile mfile ./hello
     ...
    some output from your parallel program
    ...

    Important Note about interactive nodes: The interactive nodes mentioned above are open to all users; this means if someone else is using those nodes to test an MPI job then you'll likely get an MX error similar to this:

    MX:Aborting
    MX:s9c4b7:send req(already completed):req status 8:Remote endpoint is closed

    If you encounter such a situation, please wait a little while and try again. If the problem persists, email the system administrators.

  3. Submitting parallel jobs using a LoadLeveler script (advanced users; for production runs)

    To submit a parallel job that runs your MPI program, edit the sample LoadLeveler script shown below (or create your own) to change the number of nodes and tasks, the output and error files, and so on. Then use llsubmit to submit your job.

    pw@BigRed:~/MPI_source_code> cat submit_parallel_job.sh
    #! /bin/bash -l
    
    ## LoadLeveler script to submit an MPI program named hello into
    ##  the MED queue
    
    # @ job_type = parallel
    # @ class = MED
    
    ## Teragrid users must use their project id instead of TG-FOO1234
    ##   in the line below
    ## IU users should use the word NONE instead of TG-FOO1234 in the line below
    
    # @ account_no = TG-FOO1234
    
    ## Request 2 nodes with 4 tasks per node (8 MPI tasks total), for 10 hours
    
    # @ node = 2
    # @ tasks_per_node = 4
    # @ wall_clock_limit = 10:00:00
    # @ notification = always
    # @ notify_user = <email_id>
    # @ environment=COPY_ALL;
    # @ output = hello.$(cluster).$(process).out
    # @ error = hello.$(cluster).$(process).err
    # @ queue
    
    ## Users should always cd into their execution directory due to
    ## a bug within LoadLeveler in dealing with the initialdir keyword.
    # cd <execution directory>
    
    cd ${HOME}/my_project
    
    ## Get machine list and write to /tmp/machinelist.$LOADL_STEP_ID
    
    llmachinelist
    
    ## Make sure the number of tasks is <= node * tasks_per_node
    # mpirun -np <number of tasks> \
    #  -machinefile /tmp/machinelist.$LOADL_STEP_ID ./hello
    
    mpirun -np 8 -machinefile /tmp/machinelist.$LOADL_STEP_ID ./hello
    
    # Clean up temporary machine list file created in previous step
    
    rm /tmp/machinelist.$LOADL_STEP_ID
    
          
    pw@BigRed:~/MPI_source_code> llsubmit submit_parallel_job.sh
    llsubmit: Processed command file through Submit Filter: "/home/load. . .
    llsubmit: The job "s10c2b5.dim.2577" has been submitted.

    If you are not familiar with job submission on Big Red, please refer to the Simple Jobs section in the Big Red primer.
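
Options 2 and 3 above both hand mpirun a task count and a machine file, so the same bookkeeping applies to each. The sketch below shows it with two hypothetical helpers (neither is installed on Big Red): make_machinefile writes one host per line, cycling through the hosts given, and check_np verifies that a task count does not exceed node * tasks_per_node.

```shell
# make_machinefile SLOTS HOST... (hypothetical helper): print SLOTS lines,
# cycling through the given hosts in order.
make_machinefile() {
    [ "$#" -ge 2 ] || return 1   # need a slot count and at least one host
    slots=$1
    shift
    i=0
    while [ "$i" -lt "$slots" ]; do
        for host in "$@"; do
            [ "$i" -lt "$slots" ] || break
            printf '%s\n' "$host"
            i=$((i + 1))
        done
    done
}

# check_np NP NODES TASKS_PER_NODE (hypothetical helper): succeed only if
# NP is no larger than NODES * TASKS_PER_NODE.
check_np() {
    max=$(( $2 * $3 ))
    if [ "$1" -le "$max" ]; then
        echo "ok: $1 tasks fit in $max slots"
    else
        echo "error: $1 tasks exceed $max slots" >&2
        return 1
    fi
}

# Usage:
#   make_machinefile 8 b509 b510 > mfile   # same content as the mfile in option 2
#   check_np 8 2 4                         # matches the sample script in option 3
```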

Warning about $OBJECT_MODE environment variable

Note: As recommended above, if you use either the -q32 or the -q64 switch at compile and link time, you do not need to worry about the following; in that case, feel free to skip the rest of this section.

. . .

However, if you do not use those switches, you should be aware of the $OBJECT_MODE environment variable, because it also tells the IBM XL compilers whether the application you are compiling is 32-bit or 64-bit.

Effect of OBJECT_MODE at compile time

  • Setting OBJECT_MODE to 64 will force the compiler to compile 64-bit object files. It is equivalent to explicitly specifying the -q64 switch to the compiler.
  • Not setting OBJECT_MODE or setting it to 32 will force the compiler to compile 32-bit object files. It is equivalent to explicitly specifying the -q32 switch to the compiler.

As mentioned earlier in this document, the best way to avoid this issue is to pass the -q32 or the -q64 switch to the compiler, depending on whether you want to compile a 32-bit or a 64-bit application.
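
The compile-time rules above can be written as a small shell function. This is only an illustration of those rules, not a compiler feature: effective_object_mode is a hypothetical name, it looks at OBJECT_MODE alone, and it does not account for a -q32 or -q64 switch given on the command line.

```shell
# effective_object_mode (hypothetical helper): print the bit mode implied
# by OBJECT_MODE according to the rules described above.
#   unset or 32 -> 32
#   64          -> 64
# Any other value is reported as unexpected in this sketch.
effective_object_mode() {
    case "${OBJECT_MODE:-32}" in
        64) echo 64 ;;
        32) echo 32 ;;
        *)  echo "unexpected OBJECT_MODE: $OBJECT_MODE" >&2; return 1 ;;
    esac
}

# Usage:
#   echo "objects will be built as $(effective_object_mode)-bit"
```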

Effect of OBJECT_MODE at runtime (execution)

The OBJECT_MODE environment variable has no effect on the execution of a program.


Understanding MPI package names (softenv keys) on Big Red

Big Red is a 64-bit system that supports both 32-bit and 64-bit applications. However, 32-bit object files cannot be linked with 64-bit ones.

Each MPI package name reflects two choices made when the library was built:

  • The compiler used to compile the package
  • The communication protocol the package operates on.

Both the Open MPI libraries and the MPICH libraries on Big Red are compiled with IBM's XL compiler suite in both 32-bit and 64-bit modes. Examples of communication protocols include mx (Myrinet Express) and TCP/IP over the Myrinet high-performance network.

Note: Applications using the mx protocol usually get better performance than those using TCP/IP.