Introduction

In this tutorial you will learn how to compile a basic MPI code on the CHPC clusters, as well as basic batch submission and user environment set up. In order to complete this tutorial, you will need an account with CHPC. If you don't have an account, see the Getting Started at the CHPC guide. 

Logging into the clusters

The CHPC has two main clusters as of February 2014: ember and kingspeak. Either cluster will work for this tutorial, but the examples assume you are using ember. Log in using an ssh client of your choice:

[user@wherever:~]$ ssh u0123456@ember.chpc.utah.edu

Make sure to replace the username with your own UNID, and if you want a different cluster, replace the hostname with the appropriate cluster name. When you set up your account with CHPC, you selected a default shell, either bash or tcsh. If you forgot which shell you selected, you can find out using the SHELL variable:

[u0123456@ember1:~]$ echo $SHELL

This will give something like /bin/bash or /bin/tcsh. The syntax for scripting each of these shells is different, so make sure you know which one you are using! There are also many good resources on the internet for learning shell scripting. Associated with each shell is a configuration file, called .tcshrc (for tcsh) or .bashrc (for bash). The CHPC provides specific configuration files that are essential for setting up cluster-specific environments. These files should have been placed in your home directory when you received an account with CHPC, but if they weren't, you'll need to download the one for your shell. The CHPC also releases new configuration scripts periodically; it is the user's responsibility to obtain new configs as they are released. The configurations are located here:

http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.tcshrc
http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.bashrc

An easy way to download files to CHPC systems is to use the wget command. If you are downloading the chpc.bashrc, the command would be:

[u0123456@ember1:~]$ wget http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.bashrc

The command will show output with download progress and other information. Once the download finishes, rename the file to the appropriate name (ie, .bashrc or .tcshrc), then disconnect and reconnect to the cluster; alternatively, you can source the file or start a new shell.
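
For example, if you downloaded chpc.bashrc into your home directory, the rename and reload might look like this (note that mv will overwrite any existing .bashrc, so save a copy of your old one first if you have customizations in it):

[u0123456@ember1:~]$ mv chpc.bashrc ~/.bashrc
[u0123456@ember1:~]$ source ~/.bashrc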

Note: the rest of this tutorial assumes you are using bash for your shell.

Sourcing MPI

To get started, execute the following:

[u0123456@ember1:~]$ source /uufs/ember.arches/sys/pkg/mvapich2/1.7i/etc/mvapich2.sh

Note that this script is for the bash shell. If you are using tcsh, the extension of the script will be .csh (ie, mvapich2.csh). Also note that you've sourced mvapich2 for ember; generally OpenMPI provides better performance on ember for serious codes, but for this purpose mvapich2 will work fine. You will want to consult the different cluster guides for performance recommendations, or experiment with your own codes to see what MPI setups provide the best performance. Once you've sourced MPI, you should now be able to execute mpicc:

[u0123456@ember1:~]$ mpicc -v

mpicc for MVAPICH2 version 1.7
icc version 14.0.0 (gcc version 4.4.7 compatibility)

Hello world

If you have your own source code to test, you may want to use that, but in case you don't, here is a simple hello world script:

http://chpc.utah.edu/docs/manuals/getting_started/code/hello_1.c
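
In case the link is unavailable, or you would rather type the program in yourself, a minimal MPI hello world along these lines looks roughly like the sketch below (it may not match hello_1.c exactly):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Start the MPI environment */
    MPI_Init(&argc, &argv);

    /* Each MPI process prints its own greeting */
    printf("Hello world\n");

    /* Shut down MPI cleanly before exiting */
    MPI_Finalize();
    return 0;
}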

You can download it using wget, or you can copy and paste it into your favorite editor. Once you have the file, you can compile it using:

[u0123456@ember1:~]$ mpicc hello_1.c -o hello_ember

If you received any warnings, you can ignore them. If you got an error, you probably copied the program incorrectly. Important note: it's good practice to compile programs on the interactive nodes of the cluster you'll be working with, and to distinguish the executables using different names (ie, hello_ember, hello_kingspeak). Generic builds typically suffer from lower performance than builds targeted at a particular cluster, primarily because of differences in hardware configuration. Again, visit the cluster user guides for more information on best practices.
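
For example, a build done on a kingspeak interactive node (after sourcing kingspeak's own MPI installation) might look like this; the prompt and executable name are just illustrative:

[u0123456@kingspeak1:~]$ mpicc hello_1.c -o hello_kingspeak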

Interactive job submission

Now that you have your executable, you're ready to run the job on the cluster. There are two ways to do this: through an interactive session or through a batch script. For now you will use an interactive session. Interactive sessions are appropriate for work such as GUI-based analysis or long compilations, which would be inappropriate to run on the standard interactive (login) nodes.

IMPORTANT WARNING: Never execute a large MPI job on the main interactive nodes (the ones you log into initially). These nodes are shared by all CHPC users for basic work, and heavy load tasks will degrade performance for everyone. Tasks that exceed 15 minutes under heavy load will be arbitrarily terminated by CHPC systems.

Begin by submitting a request to log onto the cluster compute nodes with qsub. Depending on cluster load, you may or may not have to wait for the job to start; generally, an interactive session will start sooner on whichever cluster is less heavily utilized at the time. The command:

[u0123456@ember1:~]$ qsub -I -l nodes=2:ppn=12,walltime=0:05:00

This requests an interactive session (-I) with 2 nodes (-l nodes=2) at 12 processors per node (:ppn=12) for 5 minutes (walltime=0:05:00). Note that you don't necessarily have to specify ppn, but it's generally good practice to do so. Also note that different clusters have different numbers of processors per node; if you submit a job with the wrong ppn for a cluster, the job will never run. For ember, ppn=12; for kingspeak, ppn=16. Adjust this as necessary for your jobs.
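
For example, the equivalent request on kingspeak (16 cores per node) would look like this; the hostname in the prompt is just illustrative:

[u0123456@kingspeak1:~]$ qsub -I -l nodes=2:ppn=16,walltime=0:05:00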

Once the interactive session starts, running the job is quite simple. Navigate to the directory where your program is stored, and execute the following commands:

[u0123456@em123:~]$ source /uufs/ember.arches/sys/pkg/mvapich2/1.7i/etc/mvapich2.sh

You may need to put in your password once or twice to allow connection to the nodes and confirm some RSA keys. Once you get the command prompt back, execute this command:

[u0123456@em123:~]$ mpirun -hostfile $PBS_NODEFILE -n 24 $HOME/hello_ember

Make sure to change the path to your hello world program if you put it somewhere besides your home directory (e.g., $HOME/test/hello_ember). Also change the -n flag to reflect the number of processes you will be running; here 24, since 2 nodes x 12 cores per node = 24. If everything goes well, you should see something like this (only a portion of the output is shown):

Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
[u0123456@em123:~]$ 

You should have the same number of "Hello world" lines as you have MPI processes. Finally, exit the interactive session with the command exit.
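
Exiting returns you to the login node from which you submitted the interactive request, for example:

[u0123456@em123:~]$ exit
[u0123456@ember1:~]$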

Batch job submission

With your favorite editor, make a new file and call it testjob. Copy and paste the following simple script into the file:

#PBS -S /bin/bash
#PBS -l nodes=2:ppn=12,walltime=0:02:00
#PBS -M user@chpc.utah.edu
#PBS -N test
#PBS -A account-name

cd $HOME
source /uufs/ember.arches/sys/pkg/mvapich2/1.7i/etc/mvapich2.sh
mpirun -np 24 -hostfile $PBS_NODEFILE $HOME/hello_ember > test.out


All of the #PBS comments are directives for job control, just like the options passed to qsub on the command line. If you're using a different cluster (kingspeak), make sure to replace the path so that it points at that cluster's mvapich2 installation. Also make sure to change the number for ppn and the -np flag, as well as the email address in the -M directive. To submit the script to the cluster:

[u0123456@ember1:~]$ qsub testjob
112233.emrm.opib.privatearch.arches

The output upon successful submission will give the job number and an internal moniker for the job. In order to view the job in the queue, you can use the following commands:

showq - shows all jobs in the queue and current metrics
showq | grep u0123456 - shows all jobs for UNID u0123456 (use your own!) 
showstart 112233 - gives an estimate for when a job will start
checkjob 112233 - gives useful information about a job

Note that many of these commands may not be useful for this job if it begins running right away, but for longer jobs you run in the future they will become very useful. If your job ran without error, you should have three new files alongside your program:

[u0123456@em123:~]$ ls

hello_1.c  hello_ember  test.e112233  test.o112233  test.out

These output files have the format name.(o/e)number and correspond to the standard output and error produced by Linux programs. If you use cat on test.out, you'll see output like what we saw earlier in the interactive session. If the program ran with an error, or if the batch script itself wrote anything to standard output or error, it will appear in the numbered output files; if you have problems with your program during a batch session, you should look there first.

[u0123456@em123:~]$ cat test.out
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world

This is the end of this tutorial. If you have trouble with this tutorial or anything else on the CHPC systems, contact issues@chpc.utah.edu. You may also want to consider attending the presentations which are held each semester by CHPC staff members, spanning a variety of topics such as Linux Basics, Parallel Programming, and Systems Overviews.
