In this tutorial you will learn how to compile a basic MPI code on the CHPC clusters, as well as basic batch submission and user environment setup. To complete this tutorial, you will need an account with CHPC. If you don't have an account, see the Getting Started at the CHPC guide.
Logging into the clusters
The CHPC has two main clusters as of February 2014: ember and kingspeak. Any of the clusters will do, but this tutorial assumes you are using ember. Log in using an ssh client of your choice:
[user@wherever:~]$ ssh firstname.lastname@example.org
Make sure to replace the username with your own UNID, and if you want a different cluster, replace the host name with the appropriate cluster name. When you set up your account with CHPC, you selected a default shell, either bash or tcsh. If you forgot which shell you selected, you can find out using the following command:
[u0123456@ember1:~]$ echo $SHELL
This will give something like /bin/bash or /bin/tcsh. The syntax for scripting each of these shells is different, so make sure you know which one you are using! There are also many good resources on the internet for learning shell scripting. Associated with each of these shells is a configuration file, called .tcshrc or .bashrc respectively. The CHPC has specific configuration files that are essential for setting up cluster-specific environments. These configuration files should have been placed in your home directory when you received an account with CHPC, but if they were not, you'll need to download the one for your shell. The CHPC also releases new configuration scripts periodically; it is the user's responsibility to obtain new configs as they are released. The configurations are located here:
https://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.tcshrc
http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.bashrc
An easy way to download files to CHPC systems is to use the wget command. If you are downloading chpc.bashrc, the command would be:
[u0123456@ember1:~]$ wget http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.bashrc
The command will show download progress and other information. Once you have downloaded the file, you'll need to rename it to the appropriate name (i.e., .bashrc or .tcshrc), then disconnect and reconnect to the cluster. Alternatively, you can run the source command on the renamed file or start a new shell.
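The rename step can be wrapped in a small shell function. This is only a minimal sketch: install_chpc_rc is a hypothetical helper name, not a CHPC tool, and keeping a .bak copy of any existing configuration file is an added precaution rather than something CHPC requires.

```shell
# Sketch: install a downloaded CHPC configuration file under the name the
# shell expects. install_chpc_rc is a hypothetical helper, not a CHPC tool.
install_chpc_rc() {
    downloaded=$1            # e.g. chpc.bashrc, as fetched with wget
    home_dir=${2:-$HOME}     # directory that should hold the dot-file

    case "$downloaded" in
        *tcshrc) target="$home_dir/.tcshrc" ;;
        *bashrc) target="$home_dir/.bashrc" ;;
        *) echo "unrecognized file: $downloaded" >&2; return 1 ;;
    esac

    # Assumption: backing up the old file is a precaution, not a CHPC rule.
    if [ -f "$target" ]; then
        cp "$target" "$target.bak"
    fi
    mv "$downloaded" "$target"
    echo "installed $target"
}

# Typical use after the wget command above:
#   install_chpc_rc chpc.bashrc && source ~/.bashrc
```

The function picks the target name from the downloaded file's name, so the same call works for either shell.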
Note: the rest of this tutorial assumes you are using bash for your shell.
To get started, execute the following:
[u0123456@ember1:~]$ source /uufs/ember.arches/sys/pkg/mvapich2/1.7i/etc/mvapich2.sh
Note that this script is for the bash shell. If you are using tcsh, the script has the .csh extension instead (mvapich2.csh). Also note that you've sourced mvapich2 for ember; generally OpenMPI provides better performance on ember for serious codes, but for this purpose mvapich2 will work fine. You will want to consult the different cluster guides for performance recommendations, or experiment with your own codes to see which MPI setups provide the best performance. Once you've sourced the MPI environment, you should be able to execute:
[u0123456@ember1:~]$ mpicc -v
mpicc for MVAPICH2 version 1.7
icc version 14.0.0 (gcc version 4.4.7 compatibility)
If you have your own source code to test, you may want to use that, but in case you don't, a simple hello world program, hello_1.c, will do.
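A minimal sketch of such a program can be written straight from the shell with a here-document. The file name hello_1.c matches the compile command used in this tutorial; the program body is an illustrative stand-in built from the standard MPI calls MPI_Init, MPI_Comm_rank, MPI_Comm_size and MPI_Finalize, and may differ from the file CHPC distributes.

```shell
# Write a minimal MPI hello world program to hello_1.c.
# Assumption: this is an illustrative stand-in, not the file CHPC distributes.
cat > hello_1.c << 'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut the MPI runtime down */
    return 0;
}
EOF
echo "wrote hello_1.c"
```

Each process prints one line, so the number of lines in the output equals the number of processes the job was started with.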
You can download it using wget, or you can copy and paste it into your favorite editor. Once you have the file, you can compile it using:
[u0123456@ember1:~]$ mpicc hello_1.c -o hello_ember
If you received any warnings, ignore them. If you have an error, you probably copied the program incorrectly. Important note: It's good practice to compile programs on the interactive nodes of the cluster you'll be working with, and to distinguish the builds using different names (e.g., hello_ember, hello_kingspeak). Generic builds typically suffer from lower performance than builds specific to a particular cluster, primarily due to different hardware configurations. Again, visit the cluster user guides for more information on best practices.
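The per-cluster naming convention can be automated. The sketch below assumes that login-node host names consist of the cluster name followed by a number, as in the ember1 prompt shown in this tutorial; cluster_name is a hypothetical helper, and on hosts that do not follow this pattern the derived name will simply be the host name.

```shell
# cluster_name is a hypothetical helper: it strips the domain and the trailing
# node number from a login-node host name (ember1 -> ember).
cluster_name() {
    short=${1%%.*}                              # drop any domain suffix
    printf '%s\n' "$short" | sed 's/[0-9]*$//'  # drop the trailing digits
}

# Name the executable after the cluster it is being built on.
exe="hello_$(cluster_name "$(uname -n)")"
echo "build with: mpicc hello_1.c -o $exe"
```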
Interactive job submission
Now that you have your executable, you're ready to execute the job on the cluster. There are two ways to do this, either through an interactive session or through a batch script. For now you will use an interactive session. Interactive sessions are appropriate for doing analysis with programs with GUIs or long compile sessions on the cluster, where running on the standard interactive nodes would be inappropriate.
IMPORTANT WARNING: Never execute a large MPI job on the main interactive nodes (the ones you log into initially). These nodes are shared by all CHPC users for basic work, and heavy load tasks will degrade performance for everyone. Tasks that exceed 15 minutes under heavy load will be arbitrarily terminated by CHPC systems.
Begin by using qsub to request an interactive session on the cluster's compute nodes. Depending on cluster load, you may have to wait for the job to start. Generally, it will be easier to start an interactive session on sanddunearch, because that cluster is less heavily utilized (as of this writing). The command is:
[u0123456@ember1:~]$ qsub -I -l nodes=2:ppn=12,walltime=0:05:00
This will request an interactive session on ember (-I), with 2 nodes (-l nodes=2) at 12 processors per node (:ppn=12), for 5 minutes (walltime=0:05:00). Note that you don't necessarily have to specify ppn, but it's generally good practice to do so. Also note that different clusters have different numbers of processors per node; if you submit a job with a ppn value that the cluster's nodes cannot satisfy, the job will never run. For ember, ppn=12; for kingspeak, ppn=16. Adjust this as necessary for your jobs.
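The resource string and the resulting process count follow directly from the node count and ppn. A minimal sketch, using the ember values from this tutorial (the variable names are illustrative):

```shell
nodes=2
ppn=12                      # ember: 12; kingspeak: 16
walltime=0:05:00

total=$((nodes * ppn))      # number of MPI processes the request can hold
request="nodes=${nodes}:ppn=${ppn},walltime=${walltime}"

echo "qsub -I -l $request"
echo "total processes: $total"
```

With these values the request covers 24 processors, which is the number later passed to mpirun with -n.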
Once the interactive session starts, running the job is quite simple. Navigate to the directory where your program is stored, and execute the following commands:
[u0123456@em123:~]$ source /uufs/ember.arches/sys/pkg/mvapich2/1.7i/etc/mvapich2.sh
You may need to put in your password once or twice to allow connection to the nodes and confirm some RSA keys. Once you get the command prompt back, execute this command:
[u0123456@em123:~]$ mpirun -hostfile $PBS_NODEFILE -n 24 $HOME/hello_ember
Make sure to change the path for your hello world program if you put it somewhere besides your home directory (e.g., $HOME/test/hello_ember). Also change the -n flag to reflect the number of processors you will be running on. If everything goes well, the program prints its output to the terminal: you should have the same number of "Hello world" lines as you have processors. Finally, exit the interactive session by using the exit command.
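Rather than hard-coding the -n value, the process count can be read from the node file. Under Torque/PBS, $PBS_NODEFILE contains one line per allocated processor slot, so counting its lines gives nodes times ppn. The sketch below uses a hypothetical helper, count_slots, and only prints the resulting mpirun command.

```shell
# count_slots is a hypothetical helper: it counts the lines in a node file.
# Under Torque/PBS, $PBS_NODEFILE lists one line per allocated processor slot.
count_slots() {
    echo $(( $(wc -l < "$1") ))
}

# Inside a job the scheduler sets PBS_NODEFILE; outside one there is nothing to count.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    nprocs=$(count_slots "$PBS_NODEFILE")
    echo "mpirun -hostfile $PBS_NODEFILE -n $nprocs $HOME/hello_ember"
else
    echo "PBS_NODEFILE is not set; run this inside an interactive or batch job"
fi
```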
Batch job submission
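A batch script bundles the resource request and the commands from the interactive session into a single file. The following is a minimal sketch, not the original CHPC script: it assumes the script name testjob used in the qsub command in this section, redirects program output to test.out (the file this section inspects with cat), and reuses the ember settings from the interactive example. It is written with a here-document so the file can be created from the shell.

```shell
# Create a PBS batch script named testjob (assumed contents, see lead-in).
cat > testjob << 'EOF'
#!/bin/bash
#PBS -N testjob
#PBS -l nodes=2:ppn=12,walltime=0:05:00

# Set up the same MPI environment that was used for compiling.
source /uufs/ember.arches/sys/pkg/mvapich2/1.7i/etc/mvapich2.sh

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Run on all 24 requested processors; program output goes to test.out.
mpirun -hostfile "$PBS_NODEFILE" -n 24 "$HOME/hello_ember" > test.out
EOF
echo "wrote testjob"
```

Once the file exists, it is submitted with qsub: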
[u0123456@ember1:~]$ qsub testjob
The output upon successful submission will give the job number and an internal moniker for the job. In order to view the job in the queue, you can use the following commands:
showq - shows all jobs in the queue and current metrics
showq | grep u0123456 - shows all jobs for UNID u0123456 (use your own!)
showstart 112233 - gives an estimate for when a job will start
checkjob 112233 - gives useful information about a job
Note that many of these commands may not be useful for this job if it begins running right away. For longer jobs you may run in the future, these will become very useful. If your job ran without error, you should have three output files: test.out and two numbered files. The numbered files have the format name.(o/e)number and correspond to the standard output and standard error produced by Linux programs. If you use cat on test.out, you'll see output like we saw earlier in the interactive session. If the program ran with an error, or something gets written to output by the batch script, then those messages will appear in the numbered output files. If you have problems with your program during a batch session, you should look there.
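A quick sanity check on a finished job is to count the "Hello world" lines in test.out and compare the count with the number of processors requested. A minimal sketch, assuming the 24-processor request used in this tutorial; count_hello is a hypothetical helper:

```shell
# count_hello is a hypothetical helper: it counts "Hello world" lines in a file.
count_hello() {
    grep -c "Hello world" "$1" || true   # grep exits non-zero when the count is 0
}

expected=24                              # nodes * ppn from the resource request
if [ -f test.out ]; then
    found=$(count_hello test.out)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found Hello world lines"
    else
        echo "Mismatch: expected $expected lines, found $found"
    fi
else
    echo "test.out not found; check the numbered .o and .e files"
fi
```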
This is the end of this tutorial. If you have trouble with this tutorial or anything else on the CHPC systems, contact email@example.com. You may also want to consider attending the presentations which are held each semester by CHPC staff members, spanning a variety of topics such as Linux Basics, Parallel Programming, and Systems Overviews.