
Ember User Guide

Ember Cluster Hardware Overview

  • 262 Dual Socket-Six Core Nodes (3144 total cores)
  • 2.8 GHz Intel Xeon (Westmere X5660) processors
  • 24 Gbytes memory per node (2 Gbytes per processor core)
  • Mellanox QDR Infiniband interconnect
  • Gigabit Ethernet interconnect for management

NFS Home Directory

Your home directory, an NFS-mounted file system, is one choice for I/O. In terms of speed, this space typically has the worst performance of the available options. It is visible on all nodes of the clusters through an auto-mounting system.

Parallel file system (/scratch/ibrix/chpc_gen)

The parallel file system can be reached at the path /scratch/ibrix/chpc_gen. This file system has 60 TB of disk capacity. It is attached to the InfiniBand network to obtain a larger potential network bandwidth, and it is served by 2 load-sharing redundant servers. Its space is visible (read and write access) on all of Ember's interactive and compute nodes. However, it is still a shared resource and may therefore perform more slowly when subjected to significant user load. Users should test their application's performance to see if they experience any unexpected performance issues within this space.

NFS Scratch (/scratch/serial)

Ember also has access to another NFS file system: /scratch/serial. However, we strongly advise users to opt for /scratch/ibrix/chpc_gen instead.

NFS Scratch (/scratch/general)

Ember users also have partial access to the NFS file system /scratch/general. Ember's interactive nodes have read and write access to this space; however, Ember's compute nodes have no access to this file system.

Local Disk (/scratch/local)

The local scratch space is storage unique to each individual node. It is cleaned aggressively and is not supported by the CHPC. It can be accessed on each node through /scratch/local. This space is one of the fastest, but certainly not the largest (430 GB). Users must remove all their files from /scratch/local at the end of their calculation.

It is a good idea to stage data from one storage system to another when you are running jobs. For example, a job which isn't too large and doesn't need much time on the node can be run within the /scratch/general space: at the start of the batch job the data files are copied from the home directory to /scratch/general, and at the end of the run the output is copied back from /scratch/general to the user's home directory.

It is important to keep in mind that ALL users must remove excess files on their own, preferably within the user's batch job once the computation has finished. Leaving files in any scratch space is an impediment to other users who are trying to run their own jobs. Simply delete all extra files from any space other than your home directory as soon as they are no longer needed.

For all policies regarding CHPC file storage see 3.1 File Storage Policies.

Ember Cluster Usage

CHPC resources are available to qualified faculty, students (under faculty supervision), and researchers from any Utah institution of higher education. Users can request accounts for CHPC computer systems by filling out an account request form. This can be found by following the link below or by coming into Room 405, INSCC Building. (Phone 581-5253)

Users requiring priority on their jobs may apply for an allocation of Service Units (SUs) per quarter by sending a brief proposal, using the allocation form available either:

  • Web version: Allocation form.
  • Hardcopy: from our main office, 405 INSCC, 581-5253.

Ember Cluster Access

The Ember cluster can be accessed via ssh (secure shell) at the following address:

  • ember.chpc.utah.edu

All CHPC machines mount the same user home directories. This means that the user files on Ember will be exactly the same as the ones on other CHPC clusters. The advantage is obvious: users do not need to copy files between machines. However, users must make sure that they run the correct executables. For example, running an MPI-based application on the Mellanox InfiniBand hardware (Ember) requires a different executable than the one running on the QLogic InfiniBand hardware (Updraft).

Another complication associated with the use of a single home directory across all systems is the shell initialization scripts (which are executed at each login). Environment variables (especially paths to applications) vary from cluster to cluster.
The CHPC provides a login script which detects the machine to which one is logging in. This login script enables users to switch on/off machine-specific initializations for packages which have been installed on the cluster (e.g. Gaussian, Matlab, TotalView, ...) and to set cluster-dependent MPI defaults.

At the present time, the CHPC supports two types of shells: tcsh and bash. Tcsh shell users need to select the .tcshrc login script. Users whose shell is bash need the .bashrc file to log in.

Either click here to download the default .tcshrc login script (CHPC systems)
or click here to download the default .bashrc login script (CHPC systems)

The first part of the .tcshrc/.bashrc script determines the machine type on which one is logged in based on a few parameters: the machine's operating system, its IP address or its UUFSCELL variable (defined at the system level). Upon each login the address list of the CHPC Linux machines is retrieved from the CHPC web server. In case of a successful retrieval the address list is stored in the file linuxips.csh (tcsh) or linuxips.sh (bash). If the CHPC web server is unresponsive, the .tcshrc/.bashrc script uses the address list from previous sessions or, in the absence of the latter, issues a warning.
The .tcshrc/.bashrc script looks up the machine's IP address and performs a host specific initialization.

Below you will find an excerpt from a .tcshrc login script (example!) which performs the initialization of the user's environment on the Ember cluster.
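The lines below are a hypothetical sketch of what such an Ember-specific section might look like; the test on the UUFSCELL variable follows the description above, but the package initialization paths are placeholders rather than the actual CHPC script contents.

# Ember-specific initialization (hypothetical excerpt)
if ( $UUFSCELL == "ember.arches" ) then
    # Package initializations -- comment out the "source" lines you do not need:
    source <path-to-gaussian-init>.csh    # Gaussian  (placeholder path)
    source <path-to-matlab-init>.csh      # Matlab    (placeholder path)
    # Cluster-dependent MPI defaults:
    source <path-to-openmpi-init>.csh     # OpenMPI   (placeholder path)
endif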

In the example above (and also in general) everything following a # sign is a comment. Specific package initializations can be turned off by inserting a comment sign (#) at the start of the line. Please do not comment out lines of the login script that do not start with the source command.

The initialization on all the other CHPC clusters works in a similar fashion. Although the bash syntax differs from the tcsh syntax, the bash script performs the same operations as the code in the tcsh script.

After the numerous host-specific initializations, the last section of the login script performs a global initialization (i.e. identical for all machines). In this section one can set various items such as aliases, the prompt format, etc.

If the user also mounts the CHPC home directory on his/her own desktop (as most people do), then we recommend setting the variable MYMACHINE to the IP address of the user's own machine. The address of one's own machine can be found by issuing the command:
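On a typical Linux desktop, for instance:

/sbin/ifconfig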

For example:
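(hypothetical output for a desktop with the address 155.101.26.100)

eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:155.101.26.100  Bcast:155.101.26.255  Mask:255.255.255.0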

If one uses the tcsh shell, then the MYMACHINE variable (for the example above) in the .tcshrc file is set to:
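setenv MYMACHINE "155.101.26.100"    # hypothetical address from the example above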

If the bash shell is used the MYMACHINE variable (for the example above) in the .bashrc file becomes:
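export MYMACHINE="155.101.26.100"    # hypothetical address from the example above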

Find the line containing $MYIP == $MYMACHINE in the login script, and add as many customizations as you please in the lines following the $MYIP == $MYMACHINE statement.

Using the Batch System

The batch implementation on all CHPC systems is based on PBS and includes a resource manager (Torque) and a scheduler (Moab).

Any process which requires more than 15 minutes run time needs to be submitted through the batch system.

Runs in the batch system generally pass through the following steps:

  1. The creation of a batch script
  2. The job's submission to the batch system
  3. Checking the job's status

The creation of a batch script on the Ember cluster

A shell script is a bundle of shell commands which are fed one after another to a shell (bash, tcsh, ...). As soon as the first command has successfully finished, the second command is executed. This process continues until either an error occurs or all of the individual shell commands have been executed. A batch script is a shell script which defines the tasks a particular job has to execute on a cluster.

Below this paragraph a batch script example for running in PBS on the Ember cluster is shown. The lines at the top of the file all begin with #PBS; they are interpreted by the shell as comments, but they give options to PBS and Moab.

Example PBS Script for Ember:
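A minimal sketch is shown below. The scratch directory layout, the file names (input.dat, output.dat), the executable name (myprogram.exe) and the email address are placeholders, and the script assumes that the MPI environment (e.g. OpenMPI, see further down) is initialized by the login script.

#PBS -S /bin/tcsh
#PBS -l nodes=8:ppn=12,walltime=24:00:00
#PBS -m abe
#PBS -M your.email@utah.edu
#PBS -N myjob

# Create a job-specific directory in the parallel scratch space
set SCRDIR = /scratch/ibrix/chpc_gen/$USER/$PBS_JOBID
mkdir -p $SCRDIR

# Copy the input file(s) from the working (submission) directory to scratch
cp $PBS_O_WORKDIR/input.dat $SCRDIR

# Run the executable from the working directory; read and write data in scratch
cd $SCRDIR
mpirun -np 96 -hostfile $PBS_NODEFILE $PBS_O_WORKDIR/myprogram.exe < input.dat > output.dat

# Copy the results back to the home directory and clean up
cp $SCRDIR/output.dat $PBS_O_WORKDIR
rm -rf $SCRDIR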

Note that in the example above we don't specify the queue. For more info on qos=long, please see below.

The #PBS -S statement (1st line) indicates the shell that will be used to interpret the script.

The second line (#PBS -l) specifies the resources that you need to run your job. Your requests have to be consistent with the existing CHPC policies and based on what is available. The policy on Ember favours parallelism: a slight boost is given to jobs which require a large number of processors.

In the example above 96 cores are requested: 8 nodes with 12 processors (cores) per node (ppn stands for processors per node), for 24 hours (walltime). At present all Ember nodes uniformly have 12 cores per node. We recommend that you always specify both the number of nodes and the number of cores per node.

Concerning wall time, we have a hard limit of 48 hours for "general" Ember jobs which do not run in "freecycle" mode. Jobs which run in "freecycle" mode also have a hard limit of 48 hours; in addition, "freecycle" jobs are preemptable, i.e. they may be terminated. We suggest starting with some small runs to gauge how long the production runs will take. As a rule of thumb, request a wall time which is 10-15% larger than the actual run time. If you specify a shorter wall time you risk your job being killed before it finishes; if you set a wall time which is too large, you may face a longer waiting time in the queue.

The third and the fourth lines specify to whom and when the server should send an email. From line three of the example above (#PBS -m abe) we learn that the server will send an email when the job aborts (a), when the job begins (b) and when the job ends (e). In line four (#PBS -M) be sure to change the email address into your own email address.

The PBS command #PBS -N, followed by a name for the job, allows you to tag a name to the job ID. This makes it easier to spot your job in the queue.

In the case of any unscheduled downtime (such as power outages, instances where the cooling fails and the systems are taken down quickly) any jobs actively running in the batch queue will be requeued and restarted from the beginning of the job. Note that this will not happen for scheduled downtimes as the queues are drained before the system is taken down.

If this requeueing/restarting from the beginning behavior is NOT acceptable (e.g. if you want to check it first before restarting), you can add the following option to your PBS script:

#PBS -r n

Please have a look at the options below for the available PBS flags.
Finally, in this example we suggest using the parallel scratch space /scratch/ibrix/chpc_gen, which is visible to all nodes. However, it is a shared resource which is not backed up. We advise users to copy important data to their home directory at the end of a job.

Note that we run the executable from the working directory, but read the input files from and write the output files into /scratch/ibrix/chpc_gen.

Job Submission on Ember

In order to submit a job on Ember one first has to log in to an interactive CHPC node. The easiest way to submit jobs on Ember is to be logged in on the Ember cluster itself (vide supra). If this is the case, use the qsub command in PBS or the runjob command in Moab followed by the name of the script.

For example, to submit a script named pbsjob, just type:
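qsub pbsjob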

Jobs on Ember can also be submitted from the interactive nodes of other CHPC clusters. In that case one has to use the absolute path for the qsub or runjob command. Just type:
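For example, assuming the same installation prefix as the Moab commands shown further down (check the actual path if it differs):

/uufs/ember.arches/sys/bin/qsub pbsjob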

PBS sets and expects a number of arguments in a PBS script. For more information on PBS commands and their arguments, please type:
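man PBS_COMMAND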

where PBS_COMMAND stands for a PBS command such as qsub. Click on this link to see a few additional PBS commands.

Checking the status of your job

To check the status of your job (we assume you are logged into the Ember cluster), use the "showq" command in Moab.

  • showq

If you are logged into the interactive node of a different CHPC cluster and you want to check your jobs on Ember, please use the absolute path to the showq command:

  • /uufs/ember.arches/sys/bin/showq

Vide infra for additional Moab commands.

Moab on Ember

The Moab scheduler uses information from your script to schedule your job. On Ember, user groups and priorities are controlled through the QOS (Quality of Service) tag. Different QOSes have different maximum wall times. The general QOS, which is the default for most users, has a 48-hour wall clock limit. There is a special QOS, qos=long, for jobs which take longer than 48 hours. Users whose jobs need to run for more than 48 hours first have to submit a special request to the CHPC. Only after the request has been granted can the user run longer jobs.

Let's assume that one has been granted permission to run longer jobs (i.e. jobs which take more than 48 hours of wall time, e.g. 72 hours) on Ember, and that one still wants to use the same number of nodes as in the previous example (vide supra). One then has to replace the following line in the previous script:

#PBS -l nodes=8:ppn=12,walltime=24:00:00

by:

#PBS -l nodes=8:ppn=12,walltime=72:00:00,qos=long

For more details on the Moab batch policies, please visit the CHPC Batch Policies web page.

Additional Info:

The CHPC Ember nodes are available to the whole user community: as soon as one possesses a CHPC account one can run on the CHPC Ember nodes. You may ask yourself "How long will it take before my job starts to run?". The answer depends on several factors: how long one wants to run, the number of processors requested, the existence of an allocation, etc. In essence, as long as the user hasn't consumed his/her CHPC allocation on Ember, he/she will experience a boost in priority on the CHPC nodes of Ember. When a user runs out of allocation he/she can still run jobs, but there will be a significant drop in the job's priority ("freecycle" mode).

The use of owner nodes is restricted to the group members or to users who have been granted access rights.
CHPC users are also allowed to access the nodes owned by Dr. Philip Smith.
However, there are a few points one should keep in mind:

  • Users who don't belong to Dr. Smith's group can run on Dr. Smith's nodes for a maximum of 24 hours. A user who doesn't belong to Dr. Smith's group and wants to run on those nodes has to add the following line to his/her script: #PBS -A smithp-guest
  • However, jobs submitted by users who aren't members of Dr. Smith's group can be preempted during run time. Preemption targets the jobs which have the least run time. If you decide to run on Dr. Smith's nodes and you are a member of another group, we strongly advise you to store intermediate results (in case your job gets preempted).
  • The current policy only allows jobs to run either on CHPC nodes or on owner nodes (exclusive or).
  • Running on Dr. Smith's nodes does not affect your CHPC allocation.

Ember Cluster Local Configuration

PBS Batch Script Options

  • -a date_time.  Declares the time after which the job is eligible for execution. The date_time element is in the form: [[[[CC]YY]MM]DD]hhmm[.S].
  • -e path.  Defines the path to be used for the standard error stream of the batch job. The path is of the form: [hostname:]path_name.
  • -h.  Specifies that a user hold will be applied to the job at submission time.
  • -I.  Declares that the job is to be run "interactively". The job will be queued and scheduled as a PBS batch job, but when executed the standard input, output, and error streams of the job will be connected through qsub to the terminal session in which qsub is running.
  • -j join.  Declares whether the standard error stream of the job will be merged with the standard output stream. The join argument is one of the following:
    • oe:  Directs both streams to standard output.
    • eo:  Directs both streams to standard error.
    • n:  The two streams remain separate (default).
  • -l resource_list.  Defines the resources that are required by the job and establishes a limit on the amount of resources that can be consumed. Users will want to specify the walltime resource, and if they wish to run a parallel job, the ncpus resource.
  • -m mail_options.  Defines the conditions under which the server will send a mail message about the job. The options are:
    • n: No mail ever sent
    • a (default): When the job aborts
    • b: When the job begins
    • e: When the job ends
  • -M user_list.  Declares the list of e-mail addresses to whom mail is sent. If unspecified it defaults to userid@host from where the job was submitted. You will most likely want to set this option.
  • -N name.  Declares a name for the job.
  • -o path.  Defines the path to be used for the standard output stream of the batch job. The path is of the form: [hostname:]path_name.
  • -q destination.  The destination is the queue.
  • -S path_list.  Declares the shell that interprets the job script. If not specified it will use the user's login shell.
  • -v variable_list.  Expands the list of environment variables which are exported to the job. The variable list is a comma-separated list of strings of the form variable or variable=value.
  • -V.  Declares that all environment variables in the qsub command's environment are to be exported to the batch job.

PBS User Commands

For any of the commands listed below you may do a "man command" for syntax and detailed information.

Frequently used PBS user commands:

  • qsub Submits a job to the PBS queuing system. Please see qsub Options below.
  • qdel Deletes a PBS job from the queue.
  • qstat Shows status of PBS batch jobs.

Less Frequently-Used PBS User Commands:

  • qalter. Modifies the attributes of a job.
  • qhold. Requests that the PBS server place a hold on a job.
  • qmove. Removes a job from the queue in which it resides and places the job in another queue.
  • qmsg. Sends a message to a PBS batch job. To send a message to a job is to write a message string into one or more of the job's output files.
  • qorder. Exchanges the order of two PBS batch jobs within a queue.
  • qrerun. Reruns a PBS batch job.
  • qrls. Releases a hold on a PBS batch job.
  • qselect. Lists the job identifier of those jobs which meet certain selection criteria.
  • qsig. Requests that a signal be sent to the session leader of a batch job.

Moab Scheduler User Commands

  • showq - displays queued jobs: active (running), eligible (idle), and blocked.
  • showbf - shows the backfill window (resources available for immediate use).
  • showstart - shows the estimated start time of a job.
  • checkjob - displays status of a job.
  • showres - shows active reservations.

Each command accepts the -h flag, which displays help.

Moab commands are located in "/uufs/ember.arches/sys/bin". Please see the Moab Scheduler documentation for more information.

Available Compilers

C/C++

The Ember cluster offers several compilers. The GNU Compiler Suite includes the ANSI C, C++, Fortran 77 and Fortran 90 compilers. Its installed version on the RedHat EL 5 OS is 4.1.2.

In addition to the GNU compilers, we offer two commercial compiler suites: the Intel and the Portland Group Compiler (PGI) suites.

The Intel compilers generally provide superior performance. They include C, C++ and Fortran 77/90/95.

The Portland Group Compiler Suite is another good compiler distribution which has compilers for the aforementioned languages.

GNU compilers

The GNU compilers are located in the default area, i.e. compilers in /usr/bin, libraries in /usr/lib or /usr/lib64, header files in /usr/include, etc. The C compiler is called gcc; the C++ compiler g++. In the next two lines we give a very basic example of how to generate an executable (exe) from a C source file (source.c) and a C++ source file (source.cc):
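gcc source.c -o exe      # C
g++ source.cc -o exe     # C++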

Intel compilers

The whole suite of the Intel compilers, debugger, ... are located in the directory:

Before users can invoke any of these Intel compilers/debugger they have to source the file compilervars.sh (bash) or compilervars.csh (tcsh) to set the necessary environmental variables:
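For example (the Intel installation directory is a placeholder; compilervars takes the target architecture, here intel64, as an argument):

source <intel-install-dir>/bin/compilervars.sh intel64     # bash
source <intel-install-dir>/bin/compilervars.csh intel64    # tcsh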

The C compiler is invoked by the icc command; the C++ compiler by the icpc command. If you would like to see the list of available flags, please use the man pages by typing man icc (C), man icpc (C++).

To find the compiler's version, use the flag -v, i.e. type icc -v (C), icpc -v (C++).

We generally recommend the flag -fast for superior performance, however, some of the optimizations using this flag may lose precision for floating-point divides. Consult the icc man page for more details.

For more information on the C and C++ compilers/debugger, visit the following sites:

PGI compilers

The latest version of the Portland Group compilers is located in the directory:

In order to use the compilers, users have to source the shell script pgi.sh (bash) / pgi.csh (tcsh), which sets paths and some other environment variables:
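For example (placeholder installation directory):

source <pgi-install-dir>/pgi.sh      # bash
source <pgi-install-dir>/pgi.csh     # tcsh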

The C compiler is invoked as pgcc; the C++ compiler as pgCC.

To find the compiler's version, use the flag -V, i.e. pgcc -V. For a list of all the available flags, use the man pages (e.g. man pgcc).

In order to obtain good performance we generally recommend the flag -fastsse.

For more information on the PGI C and C++ compilers and PGI Tools (Debugger and Profiler), please have a look at:

On its website, the Portland Group advises on compiler options for a few well-known codes (ATLAS, FFTW, GotoBLAS, ...).

Fortran

The Ember cluster provides several Fortran compilers. The GNU Compiler Suite includes the Fortran 77 and Fortran 90 compilers. The installed version on the RedHat EL 5 OS is 4.1.2.

In addition to the GNU Fortran compilers, we offer two commercial compiler suites which contain Fortran compilers: the Intel and the Portland Group Compiler (PGI) suites.

The Intel Fortran compiler generally provides superior performance.

The Portland Group Compiler Suite is another good compiler distribution.

GNU compilers

The GNU compiler gfortran is located in the default area, i.e. the compiler binary in /usr/bin, libraries in /usr/lib or /usr/lib64. The gfortran compiler supports all the options supported by the gcc compiler. In the next line we give a very basic example of how to compile an F90 source file (source.f90):
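gfortran source.f90 -o exe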

Intel compilers

The whole suite of the Intel compilers, debugger, ... are located in the directory:

Before users can invoke the Intel Fortran compiler or debugger they have to source the file compilervars.sh (bash) or compilervars.csh (tcsh), as shown in the C/C++ section above.

The Fortran compiler is invoked by the ifort command. For a list of the available flags, please use the man pages by typing man ifort.

To find the compiler's version, use the flag -v, i.e. ifort -v.

We generally recommend the flag -fast for superior performance, however, some of the optimizations using this flag may lose precision for floating-point divides. Consult the ifort man page for more details.

For more information on the Fortran compiler and debugger, visit the following sites:

PGI compilers

The latest version of the Portland Group compilers is located in the directory:

In order to use the compilers, users have to source the shell script pgi.sh (bash) / pgi.csh (tcsh), which defines paths and some other environment variables, as shown in the C/C++ section above.

The Fortran compilers are invoked as pgf77 (F77), pgf90 (F90), and pgf95 (F95).

To find the compiler version, use flag -V, e.g. pgf90 -V. For list of available flags, use the man pages (e.g. man pgf90).

In order to obtain good performance we generally recommend the flag -fastsse.

For more information on the Fortran compiler(s), PGI Fortran Reference and other PGI tools, please have a look at:

On its website, the Portland Group advises on compiler options for a few well-known codes (ATLAS, FFTW, GotoBLAS, ...).

Parallel application development

MPI - Message Passing Interface

As Ember is a distributed memory parallel system, message passing is the way to communicate between the processes in the parallel program. The Message Passing Interface (MPI) is the prevalent communication system and the preferred mode of parallel programming on Ember.

OpenMPI on Ember

We recommend the use of the OpenMPI library which is an open source MPI-2 implementation. This MPI version is installed under the directory:

Intel:

Gnu:

The subdirectories bin, lib, include contain executables (compilation scripts,..), the static and dynamic libraries, and the header files, respectively.

In order to compile MPI code with OpenMPI or to run your code in parallel with OpenMPI, you first need to source the file openmpi.sh (bash) or openmpi.csh (tcsh) to set the necessary environment variables.

Intel:

Gnu:

Your source code can be compiled using OpenMPI on Ember in the following way (dependent on the language to be used):
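For example, using OpenMPI's standard compiler wrappers (source and executable names are placeholders):

mpicc  source.c   -o program.exe     # C
mpicxx source.cc  -o program.exe     # C++
mpif77 source.f   -o program.exe     # Fortran 77
mpif90 source.f90 -o program.exe     # Fortran 90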

To run your parallel job on Ember using OpenMPI, use the default mpirun command and specify the number of processors ($PROCS) and the host file:
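For example (inside a PBS job the host file is provided in $PBS_NODEFILE; program.exe is a placeholder):

mpirun -np $PROCS -hostfile $PBS_NODEFILE ./program.exe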

For more details on these and other options to the mpirun command, see its man page or run mpirun --help.

Ember nodes have 12 physical cores divided over 2 sockets (6 cores per socket), with a NUMA (non-uniform memory access) architecture. Therefore it is important to place processes correctly onto the cores and to make sure they don't move around during runtime. OpenMPI's mpirun command has options for process distribution and process pinning (fixing a process to a given physical processor). We highly recommend using pinning. For a program that spans all 12 cores in each node, use by-core pinning and by-core process distribution, as:
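For example, assuming an OpenMPI version that understands the -bycore/-bind-to-core options (newer releases use --map-by core --bind-to core instead):

mpirun -np $PROCS -bycore -bind-to-core -hostfile $PBS_NODEFILE ./program.exe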

In the case of mixed OpenMP/MPI codes, we have found that it is optimal to run one MPI process per socket, that is, two MPI processes per node, with each process having 6 OpenMP threads. In this case, we want to distribute the MPI processes by socket and bind them to the socket, as:
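A sketch, again assuming the older-style OpenMPI mapping options ($PROCS here is 2 x the number of nodes):

mpirun -np $PROCS -npernode 2 -bysocket -bind-to-socket -x OMP_NUM_THREADS=6 -hostfile $PBS_NODEFILE ./program.exe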

Notice that in this command line we specify OMP_NUM_THREADS as an environmental variable passed through the mpirun command to all the MPI processes.

For optimal OpenMP performance, it is also recommended to pin the OpenMP threads to a processor core. This is achieved with the Intel compiler environment variable KMP_AFFINITY, passed as an additional mpirun parameter, for example:
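-x KMP_AFFINITY=granularity=fine,compact,1,0

The setting shown here is one commonly used choice; see the Intel documentation linked below for the full set of KMP_AFFINITY options.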

For more details on the OpenMP thread affinity (which also explains the KMP_AFFINITY parameters shown above), see http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/optaps/common/optaps_openmp_thread_affinity.htm

MVAPICH2 on Ember

MVAPICH2 is open-source MPI software which exploits the novel features in InfiniBand. We have built MVAPICH2 on Ember with the Intel compilers. This MPI version is installed under the directory:

Intel:

Gnu:

The subdirectories bin, lib, include contain executables (compilation scripts,..), the static and dynamic libraries, and the header files, respectively.

In order to compile MPI code with MVAPICH2 or to run your code in parallel with MVAPICH2, you first need to source the file mvapich2.sh (bash) or mvapich2.csh (tcsh) to set the necessary environment variables.

Intel:

Gnu:

Your source code can be compiled using MVAPICH2 on Ember in the following way (dependent on the language to be used):
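MVAPICH2 provides the same style of compiler wrappers as OpenMPI; for example (placeholder names):

mpicc  source.c   -o program.exe     # C (mpicxx and mpif77 work analogously)
mpif90 source.f90 -o program.exe     # Fortran 90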

To run your parallel job on Ember with MVAPICH2, use the default mpirun command and specify the number of processors ($PROCS) and the host file in the following way:
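A sketch (the exact host-file flag varies between MVAPICH2 launchers; some versions use -f, or the mpirun_rsh launcher with -hostfile, so check mpirun --help):

mpirun -np $PROCS -hostfile $PBS_NODEFILE ./program.exe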

For more details on these and other options to the mpirun command, see its man page or run mpirun --help.

OpenMP - shared memory programming

All Ember nodes have two six-core processors, which means that shared memory programming can be used within a node to save some of the MPI message overhead. OpenMP has emerged as the major industry standard for shared memory programming, and it is supported by the various compilers via the command line flags -fopenmp (GNU compilers), -openmp (Intel compilers) or -mp (PGI compilers).
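For example, to compile an OpenMP code with each of the three compiler families (source and executable names are placeholders):

gcc -fopenmp source.c -o exe      # GNU
icc -openmp  source.c -o exe      # Intel
pgcc -mp     source.c -o exe      # PGI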

More information on OpenMP can be found in:

Debugging and Profiling

The Data Display Debugger (ddd) is a graphical interface which supports multiple debuggers, including the standard GNU debugger, gdb. With ddd one can attach to running processes, set conditional break points, manipulate the data of executing processes, view source code, assembly code, registers, threads, and signal states. Man pages for both ddd and gdb are available. For more information visit the following URL:

http://www.gnu.org/software/ddd/

In addition, the Portland Group includes a debugger, pgdbg, in its PGI Compiler suite.

Totalview, a de facto industry-standard debugger, supports both serial and parallel debugging. For details on how to use Totalview, have a look at the CHPC Totalview presentation.

For serial profiling, there is GNU gprof and the Portland Group pgprof. Intel provides the VTune Amplifier XE to test the code performance for users developing serial and multithreaded applications. Intel also provides the Intel Inspector which has been developed to find and fix memory errors and thread errors in serial or parallel code.

Libraries

Linear algebra subroutines

There are several different BLAS library versions on Ember, however, we recommend using the Intel Math Kernel Library (MKL) since it is optimized for the Intel processors.

Advanced users may want to experiment with two other BLAS libraries, GOTO Blas and Atlas.

Intel Math Kernel Library (MKL)

MKL is best suited for Intel based processors. Thus, on Ember, we recommend using MKL for BLAS and LAPACK. We also recommend using Intel Fortran and C/C++ for best performance.

MKL contains highly optimized math routines. It includes fully optimized BLAS, LAPACK, sparse solvers, a vector math library, random number generators and fast Fourier transform routines (including FFTW wrappers). For more information, consult the Intel Math Kernel Library Documentation. Its latest version is located at:

Compilation instructions:

Intel Fortran (using dynamic linking)
Intel C/C++ (using dynamic linking)
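A sketch of sequential (non-threaded), LP64 dynamic link lines for both languages; MKLROOT is set when the Intel compilervars script is sourced, the source file names are placeholders, and the library subdirectory (intel64 here) may differ between MKL versions:

ifort source.f90 -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm
icc   source.c   -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm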

If you use the C++ compiler, please replace icc by icpc and change the suffix .c into .cc in the previous statement.

It is also possible to incorporate OpenMP-threaded MKL into an OpenMP or mixed MPI/OpenMP code. To do so, parallelize your code with OpenMP but leave the MKL calls unthreaded, and instead link the threaded MKL library, for example:
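A sketch along the same lines as the dynamic link line above, now against the threaded MKL layer (placeholder source file name):

icc -openmp source.c -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm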

Then run as you usually would with OMP_NUM_THREADS set, and the MKL calls will run with that many threads as well.

For more MKL linking options see this document: http://software.intel.com/sites/products/documentation/hpc/mkl/lin/MKL_UG_linking_your_application/Linking_Examples.htm

ATLAS

Automatically Tuned Linear Algebra Software (ATLAS) is an open source library aimed at providing a portable performance solution. The ATLAS suite sensu stricto includes all BLAS subroutines and a subset of the LAPACK routines, which are tuned to the computer platform at compilation time. ATLAS does not seem to have optimizations for the latest CPU families, so we recommend using MKL instead on Ember. The most recent ATLAS version is located at:

Fortran

GNU Fortran:
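For example (the ATLAS library directory is a placeholder):

gfortran source.f90 -L<atlas-lib-dir> -lf77blas -latlas -o exe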

To include the LAPACK routines in ATLAS:
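gfortran source.f90 -L<atlas-lib-dir> -llapack -lf77blas -lcblas -latlas -o exe     # placeholder library directory, as above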

C/C++

GNU C/C++:
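gcc source.c -L<atlas-lib-dir> -lcblas -latlas -o exe     # placeholder library directory, as above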