Batch Execution Environment

Shell and initialization

Your batch script executes in a shell on a compute node. The environment is identical to what you get when you connect to a login node, except that you have access to all the resources requested by your job. By default, the script is executed using the same shell that you get when you log in (bash, tcsh, etc.). The appropriate “dot-files” (.login, .profile, .cshrc) will be executed, the same as when you log in. (For information on overriding the default shell, see the Job Scripts section.)

Execution begins in your home directory, regardless of what directory your script resides in or where you submitted the job from. You can use the cd command to change to a different directory. The environment variable $PBS_O_WORKDIR makes it easy to return to the directory from which you submitted the job:

cd $PBS_O_WORKDIR
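A slightly more defensive form of the same step is sketched below. It quotes the variable so that a path containing spaces still works, and it stops the script if the directory change fails. The fallback to the current directory when $PBS_O_WORKDIR is unset is an assumption added only so the sketch can be tried outside a batch job.

```shell
# Return to the submission directory, stopping the script if that fails.
# Assumption: outside a batch job PBS_O_WORKDIR is unset, so fall back to
# the current directory to let this sketch run interactively.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || { echo "cannot cd to $workdir" >&2; exit 1; }
echo "running in $(pwd)"
```

Stopping early avoids running the rest of the script in your home directory by mistake.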

Modules

There are dozens of software packages available on OSC’s systems, many of them with multiple versions. You control what software is available in your environment by loading the module for the software you need. Each module sets certain environment variables required by the software.

If you are running software that was installed by OSC, you should check the software documentation page to find out what modules to load.

The module systems on Oakley and Glenn are a little different, but the concepts are the same. Examples are given here for both systems.

Several modules are automatically loaded for you when you log in or start a batch script. These default modules differ somewhat between Oakley and Glenn, but they include

  • modules required by the batch system
  • a compiler suite (Intel compilers on Oakley, PGI compilers on Glenn)
  • an MPI package compatible with the default compiler (for parallel computing)

The module command has a number of subcommands. The most useful of these are documented here. For more details, type “module help”.

Certain modules are incompatible with each other and should never be loaded at the same time. Examples are different versions of the same software or multiple installations of a library built with different compilers. Oakley does a pretty good job of checking compatibility; unfortunately Glenn does not.

Note to those who build or install their own software: Be sure to load the same modules when you run your software that you had loaded when you built it, including the compiler module. This is particularly important on Oakley, but it is good practice on Glenn also.

Module system on Oakley

Each module on Oakley has both a name and a software version number. When more than one version is available for the same name, one of them is designated as the default. For example, the following modules are available for the Intel compilers on Oakley:

  • intel/12.1.0 (default)
  • intel/12.1.4.319

If you specify just the name, it refers to the default version or the currently loaded version, depending on the context. If you want a different version, you must give the entire string. Examples are given below.

On Oakley you can have only one compiler module loaded at a time, either intel, pgi, or gnu. The intel module is loaded initially; to change to pgi or gnu, do a module swap (see example below).

Some software libraries have multiple installations built for use with different compilers. The module system will load the one compatible with the compiler you have loaded. If you swap compilers, all the compiler-dependent modules will also be swapped.

Special note to gnu compiler users: While the gnu compilers are always in your path, you should load the gnu compiler module to ensure you are linking to the correct library versions.

To list the modules you have loaded:

module list

To see all modules that are compatible with your currently loaded modules:

module avail

To see compatible modules whose names start with fftw:

module avail fftw

To see all possible modules:

module spider

To see all possible modules whose names start with fftw:

module spider fftw

To load the fftw3 module that is compatible with your current compiler:

module load fftw3

To unload the fftw3 module:

module unload fftw3

To load the default version of the abaqus module (not compiler-dependent):

module load abaqus

To load a different version of the abaqus module:

module load abaqus/6.8-4

To unload whatever abaqus module you have loaded:

module unload abaqus

To swap the intel compilers for the pgi compilers (unloads intel, loads pgi):

module swap intel pgi

To swap the default version of the intel compilers for a different version:

module swap intel intel/12.1.4.319

To display help information for the mkl module:

module help mkl

To display the commands run by the mkl module:

module show mkl

To use a locally installed module on Oakley, first import the module directory:

module use [/path/to/modulefiles]

And then load the module:

module load localmodule

Module system on Glenn

The modules on Glenn have a software version number built into the name. Some modules have a shorter alternate name. For example, the following modules are available for the Intel compilers on Glenn:

  • intel-compilers-10.0.023
  • intel-compilers-10.0 (same as intel-compilers-10.0.023)
  • intel-compilers-11.1.056
  • intel-compilers-11.1 (same as intel-compilers-11.1.056)
  • intel-compilers-9.1

On Glenn you will typically have modules loaded for all the compiler suites (PGI, Intel, gnu).

Some software libraries have multiple installations built for use with different compilers. You must make certain that you have the correct modules loaded for compatibility with the compiler you are using.

To list the modules you have loaded:

module list

To see all modules that are available:

module avail

Same as above but restricted to modules whose names start with fftw:

module avail fftw

To load the fftw3 module compatible with the PGI compilers:

module load fftw3

To load the fftw3 module compatible with the gnu compilers:

module load fftw3-gnu

To load the default version of abaqus (not compiler-dependent):

module load abaqus

To swap the default mpi module, which works with the PGI compilers, for the mvapich2 module that works with the intel compilers:

module unload mpi
module load mvapich2-1.6-intel

Note: There is a “module swap” on Glenn, but it doesn’t always work correctly. It’s safer to unload one module and load the other one.

To display help information for the acml module:

module help acml

To display the commands run by the acml module:

module show acml

PBS environment variables

Your batch execution environment has all the environment variables that your login environment has plus several that are set by the batch system. This section gives examples for using some of them. For more information see “man qsub”.

Directories

Several directories may be useful in your job.

The absolute path of the directory your job was submitted from is $PBS_O_WORKDIR. Recall that your job always starts in your home directory. To get back to your submission directory:

cd $PBS_O_WORKDIR

Each job has a temporary directory, $TMPDIR, on the local disk of each node assigned to it. Access to this directory is much faster than access to your home or project directory. The files in this directory are not visible from all the nodes in a parallel job; each node has its own directory. The batch system creates this directory when your job starts and deletes it when your job ends. To copy file input.dat to $TMPDIR on your job’s first node:

cp input.dat $TMPDIR

To copy file input.dat to $TMPDIR on all your job’s nodes:

pbsdcp input.dat $TMPDIR

Each job has a temporary directory, $PFSDIR, on the parallel file system. This is a single directory shared by all the nodes a job is running on. Access is faster than access to your home or project directory but not as fast as $TMPDIR. The batch system creates this directory when your job starts and deletes it when your job ends. To copy the file output.dat from this directory to the directory you submitted your job from:

cp $PFSDIR/output.dat $PBS_O_WORKDIR

The $HOME environment variable refers to your home directory. It is not set by the batch system but is useful in some job scripts. It is better to use $HOME than to hardcode the path to your home directory. To access a file in your home directory:

cat $HOME/myfile

Informational variables

Several environment variables provide information about your job that may be useful.

A list of the nodes and cores assigned to your job is in the file $PBS_NODEFILE. To display this file:

cat $PBS_NODEFILE
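The node file can also be summarized rather than just displayed. The sketch below assumes the usual layout of one host name per line, repeated once for each core assigned on that node; the function names count_cores and count_nodes are hypothetical helpers, not part of the batch system.

```shell
# Number of cores assigned: one line per core in the node file.
count_cores() { wc -l < "$1" | tr -d ' '; }

# Number of distinct nodes: unique host names in the node file.
count_nodes() { sort -u "$1" | wc -l | tr -d ' '; }

# Inside a job, report on the real node file. Outside a job
# PBS_NODEFILE is unset and this step is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "cores: $(count_cores "$PBS_NODEFILE"), nodes: $(count_nodes "$PBS_NODEFILE")"
fi
```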

For GPU jobs on Oakley, a list of the GPUs assigned to your job is in the file $PBS_GPUFILE. To display this file:

cat $PBS_GPUFILE

If you use a job array, each job in the array gets its identifier within the array in the variable $PBS_ARRAYID. To pass a file name parameterized by the array ID into your application:

./a.out input${PBS_ARRAYID}.dat
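A job script may want to check that the per-task input file exists before starting the program. The following is a minimal sketch under stated assumptions: the default of 1 when $PBS_ARRAYID is unset is only there so the fragment can be tried outside an array job, and a.out stands for your own executable as in the example above.

```shell
# Build the input file name from the array index.
# Assumption: default to index 1 when not running inside an array job.
task_id="${PBS_ARRAYID:-1}"
input_file="input${task_id}.dat"

# Run the program only if this task's input file is readable.
if [ -r "$input_file" ]; then
    ./a.out "$input_file"
else
    echo "array task $task_id: $input_file not found, skipping" >&2
fi
```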

To display the numeric job identifier assigned by the batch system:

echo $PBS_JOBID

To display the job name:

echo $PBS_JOBNAME

Use fast storage

If your job does a lot of file-based input and output, your choice of file system can make a huge difference in the performance of the job.

Shared file systems

Your home and project directories are located on shared file systems, providing long-term storage that is accessible from all OSC systems. Shared file systems are relatively slow. They cannot handle heavy loads such as those generated by large parallel jobs or many simultaneous serial jobs. You should minimize the I/O your jobs do on the shared file systems. It is usually best to copy your input data to fast temporary storage, run your program there, and copy your results back to your home or project directory.
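The copy-in, run, copy-out pattern described above might look like the following sketch. The file names input.dat and output.dat, the function name stage_and_run, and the tr command standing in for a real program are all hypothetical; substitute your own files and executable.

```shell
# stage_and_run SUBMIT_DIR SCRATCH_DIR
# Copy input to fast storage, run there, and copy the result back.
stage_and_run() {
    submit_dir=$1
    scratch_dir=$2
    cp "$submit_dir/input.dat" "$scratch_dir/" || return 1
    # Stand-in for the real program: upper-case the input file.
    ( cd "$scratch_dir" && tr 'a-z' 'A-Z' < input.dat > output.dat ) || return 1
    cp "$scratch_dir/output.dat" "$submit_dir/"
}

# In a batch job both variables are set by the batch system.
# Outside a job PBS_O_WORKDIR is unset and nothing is run.
if [ -n "${PBS_O_WORKDIR:-}" ] && [ -n "${TMPDIR:-}" ] && [ -r "$PBS_O_WORKDIR/input.dat" ]; then
    stage_and_run "$PBS_O_WORKDIR" "$TMPDIR"
fi
```

Running the program from inside the scratch directory keeps all of its intermediate files off the shared file system; only the final result is copied back.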

Batch-managed directories

Batch-managed directories are temporary directories that exist only for the duration of a job. They exist on two types of storage: disks local to the compute nodes and a parallel file system.

A big advantage of batch-managed directories is that the batch system deletes them when a job ends, preventing clutter on the disk.

A disadvantage of batch-managed directories is that you can’t access them after your job ends. Be sure to include commands in your script to copy any files you need to long-term storage. To avoid losing your files if your job ends abnormally, for example by hitting its walltime limit, include a trap command in your script. The following example creates a subdirectory in $PBS_O_WORKDIR and copies everything from $TMPDIR into it in case of abnormal termination.

trap "cd $PBS_O_WORKDIR;mkdir $PBS_JOBID;cp -R $TMPDIR/* $PBS_JOBID" TERM
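The same idea can be written with a handler function, which is easier to read and extend than a one-line command string. This is a sketch, not a replacement for the command above: the function name save_scratch is hypothetical, and exiting with status 143 after the copy is an assumption about how you want the script to end once it has received the TERM signal.

```shell
# Copy everything from the job's local scratch directory into a
# subdirectory of the submission directory named after the job ID.
save_scratch() {
    dest="$PBS_O_WORKDIR/$PBS_JOBID"
    mkdir -p "$dest" && cp -R "$TMPDIR"/. "$dest"/
}

# Run the handler when the batch system sends TERM, then stop.
# Single quotes mean the variables are read when the signal arrives.
trap 'save_scratch; exit 143' TERM
```

Place the trap near the top of the script, before the commands that write to $TMPDIR, so it is in effect for the whole run.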

If a node your job is running on crashes, the trap command may not be executed. It may be possible to recover your batch-managed directories in this case. Contact OSC Help for assistance.

Local disk space

The fastest storage is on a disk local to the node your job is running on, accessed through the environment variable $TMPDIR. The main drawback to local storage is that each node of a parallel job has its own directory and cannot access the files on other nodes. See also “Considerations for Parallel Jobs”.

Local disk space should be used only through the batch-managed directory created for your job. Please do not use /tmp directly because your files won’t be cleaned up properly.

Parallel file system

The parallel file system is faster than the shared file systems for large-scale I/O and can handle a much higher load. You should use it when your files must be accessible by all the nodes in your job and also when your files are too large for the local disk.

The parallel file system is efficient for reading and writing data in large blocks. It should not be used for I/O involving many small accesses.

The parallel file system is typically used through the batch-managed directory created for your job. The path for this directory is in the environment variable $PFSDIR.

You may also create a directory for yourself in /fs/lustre and use it the way you would use any other directory. You should name the directory with either your user name or your project ID. This directory will not be backed up; files are subject to deletion after some number of months (see policies for details).

Note: You should not copy your executable files to $PFSDIR. They should be run from your home or project directories or from $TMPDIR.