Batch-Related Command Summary

This section summarizes two groups of batch-related commands: commands that are run on the login nodes to manage your jobs and commands that are run only inside a batch script. Only the most common options are described here.

Many of these commands are discussed in more detail elsewhere in this document. All have online manual pages (example: man sbatch ) unless otherwise noted.

In describing the usage of the commands we use square brackets [like this] to indicate optional arguments. The brackets are not part of the command.

Important note: The batch systems on Pitzer, Ruby, and Owens are entirely separate. Be sure to submit your jobs on a login node for the system you want them to run on. All monitoring while the job is queued or running must be done on the same system also. Your job output, of course, will be visible from both systems.

Commands for managing your jobs

These commands are typically run from a login node to manage your batch jobs. The batch systems on Pitzer and Owens are completely separate, so the commands must be run on the system where the job is to be run.

sbatch

The sbatch command is used to submit a job to the batch system.

Usage Desctiption Example
sbatch [ options ] script Submit a script for a batch job. The options list is rarely used but can augment or override the directives in the header lines of the script.   sbatch sim.job
sbatch -t array_request [ options ] jobid Submit an array of jobs sbatch -t 1-100 sim.job
sinteractive [ options ] Submit an interactive batch job sinteractive -n 4


squeue

The squeue command is used to display the status of batch jobs.

Usage Desctiption Example
squeue Display all jobs currently in the batch system. squeue
squeue -j jobid Display information about job jobid. The -j flag uses an alternate format. squeue -j 123456
squeue -j jobid -l Display long status information about job jobid. squeue -j 123456 -l
squeue -u username [-l] Display information about all the jobs belonging to user username. squeue -u usr1234

scancel

The scancel command may be used to delete a queued or running job.

Usage Description Example
scancel jobid Delete job jobid.

scancel 123456

scancel jobid Delete all jobs in job array jobid. scancel 123456
qdel jobid[jobnumber] Delete jobnumber within job array jobid. scancel 123456_14

slurm output file

There is an output file which stores the stdout and stderr for a running job which can be viewed to check the running job output. It is by default located in the dir where the job was submitted and has the format slurm-<jobid>.out

The output file can also be renamed and saved in any valid dir using the option --output=<filename pattern>

Cannot currently pass environment variables into slurm job script and can only specify this when using sbatch command at job submission.
e.g.
sbatch --output=$HOME/test_slurm.out <job-script> works
#SBATCH --output=$HOME/test_slurm.out does NOT work in job script
See slurm migration issues for details.
Do not delete/modify the output file that is generated while your job running. This could cause adverse affects on your running job.

scontrol

The scontrol command may be used to modify the attributes of a queued (not running) job. Not all attributes can be altered.

Usage Description Example
scontrol update jobid=<jobid> [ option ] Alter one or more attributes a queued job. The options you can modify are a subset of the directives that can be used when submitting a job.

scontrol update jobid=123456 --ntasks-per-node=4

This command can also be used inside a job like so:
scontrol show job=$SLURM_JOB_ID

scontrol hold/release

The qhold command allows you to place a hold on a queued job. The job will be prevented from running until you release the hold with the qrls command.

Usage Description Example
scontrol hold jobid Place a user hold on job jobid scontrol hold 123456
scontrol release jobid Release a user hold previously placed on job jobid scontrol release 123456

scontrol show

The scontrol show command can be used to provide details about a job that is running.

scontrol show job=$SLURM_JOB_ID

Usage Description Example
scontrol show job=<jobid> Check the details of a running job. scontrol show job=123456

estimating start time

The squeue command can try to estimate when a queued job will start running. It is extremely unreliable, often making large errors in either direction.

Usage Description Example
squeue -j jobid \
--Format=username,jobid,account,startTime
Display estimate of start time.
squeue -j 123456 \ 
--Format=username,jobid,account,startTime

 

Commands used only inside a batch job

These commands can only be used inside a batch job.

srun

Generally used to start an mpi process during a job. Can use most of the options available also from the sbatch command.

Usage Example
srun <prog> srun --ntasks-per-node=4 a.out

sbcast/sgather

Tool for copying files to/from all nodes allocated in a job.

Usage
sbcast <src_file> <nodelocaldir>/<dest_file>
sgather <src_file> <shareddir>/<dest_file>
 sgather -r <src_dir> <sharedir>/dest_dir>

Note: sbcast does not have a recursive cast option, meaning you can't use sbcast -r to scatter multiple files in a directory. Instead, you may use a loop command similar to this:

cd ${the directory that has the files}

for FILE in * 
do
    sbcast -p $FILE $TMPDIR/some_directory/$FILE
done

mpiexec

Use the mpiexec command to run a parallel program or to run multiple processes simultaneously within a job. It is a replacement program for the script mpirun , which is part of the mpich package.
The OSC version of mpiexec is customized to work with our batch environment. There are other mpiexec programs in existence, but it is imperative that you use the one provided with our system.

Usage Description Example
mpiexec progname [ args ] Run the executable program progname in parallel, with as many processes as there are processors (cores) assigned to the job (nodes*ppn).

mpiexec myprog

mpiexec yourprog abc.dat 123

mpiexec - ppn 1 progname [ args ] Run only one process per node. mpiexec -ppn 1 myprog
mpiexec - ppn num progname [ args ] Run the specified number of processes on each node. mpiexec -ppn 3 myprog
mpiexec -tv [ options ] progname [ args ] Run the program with the TotalView parallel debugger.

mpiexec -tv myprog

mpiexec -n num progname [ args ]

mpiexec -np num progname [ args ] Run only the specified number of processes. ( -n and -np are equivalent.) Does not spread processes out evenly across nodes. mpiexec -n 3 myprog
The options above apply to the MVAPICH2 and IntelMPI installations at OSC. See the OpenMPI software page for mpiexec usage with OpenMPI.

pbsdcp

The pbsdcp command is a distributed copy command for the Slurm environment. It copies files to or from each node of the cluster assigned to your job. This is needed when copying files to directories which are not shared between nodes, such as $TMPDIR.

Options are -r for recursive and -p to preserve modification times and modes.

Usage Description Example
pbsdcp [-s] [ options ] srcfiles  target “Scatter”. Copy one or more files from shared storage to the target directory on each node (local storage). The -s flag is optional.

pbsdcp -s infile1 infile2 $TMPDIR

pbsdcp model.* $TMPDIR

pbsdcp -g [ options ] srcfiles  target “Gather”. Copy the source files from each node to the shared target directory. Wildcards must be enclosed in quotes. pbsdcp -g '$TMPDIR/outfile*' $PBS_O_WORKDIR

Note: In gather mode, if files on different nodes have the same name, they will overwrite each other. In the -g example above, the file names may have the form outfile001 , outfile002 , etc., with each node producing a different set of files.