Job scripts are submitted to the batch system using the sbatch command. Be sure to submit your job on the system you want it to run on, or use the --cluster=<system> option to specify one.
Standard batch job
Most jobs on our system are submitted as scripts with no command-line options. If your script is in a file named myscript:
sbatch myscript
In response to this command you’ll see a line with your job ID:
Submitted batch job 123456
You’ll use this job ID (numeric part only) to monitor your job. You can find it again using the squeue -u <username> command.
When you submit a job, the script is copied by the batch system. Any changes you make subsequently to the script file will not affect the job. Your input files and executables, on the other hand, are not picked up until the job starts running.
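For reference, here is a minimal sketch of what a script like myscript might contain; the account code, module name, and executable name are placeholders, not part of any specific example:
#!/bin/bash
#SBATCH --account=<proj-code>
#SBATCH --job-name=serial_example
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Load whatever modules the program needs (placeholder name)
module load mymodule

# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR
./myprogram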
Interactive batch
The batch system supports an interactive batch mode. This mode is useful for debugging parallel programs or running a GUI program that’s too large for the login node. The resource limits (memory, CPU) for an interactive batch job are the same as the standard batch limits.
Interactive batch jobs are generally invoked without a script file.
Custom sinteractive command
OSC has developed a script to make starting an interactive session simpler.
The sinteractive command takes simple options and starts an interactive batch session automatically. However, its behavior can be counterintuitive with respect to the numbers of tasks and CPUs, and jobs launched with sinteractive can show environmental differences compared to jobs launched by other means. As an alternative, consider starting the session with salloc, for example:
salloc -A <proj-code> --time=500
Simple serial
The example below demonstrates using sinteractive to start a serial interactive job:
sinteractive -A <proj-code>
If no resource options are specified, a single-core job is submitted by default.
Simple parallel (single node)
To request a simple parallel job of 4 cores on a single node:
sinteractive -A <proj-code> -c 4
To set up the environment for OpenMP executables, enter this command:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
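With the thread count set this way, an OpenMP executable can then be launched directly from the interactive shell; for example, with a placeholder program name:
./my_openmp_program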
Parallel (multiple nodes)
To request 2 whole nodes on Pitzer with a total of 96 cores between both nodes:
sinteractive -A <proj-code> -N 2 -n 96
Note, however, that the Slurm variables SLURM_CPUS_PER_TASK, SLURM_NTASKS, and SLURM_TASKS_PER_NODE are all set to 1, so subsequent srun commands that launch parallel executables must explicitly specify the desired task and CPU counts. Unless you really need to run in the debug queues, it is generally simpler to start with an appropriate salloc command.
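As a sketch, assuming a hypothetical MPI executable named my_mpi_program, an explicit launch across the full 2-node, 96-core allocation would look like this:
srun --ntasks=96 --ntasks-per-node=48 ./my_mpi_program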
Run sinteractive --help to view all the available options and their default values.
Using salloc and srun
An example of using salloc and srun:
salloc --account=pas1234 --x11 --nodes=2 --ntasks-per-node=28 --time=1:00:00
The salloc command requests the resources and starts an interactive job. The --x11 flag enables X11 forwarding, which is necessary for running a GUI. You will need an X11 server running on your computer to use X11 forwarding; see the getting connected page. The remaining flags in this example are resource requests with the same meaning as the corresponding header lines in a batch file.
After you enter this line, you’ll see something like the following:
salloc: Pending job allocation 123456
salloc: job 123456 queued and waiting for resources
Your job will be queued just like any other job. When the job runs, you’ll see lines like the following:
salloc: job 123456 has been allocated resources
salloc: Granted job allocation 123456
salloc: Waiting for resource configuration
salloc: Nodes o0001 are ready for job
At this point, you have an interactive login shell on one of the compute nodes, which you can treat like any other login shell.
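For example, to launch an MPI executable across the 56 tasks requested above, you could run something like the following; the program name is a placeholder:
srun --ntasks=56 ./my_mpi_program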
It is important to remember that OSC systems are optimized for batch processing, not interactive computing. If the system load is high, your job may wait for hours in the queue, making interactive batch impractical. Requesting a walltime limit of one hour or less is recommended because your job can run on nodes reserved for debugging.
Job arrays
If you submit many similar jobs at the same time, you should consider using a job array. With a single sbatch command, you can submit multiple jobs that will use the same script. Each job has a unique identifier, $SLURM_ARRAY_TASK_ID, which can be used to parameterize its behavior.
Individual jobs in a job array are scheduled independently, but some job management tasks can be performed on the entire array.
To submit an array of jobs numbered from 1 to 100, all using the script sim.job:
sbatch --array=1-100 sim.job
The script would use the environment variable $SLURM_ARRAY_TASK_ID, possibly as an input argument to an application or as part of a file name.
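A minimal sketch of what sim.job might contain is shown below; the input/output naming scheme and the executable name are placeholders:
#!/bin/bash
#SBATCH --account=<proj-code>
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Each array task processes its own input file, e.g. input_1.dat through input_100.dat
cd $SLURM_SUBMIT_DIR
./sim input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.log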
Job dependencies
It is possible to set conditions on when a job can start. The most common of these is a dependency relationship between jobs.
For example, to ensure that the job being submitted (with script sim.job) does not start until after job 123456 has finished:
sbatch --dependency=afterany:123456 sim.job
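If the first job has not been submitted yet, one way to chain the two is to capture its job ID with the --parsable option of sbatch, which prints only the job ID; in this sketch, first.job is a placeholder:
jobid=$(sbatch --parsable first.job)
sbatch --dependency=afterany:$jobid sim.job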
Job variables
It is possible to provide a list of environment variables that are exported to the job.
For example, to pass the variable var and its value to the job with the script sim.job, use the command:
sbatch --export=var=value sim.job
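Inside the job script, the exported variable is then available like any ordinary environment variable; a minimal sketch:
echo "var is set to $var"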
Many other options are available, some quite complicated; for more information, see the sbatch online manual by using the command:
man sbatch