×

Owens Information Transition

Owens cluster will be decommissioned on February 3, 2025. Some pages may still reference Owens after Owens is decommissioned , and we are in the process of gradually updating the content. Thank you for your patience during this transition

How to Submit, Monitor and Manage Jobs

Submit Jobs

Use Torque/Moab Command Slurm Equivalent
Submit batch job qsub <jobscript> sbatch <jobscript>
Submit interactive job qsub -I [options]

sinteractive [options]

salloc [options]

Notice: If a node fails, then the running job will be automatically resubmitted to the queue and will only be charged for the resubmission time and not the failed time.
One can use  --mail-type=ALL option in their script to receive notifications about their jobs. Please see the slurm sbatch man page for more information.
Another option is to disable the resubmission using --no-requeue so that the job does get submitted on node failure.
A final note is that if the job does not get requeued after a failure, then there will be a charged incurred for the time that the job ran before it failed.

Interactive jobs

Submitting interactive jobs is a bit different in Slurm. When the job is ready, one is logged into the login node they submitted the job from. From there, one can then login to one of the reserved nodes.

You can use the custom tool sinteractive as:

[xwang@pitzer-login04 ~]$ sinteractive
salloc: Pending job allocation 14269
salloc: job 14269 queued and waiting for resources
salloc: job 14269 has been allocated resources
salloc: Granted job allocation 14269
salloc: Waiting for resource configuration
salloc: Nodes p0591 are ready for job
...
...
[xwang@p0593 ~] $
# can now start executing commands interactively

Or, you can use salloc as:

[user@pitzer-login04 ~] $ salloc -t 00:05:00 --ntasks-per-node=3
salloc: Pending job allocation 14209
salloc: job 14209 queued and waiting for resources
salloc: job 14209 has been allocated resources
salloc: Granted job allocation 14209
salloc: Waiting for resource configuration
salloc: Nodes p0593 are ready for job

# normal login display
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
14210 serial-48     bash     usee  R       0:06      1 p0593
[user@pitzer-login04 ~]$ srun --jobid=14210 --pty /bin/bash
# normal login display
[user@p0593 ~] $
# can now start executing commands interactively

Manage Jobs

Use Torque/Moab Command Slurm Equivalent
Delete a job* qdel <jobid>  scancel <jobid>
Hold a job qhold <jobid> scontrol hold <jobid>
Release a job qrls <jobid>  scontrol release <jobid>

* User is eligible to delete his own jobs. PI/project admin is eligible to delete jobs submitted to the project he is an admin on. 

Monitor Jobs

Use Torque/Moab Command Slurm Equivalent
Job list summary qstat or showq squeue
Detailed job information qstat -f <jobid> or checkjob <jobid> sstat -a <jobid> or scontrol show job <jobid>
Job information by a user qstat -u <user> squeue -u <user>

View job script

(system admin only)

js <jobid> jobscript <jobid>
Show expected start time showstart <job ID>

squeue --start --jobs=<jobid>

Supercomputer: