
Batch Limit Rules

Memory Limit:

A small portion of the total physical memory on each node is reserved for distributed processes.  The actual physical memory available to user jobs is tabulated below.

Summary

Node type          Default and max memory per core    Max memory per node
regular compute    4.214 GB                           117 GB
huge memory        31.104 GB                          1492 GB
gpu                4.214 GB                           117 GB
A job may request more than the max memory per core, but the job will be allocated more cores to satisfy the memory request instead of just more memory.
For example, the following Slurm directives will actually grant this job 3 cores, with 10 GB of memory (since 2 cores * 4.214 GB = 8.4 GB does not satisfy the 10 GB memory request):

#SBATCH --ntasks-per-node=2
#SBATCH --mem=10g

It is recommended to let the default memory apply unless more control over memory is needed.
Note that if an entire node is requested, then the job is automatically granted the entire node's main memory. On the other hand, if a partial node is requested, then memory is granted based on the default memory per core.

See a more detailed explanation below.

Regular Dense Compute Node

On Owens, the memory available to user jobs equates to 4,315 MB/core or 120,820 MB/node (117.98 GB/node) on a regular dense compute node.

If your job requests less than a full node (--ntasks-per-node < 28), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4,315 MB/core). For example, without an explicit memory request (no --mem option):

  • a job that requests --nodes=1 --ntasks-per-node=1 will be assigned one core and should use no more than 4,315 MB of RAM;
  • a job that requests --nodes=1 --ntasks-per-node=3 will be assigned 3 cores and should use no more than 3*4,315 MB of RAM;
  • a job that requests --nodes=1 --ntasks-per-node=28 will be assigned the whole node (28 cores) with 118 GB of RAM.

Here is how an explicit memory request (--mem=XX) interacts with core allocation. A job that requests --nodes=1 --ntasks-per-node=1 --mem=12GB will be assigned three cores, have access to 12 GB of RAM, and be charged for 3 cores worth of usage (in other words, the --ntasks-per-node request is ignored). A job that requests --nodes=1 --ntasks-per-node=5 --mem=12GB will be assigned 5 cores but have access to only 12 GB of RAM, and be charged for 5 cores worth of usage.

A multi-node job (--nodes > 1) will be assigned entire nodes with 118 GB of memory per node and charged for the entire nodes regardless of the --ntasks-per-node request. For example, a job that requests --nodes=10 --ntasks-per-node=1 will be charged for 10 whole nodes (28 cores/node * 10 nodes, which is 280 cores worth of usage).
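As a sketch, the three cases described above can be written as job-script directives (the core counts, memory size, and node counts are the illustrative values from the examples above, not requirements):

# Partial node, no memory request: 3 cores, entitled to about 3 * 4,315 MB of RAM
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=3

# Memory-driven request: allocated 3 cores to cover the 12 GB, charged for 3 cores
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=12G

# Multi-node job: assigned and charged for 10 whole nodes (280 cores)
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1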

Huge Memory Node

Beginning on Tuesday, March 10th, 2020, users are able to run jobs using less than a full huge memory node. Please read the following instructions carefully before requesting a huge memory node on Owens. 

On Owens, the memory available to user jobs equates to 31,850 MB/core or 1,528,800 MB/node (1,492.96 GB/node) on a huge memory node.

To request no more than a full huge memory node, you have two options:

  • The first option is to specify a memory request between 120,832 MB (118 GB) and 1,528,800 MB (1,492.96 GB), i.e., 120832MB <= mem <= 1528800MB (118 GB <= mem < 1493 GB). Note: only integer values are accepted in the memory request.
  • The other option is to combine --ntasks-per-node with --partition, e.g., --ntasks-per-node=4 --partition=hugemem. When no memory is specified for the huge memory node, your job is entitled to a memory allocation proportional to the number of cores requested (31,850 MB/core). Note: --ntasks-per-node should be no less than 4 and no more than 48.
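As a sketch, the two options could look like the following in a job script (the 600G memory size and the 8-core count are illustrative choices within the allowed ranges, not requirements):

# Option 1: request by memory size (any integer between 120832 MB and 1528800 MB)
#SBATCH --nodes=1
#SBATCH --mem=600G

# Option 2: request cores on the hugemem partition (4 to 48 cores, no --mem needed)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --partition=hugemem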

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

GPU Jobs

There is only one GPU per GPU node on Owens.

For serial jobs, we allow node sharing on GPU nodes, so a job may request any number of cores (up to 28):

--nodes=1 --ntasks-per-node=XX --gpus-per-node=1

For parallel jobs (--nodes > 1), we do not allow node sharing.
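As a sketch, a serial GPU job and a multi-node GPU job might be written as follows (the core and node counts are illustrative choices):

# Serial GPU job: node sharing allowed, any core count up to 28
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=1

# Parallel GPU job: no node sharing, whole nodes are assigned
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=28
#SBATCH --gpus-per-node=1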

See this GPU computing page for more information. 

Partition time and job size limits

Here are the partitions available on Owens:

Name              Max time limit    Min job size   Max job size   Notes
                  (dd-hh:mm:ss)
serial            7-00:00:00        1 core         1 node
longserial        14-00:00:00       1 core         1 node         Restricted access (contact OSC Help if you need access)
parallel          4-00:00:00        2 nodes        81 nodes
gpuserial         7-00:00:00        1 core         1 node
gpuparallel       4-00:00:00        2 nodes        8 nodes
hugemem           7-00:00:00        1 core         1 node
hugemem-parallel  4-00:00:00        2 nodes        16 nodes       Restricted access (contact OSC Help if you need access)
debug             1:00:00           1 core         2 nodes        For small interactive and test jobs
gpudebug          1:00:00           1 core         2 nodes        For small interactive and test GPU jobs

To specify a partition for a job, either add the flag --partition=<partition-name> to the sbatch command at submission time or add this line to the job script:
#SBATCH --partition=<partition-name>
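For example, here is a minimal sketch of a small test job submitted to the debug partition (the job name, time limit, and command are illustrative):

#!/bin/bash
#SBATCH --job-name=partition_test
#SBATCH --partition=debug
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

echo "Running on $(hostname)"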

To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if the performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

Job/Core Limits

                  Max Running Job Limit                                        Max Core/Processor Limit   Max Node Limit
                  For all types  GPU jobs  Regular debug jobs  GPU debug jobs  For all types              hugemem
Individual User   384            132       4                   4               3080                       12
Project/Group     576            132       n/a                 n/a             3080                       12

An individual user can have up to the maximum number of concurrently running jobs and/or up to the maximum number of processors/cores in use.

Similarly, all users in a particular group/project combined can have up to the maximum number of concurrently running jobs and/or up to the maximum number of processors/cores in use.

A user may have no more than 1000 jobs submitted to each of the parallel and serial job queues.
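To see how close you are to these limits, you can list your queued and running jobs with standard Slurm commands, for example (replace <project> with your group's Slurm account name):

squeue -u $USER                     # all of your pending and running jobs
squeue -u $USER -t RUNNING          # only your currently running jobs
squeue -A <project> -t RUNNING      # running jobs across your project/group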