
Batch Limit Rules

Memory Limit:

We strongly suggest that users compare their job's expected memory use with the available per-core memory when requesting OSC resources. On Oakley, this equates to 4GB/core and 48GB/node.

If your job requests less than a full node (ppn<12), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4GB/core). For example, without any memory request (mem=XX), a job that requests nodes=1:ppn=1 will be assigned one core and should use no more than 4GB of RAM; a job that requests nodes=1:ppn=3 will be assigned 3 cores and should use no more than 12GB of RAM; and a job that requests nodes=1:ppn=12 will be assigned the whole node (12 cores) with 48GB of RAM. However, a job that requests nodes=1:ppn=1,mem=12GB will be assigned one core but will have access to 12GB of RAM, and will be charged for 3 cores' worth of Resource Units (RU). See Charging for memory use for more details.
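The single-node examples above can be checked with a short Python sketch. The function name and the rounding rule (memory is converted to cores' worth by rounding up to the next 4GB) are assumptions made for illustration; the text only confirms the specific cases shown, so treat this as a rough estimate rather than the scheduler's actual accounting.

```python
import math

CORES_PER_NODE = 12   # standard Oakley node, per the text
GB_PER_CORE = 4       # 48GB per node / 12 cores


def single_node_entitlement(ppn, mem_gb=None):
    """Return (memory_limit_gb, charged_cores) for a nodes=1 request.

    Hypothetical helper: the max()/ceil() rule is an assumption that
    reproduces the examples in the text.
    """
    if not 1 <= ppn <= CORES_PER_NODE:
        raise ValueError("ppn must be between 1 and 12")
    if mem_gb is None:
        # No mem= request: memory is proportional to the cores requested.
        return ppn * GB_PER_CORE, ppn
    if not 0 < mem_gb <= CORES_PER_NODE * GB_PER_CORE:
        raise ValueError("mem must be between 0 and 48GB on a standard node")
    # Assumed rule: charge for whichever is larger, the cores requested
    # or the number of cores' worth of memory requested.
    mem_cores = math.ceil(mem_gb / GB_PER_CORE)
    return mem_gb, max(ppn, mem_cores)


if __name__ == "__main__":
    print(single_node_entitlement(1))        # nodes=1:ppn=1
    print(single_node_entitlement(3))        # nodes=1:ppn=3
    print(single_node_entitlement(1, 12))    # nodes=1:ppn=1,mem=12GB
```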

A multi-node job (nodes>1) will be assigned entire nodes with 48GB/node and charged for the entire nodes regardless of the ppn request. For example, a job that requests nodes=10:ppn=1 will be charged for 10 whole nodes (12 cores/node * 10 nodes, which is 120 cores' worth of RU). A job that requests a large-memory node (nodes=XX:ppn=12:bigmem, where XX can be 1-8) will be allocated the entire large-memory node with 192GB of RAM and charged for the whole node (12 cores' worth of RU). A job that requests a huge-memory node (nodes=1:ppn=32) will be allocated the entire huge-memory node with 1TB of RAM and charged for the whole node (32 cores' worth of RU).
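The whole-node charging arithmetic can be sketched in Python as follows. The table and function names are hypothetical, and the sketch assumes that the per-node core count is simply multiplied by the number of nodes for every node type, which the text states explicitly only for standard nodes.

```python
# Cores per node for each node type described in the text.
# The dictionary and its keys are illustrative names, not scheduler options.
NODE_CORES = {
    "standard": 12,   # 48GB of RAM per node
    "bigmem": 12,     # 192GB of RAM per node, 1-8 nodes
    "hugemem": 32,    # 1TB of RAM, single node
}


def whole_node_charge(nodes, node_type="standard"):
    """Cores' worth of RU charged when entire nodes are allocated.

    Applies to multi-node standard jobs, large-memory jobs, and
    huge-memory jobs; the ppn request does not change the result.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes * NODE_CORES[node_type]


if __name__ == "__main__":
    print(whole_node_charge(10))             # nodes=10:ppn=1
    print(whole_node_charge(1, "bigmem"))    # nodes=1:ppn=12:bigmem
    print(whole_node_charge(1, "hugemem"))   # nodes=1:ppn=32
```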

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

GPU Limit:

On Oakley, GPU jobs may request any number of cores and either 1 or 2 GPUs (nodes=XX:ppn=XX:gpus=1 or nodes=XX:ppn=XX:gpus=2). The memory limit depends on the ppn request and follows the rules in Memory Limit.
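As an illustration of the GPU rule, the following Python sketch builds a resource request string and rejects GPU counts other than 1 or 2. The helper name is hypothetical, and the 12-core upper bound on ppn is an assumption carried over from the standard node size described in Memory Limit.

```python
def gpu_request(nodes, ppn, gpus):
    """Build a resource string such as nodes=1:ppn=4:gpus=2.

    Hypothetical helper for illustration; it only checks the limits
    stated in the text and does not contact the scheduler.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 1 <= ppn <= 12:
        # Assumption: GPU nodes have the same 12 cores as standard nodes.
        raise ValueError("ppn must be between 1 and 12")
    if gpus not in (1, 2):
        raise ValueError("gpus must be 1 or 2")
    return f"nodes={nodes}:ppn={ppn}:gpus={gpus}"


if __name__ == "__main__":
    print(gpu_request(1, 4, 2))
```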

Walltime Limit

Here are the queues available on Oakley:

NAME            MAX WALLTIME    MAX JOB SIZE    NOTES
Serial          168 hours       1 node
Longserial      336 hours       1 node          Restricted access
Parallel        96 hours        125 nodes
Longparallel    250 hours       230 nodes       Restricted access
Hugemem         48 hours        1 node          32 cores with 1 TB RAM (nodes=1:ppn=32)
Debug           1 hour          12 nodes

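The queue limits listed above can be encoded as data and filtered by a job's size and walltime. This is a minimal Python sketch with hypothetical names; it checks only the maximum walltime and maximum node count, and ignores any minimum job size or node-type requirement a queue may have (for example, Hugemem is intended for nodes=1:ppn=32 requests).

```python
# (name, max walltime in hours, max nodes, restricted access)
QUEUES = [
    ("Serial", 168, 1, False),
    ("Longserial", 336, 1, True),
    ("Parallel", 96, 125, False),
    ("Longparallel", 250, 230, True),
    ("Hugemem", 48, 1, False),
    ("Debug", 1, 12, False),
]


def fitting_queues(nodes, hours, include_restricted=False):
    """Return the names of queues whose limits cover the request."""
    names = []
    for name, max_hours, max_nodes, restricted in QUEUES:
        if restricted and not include_restricted:
            continue  # restricted queues need special access
        if nodes <= max_nodes and hours <= max_hours:
            names.append(name)
    return names


if __name__ == "__main__":
    print(fitting_queues(1, 100))
    print(fitting_queues(10, 50, include_restricted=True))
```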
Job Limit

An individual user can have up to 256 concurrently running jobs and/or up to 2040 processors/cores in use. Collectively, all the users in a particular group/project can have up to 384 concurrently running jobs and/or up to 2040 processors/cores in use. Jobs submitted in excess of these limits are queued but blocked by the scheduler until other jobs exit and free up resources.

A user may have no more than 1000 jobs submitted to each of the parallel and serial job queues. Jobs submitted in excess of this limit will be rejected.
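The job limits above can be summarized in a small Python sketch that classifies a new submission. All names are hypothetical, and the logic is a simplified reading of the rules (a job is treated as blocked if starting it would exceed any user or group limit); the scheduler's actual behavior may differ.

```python
USER_MAX_RUNNING = 256
USER_MAX_CORES = 2040
GROUP_MAX_RUNNING = 384
GROUP_MAX_CORES = 2040
MAX_SUBMITTED_PER_QUEUE = 1000


def submission_outcome(submitted_in_queue, user_running, user_cores,
                       group_running, group_cores, job_cores):
    """Classify a new job as 'rejected', 'blocked', or 'eligible'.

    submitted_in_queue: jobs the user already has in the target queue.
    user_* / group_*: currently running jobs and cores in use.
    job_cores: cores the new job would occupy.
    """
    if submitted_in_queue >= MAX_SUBMITTED_PER_QUEUE:
        return "rejected"  # over the per-queue submission limit
    over_user = (user_running + 1 > USER_MAX_RUNNING
                 or user_cores + job_cores > USER_MAX_CORES)
    over_group = (group_running + 1 > GROUP_MAX_RUNNING
                  or group_cores + job_cores > GROUP_MAX_CORES)
    if over_user or over_group:
        return "blocked"   # queued until other jobs free resources
    return "eligible"


if __name__ == "__main__":
    print(submission_outcome(5, 256, 100, 300, 500, 1))
```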