Memory Limit:
Users are strongly encouraged to compare their job's memory use against the available per-core memory when requesting OSC resources. On Oakley, this equates to 4GB/core and 48GB/node.
If your job requests less than a full node (ppn < 12), it may be scheduled on a node alongside other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4GB/core). For example, without an explicit memory request (mem=XX), a job that requests nodes=1:ppn=1 will be assigned one core and should use no more than 4GB of RAM; a job that requests nodes=1:ppn=3 will be assigned 3 cores and should use no more than 12GB of RAM; and a job that requests nodes=1:ppn=12 will be assigned the whole node (12 cores) with 48GB of RAM. However, a job that requests nodes=1:ppn=1,mem=12GB will be assigned one core but have access to 12GB of RAM, and will be charged for 3 cores' worth of Resource Units (RU). See Charging for memory use for more details.
A multi-node job (nodes > 1) will be assigned entire nodes with 48GB/node and charged for the entire nodes regardless of the ppn request. For example, a job that requests nodes=10:ppn=1 will be charged for 10 whole nodes (12 cores/node × 10 nodes = 120 cores' worth of RU). A job that requests a large-memory node (nodes=XX:ppn=12:bigmem, where XX can be 1-8) will be allocated an entire large-memory node with 192GB of RAM and charged for the whole node (12 cores' worth of RU). A job that requests the huge-memory node (nodes=1:ppn=32) will be allocated the entire huge-memory node with 1TB of RAM and charged for the whole node (32 cores' worth of RU).
To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.
GPU Limit:
On Oakley, GPU jobs may request any number of cores and either 1 or 2 GPUs (nodes=XX:ppn=XX:gpus=1 or gpus=2). The memory limit depends on the ppn request and follows the rules in the Memory Limit section above.
Walltime Limit:
Here are the queues available on Oakley:
| NAME | MAX WALLTIME | MAX JOB SIZE | NOTES |
|---|---|---|---|
| Serial | 168 hours | 1 node | |
| Longserial | 336 hours | 1 node | Restricted access |
| Parallel | 96 hours | 125 nodes | |
| Longparallel | 250 hours | 230 nodes | Restricted access |
| Hugemem | 48 hours | 1 node | 32 cores with 1 TB RAM |
| Debug | 1 hour | 12 nodes | |
Job Limit:
An individual user can have up to 256 concurrently running jobs and/or up to 2040 processors/cores in use. All users in a particular group/project can, among them, have up to 384 concurrently running jobs and/or up to 2040 processors/cores in use. Jobs submitted in excess of these limits are queued but blocked by the scheduler until other jobs exit and free up resources.
A user may have no more than 1000 jobs submitted to each of the parallel and serial job queues. Jobs submitted in excess of this limit will be rejected.