sbatch (instead of qsub) command to submit jobs. Refer to the Slurm migration page to understand how to use Slurm.
Memory limit
Users are strongly encouraged to compare a job's expected memory use against the available per-core memory when requesting OSC resources.
Summary
Node type | Partition | default memory per core | max usable memory per node (96 usable cores/node) |
---|---|---|---|
regular compute | cpu | 6144 MB (6 GB) | 589,824 MB (576 GB) |
regular compute | cache | 4956 MB (4.84 GB) | 475,776 MB (464.6 GB) |
gpu | gpu | 9216 MB (9 GB) | 884,736 MB (864 GB) |
huge memory | hugemem | 21104 MB (20.6 GB) | 2,025,984 MB (1978.5 GB) |
It is recommended to let the default memory apply unless more control over memory is needed.
Note that if an entire node is requested, then the job is automatically granted the entire node's memory. On the other hand, if a partial node is requested, then memory is granted based on the default memory per core.
See a more detailed explanation below.
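As a rough illustration of the partial-node rule, the sketch below multiplies a core count by the default per-core memory from the summary table. The function names `default_mem_per_core_mb` and `default_job_mem_mb` are hypothetical helpers, not part of Slurm or any OSC tool, and the values are simply copied from the table.

```shell
# Default memory per core in MB, copied from the summary table.
# Hypothetical helper, not a Slurm or OSC command.
default_mem_per_core_mb() {
    case "$1" in
        cpu)     echo 6144 ;;
        cache)   echo 4956 ;;
        gpu)     echo 9216 ;;
        hugemem) echo 21104 ;;
        *)       echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

# Memory in MB that a partial-node job would receive by default:
# number of cores times the default memory per core.
default_job_mem_mb() {
    per_core=$(default_mem_per_core_mb "$1") || return 1
    echo $(( $2 * per_core ))
}

default_job_mem_mb cpu 30    # prints 184320 (MB), i.e. 180 GB
default_job_mem_mb gpu 10
```

Calling the function with 96 cores reproduces the per-node maximum listed in the table for that partition.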
Regular Dense Compute Node
Default memory limits
A job can request resources and allow the default memory to apply. For example, if a job requires 180 GB:
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=30
This requests 30 cores, and each core is automatically allocated 6 GB of memory (30 cores * 6 GB = 180 GB).
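The same arithmetic can be run in reverse to pick a core count for a given memory need. This is a minimal sketch, assuming the regular compute default of 6144 MB per core unless another value is passed; `cores_for_mem_gb` is a hypothetical helper name.

```shell
# Smallest core count whose default memory covers a need given in whole GB.
# $1 = memory needed in GB, $2 = default MB per core (optional, 6144 if omitted).
# Uses integer ceiling division. Hypothetical helper, not a Slurm command.
cores_for_mem_gb() {
    need_mb=$(( $1 * 1024 ))
    per_core_mb=${2:-6144}
    echo $(( (need_mb + per_core_mb - 1) / per_core_mb ))
}

cores_for_mem_gb 180    # prints 30, matching the 180 GB example
cores_for_mem_gb 100    # prints 17, since 16 cores would give only 96 GB
```

The result can then be used as the value of `--cpus-per-task`.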
Explicit memory requests
If needed, an explicit memory request can be added:
#SBATCH --ntasks-per-node=1
#SBATCH --mem=180G
Job charging is determined by either the number of cores or the amount of memory requested.
See Job and storage charging for details.
Multi-node job request
On Cardinal, a multi-node job (nodes>1) may request partial nodes. This is an example of a job requesting 2 nodes with 1 core per node:
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
Here, job charging is determined by the number of cores requested in the job script.
Whole-node request
To request the whole node regardless of the number of nodes, either request the maximum number of usable cores per node (96) or add --exclusive:
#SBATCH --ntasks-per-node=96
or
#SBATCH --exclusive
Here, the job is allocated and charged for the whole node.
Huge Memory Node
To request a partial or whole huge memory node, specify a memory request between 864GB and 1978GB, i.e., 886GB <= mem < 1978GB. Note: only integer values can be used in the request.
GPU Jobs
There are 4 GPUs per GPU node on Cardinal. Jobs may request only part of a GPU node.
Request two GPUs for one task:
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --gpus-per-task=2
Request two GPUs, one for each task:
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --gpus-per-task=1
Jobs can also request all the GPUs of a dense GPU node. Request an entire dense GPU node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=96
#SBATCH --gpus-per-node=4
See this GPU computing page for more information.
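For context, the directives from the first GPU example could sit at the top of a complete job script like the sketch below. The job name, the shortened time limit, and the `report_allocation` function are illustrative assumptions. `SLURM_JOB_ID` and `SLURM_CPUS_PER_TASK` are standard Slurm output variables; `CUDA_VISIBLE_DEVICES` is typically set by Slurm for GPU jobs, but that depends on how the site configures GPU resources.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_check        # hypothetical job name
#SBATCH --time=0:10:00              # short limit, chosen only for this sketch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --gpus-per-task=2

# Print what the scheduler granted. The fallback text after ":-" is shown
# when a variable is unset, for example when run outside of a Slurm job.
report_allocation() {
    echo "Job ID:        ${SLURM_JOB_ID:-not in a Slurm job}"
    echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-unset}"
    echo "Visible GPUs:  ${CUDA_VISIBLE_DEVICES:-none}"
}

report_allocation
```

Saved under a name of your choice, such as the hypothetical `gpu_check.sh`, the script would be submitted with `sbatch gpu_check.sh`.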
Partition time and job size limits
Here are the walltime and node limits per job for the different queues/partitions available on Cardinal:
NAME | MAX TIME LIMIT | MIN JOB SIZE | MAX JOB SIZE | NOTES |
---|---|---|---|---|
cpu | 7-00:00:00 | 1 core | 12 nodes | This partition cannot request GPUs. 322 nodes in total. HBM configured in flat mode. See this HBM page for more info. |
cache | 7-00:00:00 | 1 core | 4 nodes | This partition cannot request GPUs. 4 nodes in total. HBM configured in cache mode. See this HBM page for more info. Must add the flag |
gpu | 7-00:00:00 | 1 core | 12 nodes | |
debug | 1:00:00 | 1 core | 2 nodes | For small interactive and test jobs (both CPU and GPU) |
hugemem | 7-00:00:00 | 1 core | 1 node | |
Usually, you do not need to specify the partition for a job; the scheduler assigns the right partition based on the requested resources. To specify a partition for a job, either add the flag --partition=<partition-name> to the sbatch command at submission time or add this line to the job script:
#SBATCH --partition=<partition-name>
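Before choosing a partition explicitly, it can help to check the node count against that partition's maximum job size. The sketch below encodes the MAX JOB SIZE column of the table above; `max_nodes_for_partition` and `fits_partition` are hypothetical helper names, and the check covers node counts only, not walltime.

```shell
# Maximum nodes per job for each partition, copied from the table above.
# Hypothetical helper, not a Slurm or OSC command.
max_nodes_for_partition() {
    case "$1" in
        cpu|gpu) echo 12 ;;
        cache)   echo 4 ;;
        debug)   echo 2 ;;
        hugemem) echo 1 ;;
        *)       echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

# Succeeds (exit status 0) when the node count is within the partition limit.
# $1 = partition name, $2 = number of nodes requested.
fits_partition() {
    limit=$(max_nodes_for_partition "$1") || return 1
    [ "$2" -ge 1 ] && [ "$2" -le "$limit" ]
}

if fits_partition debug 2; then echo "debug, 2 nodes: ok"; fi
if ! fits_partition hugemem 2; then echo "hugemem, 2 nodes: too large"; fi
```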
Job/Core Limits
| | Max Running Job Limit (all types) | Max Running Job Limit (GPU jobs) | Max Running Job Limit (regular debug jobs) | Max Running Job Limit (GPU debug jobs) | Max Core/Processor Limit (all types) | Max Node Limit (GPU) | Max Node Limit (hugemem) |
|---|---|---|---|---|---|---|---|
| Individual User | 384 | n/a | 4 | 4 | 5184 | 32 | 12 |
| Project/Group | 576 | n/a | n/a | n/a | 5184 | 32 | 12 |
An individual user can have up to the max number of concurrently running jobs and/or up to the max number of processors/cores in use. Likewise, all the users in a particular group/project combined can have up to the group's max number of concurrently running jobs and/or up to the max number of processors/cores in use.
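To put the core limit in terms of nodes, the sketch below divides it by the 96 usable cores per node and checks whether an additional job would stay under the limit. It assumes a whole-node job counts as 96 cores per node; `within_core_limit` is a hypothetical helper name.

```shell
# Limits copied from the tables above.
MAX_CORES=5184        # max cores in use per user (and per project)
CORES_PER_NODE=96     # usable cores per node

# Number of whole nodes that fit under the core limit.
echo $(( MAX_CORES / CORES_PER_NODE ))    # prints 54

# Succeeds when a new job stays within the core limit.
# $1 = cores already in use, $2 = cores requested by the new job.
within_core_limit() {
    [ $(( $1 + $2 )) -le "$MAX_CORES" ]
}

if within_core_limit 5088 96; then echo "one more whole node fits"; fi
```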