Memory limit
It is strongly suggested that users consider their memory use relative to the available per-core memory when requesting OSC resources for their jobs.
Summary
Partition | # of GPUs per node | Usable cores per node | Default memory per core | Max usable memory per node |
---|---|---|---|---|
nextgen | 2 | 120 | 4,027 MB | 471.91 GB |
quad | 4 | 88 | 10,724 MB | 921.59 GB |
batch | 4 | 88 | 10,724 MB | 921.59 GB |
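For reference, the max usable memory per node in the table is the product of the usable cores and the default memory per core (a quick consistency check, converting MB to GB by dividing by 1,024):
120 cores * 4,027 MB = 483,240 MB ≈ 471.91 GB (nextgen)
88 cores * 10,724 MB = 943,712 MB ≈ 921.59 GB (quad and batch)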
It is recommended to let the default memory apply unless more control over memory is needed.
Note that if an entire node is requested, then the job is automatically granted the entire node's main memory. On the other hand, if a partial node is requested, then memory is granted based on the default memory per core.
See a more detailed explanation below.
Default memory limits
A job can request resources and allow the default memory to apply. If a job requires 300 GB for example:
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=30
This requests 30 cores, and on a quad GPU node each core will automatically be allocated about 10.4 GB of memory (30 cores * ~10 GB of memory ≈ 300 GB of memory).
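To confirm the memory actually granted after submission, the scheduler can be queried directly (a minimal sketch; <jobid> is a placeholder for the job ID returned by sbatch, and the exact memory-related field names in the output vary by Slurm version):
scontrol show job <jobid> | grep -i mem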
Explicit memory requests
If needed, an explicit memory request can be added:
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=300G
See Job and storage charging for details.
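If per-core control is preferred, memory can also be requested per core rather than per node (a sketch using the standard Slurm --mem-per-cpu option, which cannot be combined with --mem; the 8G value is purely illustrative):
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=8G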
CPU-only jobs
We reserve 1 core per GPU. CPU-only jobs can be scheduled, but they can request only up to 118 cores per dual GPU node and up to 84 cores per quad GPU node. You can also request multiple nodes for one CPU-only job.
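For example, a CPU-only job that uses all schedulable cores of a single quad GPU node could be requested as follows (a sketch; the nodes/tasks/cores split is illustrative and should be adapted to the application):
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=84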
GPU Jobs
Jobs may request only part of a GPU node. Such jobs may still request up to the total number of cores on the node (88 cores for quad GPU nodes).
Request two GPUs for one task:
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --gpus-per-task=2
Request two GPUs, one for each task:
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --gpus-per-task=1
Of course, jobs can request all the GPUs of a dense GPU node; these jobs also have access to all of the node's cores.
Request an entire dense GPU node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=88
#SBATCH --gpus-per-node=4
Partition time and job size limits
Here are the walltime and node limits per job for the different queues/partitions available on Ascend:
Partition | Max walltime limit | Min job size | Max job size | Note |
---|---|---|---|---|
nextgen | 7-00:00:00 (168 hours) | 1 core | 4 nodes | Can request multiple partial nodes |
quad | 7-00:00:00 (168 hours) | 1 core | 2 nodes | Can request multiple partial nodes |
debug-nextgen | 1 hour | 1 core | 2 nodes | |
debug-quad | 1 hour | 1 core | 2 nodes | |
Usually, you do not need to specify the partition for a job; the scheduler will assign the right partition based on the requested resources. To specify a partition for a job, either add the flag --partition=<partition-name> to the sbatch command at submission time or add this line to the job script:
#SBATCH --partition=<partition-name>
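For example, to target the quad partition explicitly (job.sh is a placeholder for your job script), either submit with:
sbatch --partition=quad job.sh
or add the directive inside the script:
#SBATCH --partition=quad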
Job/Core Limits
 | Max # of cores in use | Max # of GPUs in use | Max # of running jobs | Max # of jobs to submit |
---|---|---|---|---|
Per user | 5,632 | 96 | 256 | 1,000 |
Per project | 5,632 | 96 | 512 | n/a |
An individual user can have up to the maximum number of concurrently running jobs and/or up to the maximum number of processors/cores in use. Likewise, all the users in a particular group/project combined can have up to the maximum number of concurrently running jobs and/or up to the maximum number of processors/cores in use.