Problem Description
Our current GPFS file system is a distributed service with significant interactions between its clients. Because the compute nodes are GPFS file system clients, a certain amount of memory on each node must be reserved for these interactions. As a result, the maximum physical memory available to users' jobs on each node is reduced in order to keep the file system performing well. In addition, using swap memory is not allowed.
The tables below summarize the maximum physical memory allowed for each type of node on our systems:
Owens Cluster

| Node type | Physical memory per node | Maximum memory allowed per node |
|---|---|---|
| Regular node | 128GB | 118GB |
| Huge memory node | 1536GB (1.5TB) | 1493GB |
Pitzer Cluster

| Node type | Physical memory per node | Maximum memory allowed per node |
|---|---|---|
| Regular node | 192GB | 178GB |
| Dual GPU node | 384GB | 363GB |
| Quad GPU node | 768GB | 744GB |
| Large memory node | 768GB | 744GB |
| Huge memory node | 3072GB (3TB) | 2989GB |
Solutions When You Need Regular Nodes
If you do not request memory explicitly in your job (no `--mem`)
Your job can be submitted and scheduled as before, and resources will be allocated according to your requests for cores/nodes (`--nodes=XX --ntasks-per-node=YY`). If you request a partial node, the memory allocated to your job is proportional to the number of cores requested; if you request a whole node, the memory allocated to your job is the maximum allowed per node shown in the tables above, as in the example below.
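For illustration, here is a minimal Slurm script sketch of a partial-node request with no explicit `--mem`; the 28-core count for an Owens regular node and the program name are assumptions for this example.

```bash
#!/bin/bash
#SBATCH --job-name=partial_node_example
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=14   # half of an assumed 28-core Owens regular node
#SBATCH --time=1:00:00
# No --mem line: memory is allocated proportionally to cores,
# i.e. 14/28 of the 118GB maximum, about 59GB for this job.

srun ./my_program   # placeholder for your executable
```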
If you have a multi-node job (`--nodes` > 1), your job will be assigned whole nodes, each with the maximum memory allowed per node, and will be charged for those whole nodes regardless of your `--ntasks-per-node` request, as sketched below.
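A minimal sketch of such a multi-node request; the task count and program name are assumptions for this example.

```bash
#!/bin/bash
#SBATCH --job-name=multi_node_example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=1:00:00
# With --nodes > 1, both nodes are allocated (and charged) in full,
# so each provides its entire per-node maximum memory (e.g. 118GB on
# an Owens regular node) even though only 4 tasks run per node.

srun ./my_mpi_program   # placeholder for your MPI executable
```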
If you do request memory explicitly in your job (with `--mem`)
If you request memory explicitly in your script, please revisit your script according to the following pages, then see the sketch after the links:
Pitzer: https://www.osc.edu/resources/technical_support/supercomputers/pitzer/batch_limit_rules
Owens: https://www.osc.edu/resources/technical_support/supercomputers/owens/batch_limit_rules
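As a starting point, here is a minimal sketch of an explicit memory request; the core count and program name are assumptions, and the `--mem` value must not exceed the per-node maximum in the tables above (118GB on an Owens regular node).

```bash
#!/bin/bash
#SBATCH --job-name=explicit_mem_example
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28   # assumed core count of an Owens regular node
#SBATCH --mem=118G             # must stay within the maximum allowed per node
#SBATCH --time=1:00:00

srun ./my_program   # placeholder for your executable
```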