Cardinal

OSC's Cardinal cluster will open to all clients on Monday, November 4, 2024.


Detailed system specifications:

  • 378 Dell Nodes, 39,312 total cores, 128 GPUs 

  • Dense Compute: 326 Dell PowerEdge C6620 two-socket servers, each with: 

    • 2 Intel Xeon CPU Max 9470 (Sapphire Rapids, 52 cores [48 usable], 2.0 GHz) processors 

    • 128 GB HBM2e and 512 GB DDR5 memory 

    • 1.6 TB NVMe local storage 

    • NDR200 Infiniband 

  • GPU Compute: 32 Dell PowerEdge XE9640 two-socket servers, each with: 

    • 2 Intel Xeon Platinum 8470 (Sapphire Rapids, 52 cores [48 usable], 2.0 GHz) processors 

    • 1 TB DDR5 memory 

    • 4 NVIDIA H100 (Hopper) GPUs each with 94 GB HBM2e memory and NVIDIA NVLink 

    • 12.8 TB NVMe local storage 

    • Four NDR400 Infiniband HCAs supporting GPUDirect 

  • Analytics: 16 Dell PowerEdge R660 two-socket servers, each with: 

    • 2 Intel Xeon CPU Max 9470 (Sapphire Rapids, 52 cores [48 usable], 2.0 GHz) processors 

    • 128 GB HBM2e and 2 TB DDR5 memory 

    • 12.8 TB NVMe local storage 

    • NDR200 Infiniband 

  • Login nodes: 4 Dell PowerEdge R660 two-socket servers, each with: 

    • 2 Intel Xeon CPU Max 9470 (Sapphire Rapids, 52 cores [48 usable], 2.0 GHz) processors 

    • 128 GB HBM and 1 TB DDR5 memory 

    • 3.2 TB NVMe local storage 

    • NDR200 Infiniband  

    • IP address: TBD 

  • ~10.5 PF Theoretical system peak performance  

    • ~8 PetaFLOPS (GPU) 

    • ~2.5 PetaFLOPS (CPU) 

  • 9 physical racks, plus two Coolant Distribution Units (CDUs) providing direct-to-the-chip liquid cooling for all nodes 

How to Connect

  • SSH Method

To log in to the Cardinal cluster at OSC, ssh to the following hostname:

cardinal.osc.edu 

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@cardinal.osc.edu

You may see a warning message that includes an SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.

From there, you are connected to the Cardinal login node and have access to the compilers and other software development tools. You can run programs interactively or through batch requests. We use control groups on login nodes to keep the login nodes stable. Please use batch jobs for any compute-intensive or memory-intensive work. See the following sections for details. 

  • OnDemand Method

You can also login to Cardinal with our OnDemand tool. The first step is to log into ondemand.osc.edu. Once logged in you can access Cardinal by clicking on "Clusters", and then selecting ">_Cardinal Shell Access".

Instructions on how to use OnDemand can be found at the OnDemand documentation page.

File Systems

Cardinal accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the other clusters. Full details of the storage environment are available in our storage environment guide.

Software Environment

The Cardinal cluster runs on Red Hat Enterprise Linux (RHEL) 9, which provides access to modern tools and libraries but may also require adjustments to your workflows. Please refer to the Cardinal Software Environment page for key software changes and available software.

Cardinal uses the same module system as the other clusters. You can keep up to date on the software packages that have been made available on Cardinal by viewing the Software by System page and selecting the Cardinal system.

Programming Environment

The Cardinal cluster supports programming in C, C++, and Fortran. The available compiler suites include Intel, oneAPI, and GCC. Additionally, users have access to high-bandwidth memory (HBM), which is expected to enhance the performance of memory-bound applications. Please refer to the Cardinal Programming Environment page for details on compiler commands, parallel and GPU computing, and instructions on how to effectively utilize HBM. 

Batch Specifics  

The Cardinal cluster uses Slurm; the PBS compatibility layer is disabled. Refer to the documentation for our batch environment to understand how to use the batch system on OSC hardware, the Slurm migration page to understand how to use Slurm, and the batch limit page for Cardinal's scheduling policies.


Technical Specifications

The following are technical specifications for Cardinal.  

Number of Nodes

378 nodes

Number of CPU Sockets

756 (2 sockets/node for all nodes)

Number of CPU Cores

39,312

Cores Per Node

104 cores/node for all nodes (96 usable)

Local Disk Space Per Node
  • 1.6 TB for compute nodes
  • 12.8 TB for GPU and Large mem nodes
  • 3.2 TB for login nodes
Compute, Large Mem & Login Node CPU Specifications
Intel Xeon CPU Max 9470 HBM2e (Sapphire Rapids)
  • 2.0 GHz
  • 52 cores per processor (48 usable)
GPU Node CPU Specifications
Intel Xeon Platinum 8470 (Sapphire Rapids)
  • 2.0 GHz
  • 52 cores per processor (48 usable)
Server Specifications
  • 326 Dell PowerEdge C6620
  • 32 Dell PowerEdge XE9640 (GPU nodes)
  • 20 Dell PowerEdge R660 (largemem & login nodes)
Accelerator Specifications

NVIDIA H100 (Hopper) GPUs, each with 94 GB HBM2e memory and NVIDIA NVLink

Number of Accelerator Nodes

32 quad GPU nodes (4 GPUs per node)

Total Memory

~281 TB (44 TB HBM, 237 TB DDR5)

Memory Per Node
  • 128 GB HBM / 512 GB DDR5 (compute nodes)
  • 1 TB (GPU nodes)
  • 128 GB HBM / 2 TB DDR5 (large mem nodes)
  • 128 GB HBM / 1 TB DDR5 (login nodes)
Memory Per Core
  • 1.2 GB HBM / 4.9 GB DDR5 (compute nodes)
  • 9.8 GB (GPU nodes)
  • 1.2 GB HBM / 19.7 GB DDR5 (large mem nodes)
  • 1.2 GB HBM / 9.8 GB DDR5 (login nodes)
Interconnect
  • NDR200 Infiniband (200 Gbps) (compute, large mem, login nodes)
  • 4x NDR400 Infiniband (400 Gbps x 4) with GPUDirect, allowing non-blocking communication between up to 10 nodes (GPU nodes)

Cardinal Programming Environment

Compilers

The Cardinal cluster supports C, C++, and Fortran programming languages. The available compiler suites include Intel, oneAPI, and GCC. By default, the Intel development toolchain is loaded. The table below lists the compiler commands and recommended options for compiling serial programs. For more details and best practices, please refer to our compilation guide.

The Sapphire Rapids processors that make up Cardinal support the Advanced Vector Extensions (AVX512) instruction set, but you must set the correct compiler flags to take advantage of it. AVX512 has the potential to speed up your code by a factor of 8 or more, depending on the compiler and options you would otherwise use. However, bear in mind that clock speeds decrease as the level of the instruction set increases, so if your code does not benefit from vectorization it may be beneficial to use a lower instruction set.

In our experience, the Intel compiler usually does the best job of optimizing numerical codes and we recommend that you give it a try if you’ve been using another compiler.

With the Intel or oneAPI compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3.

This advice assumes that you are building and running your code on Cardinal. The executables will not be portable.  Of course, any highly optimized builds, such as those employing the options above, should be thoroughly validated for correctness.

LANGUAGE   INTEL                       GNU                                   ONEAPI
C          icc -O2 -xHost hello.c      gcc -O3 -march=native hello.c         icx -O2 -xHost hello.c
Fortran    ifort -O2 -xHost hello.F    gfortran -O3 -march=native hello.F    ifx -O2 -xHost hello.F
C++        icpc -O2 -xHost hello.cpp   g++ -O3 -march=native hello.cpp       icpx -O2 -xHost hello.cpp
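
For example, on a login node you could compile and run a serial C program with the Intel toolchain roughly as follows (hello.c stands in for your own source file; loading the module is only needed if the Intel toolchain is not already in your environment):

# Load the Intel toolchain if it is not already loaded
module load intel

# Compile with optimization and host-specific vectorization (AVX512 on Cardinal)
icc -O2 -xHost hello.c -o hello

# Run the resulting executable
./hello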

Parallel Programming

MPI

OSC systems use the MVAPICH implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect. MPI is a standard library for performing parallel processing using a distributed-memory model. For more information on building your MPI codes, please visit the MPI Library documentation.

MPI programs are started with the srun command. For example,

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8

srun [ options ] mpi_prog
Note: The program to be run must either be in your path or have its full path specified.

The above job script will allocate 2 nodes with 8 tasks (and thus 8 CPU cores) each. The srun command will typically spawn one MPI process per task requested in a Slurm batch job. Use the --ntasks and/or --ntasks-per-node options on the srun command line to change that behavior. For example,

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8

# Run 8 processes per node
srun ./mpi_prog

# Run 4 processes per node
srun --ntasks=8 --ntasks-per-node=4 ./mpi_prog
Note: The information above applies to the MVAPICH, Intel MPI and OpenMPI installations at OSC. 
Caution: mpiexec or mpirun is still supported with Intel MPI and OpenMPI, but it may not be fully compatible with our Slurm environment. We recommend using srun in all cases.

OpenMP

The Intel, oneAPI and GNU compilers understand the OpenMP set of directives, which support multithreaded programming. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.

An OpenMP program by default will use a number of threads equal to the number of CPUs requested in a Slurm batch job. To use a different number of threads, set the environment variable OMP_NUM_THREADS. For example,

#!/bin/bash
#SBATCH --ntasks-per-node=8

# Run 8 threads
./omp_prog

# Run 4 threads
export OMP_NUM_THREADS=4
./omp_prog

To run an OpenMP job on an exclusive node:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive

./omp_prog

Hybrid (MPI + OpenMP)

An example of running a job for hybrid code:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --exclusive

# Each Cardinal node is equipped with 96 CPU cores
# Run 8 MPI processes on each node and 12 OpenMP threads spawned from each MPI process
export OMP_NUM_THREADS=12
srun --ntasks=16 --ntasks-per-node=8 --cpus-per-task=12 ./hybrid_prog

Tuning Parallel Program Performance: Process/Thread Placement

To get the maximum performance, it is important to make sure that processes/threads are located as close as possible to their data, and as close as possible to each other if they need to work on the same piece of data, given the arrangement of nodes, sockets, and cores, each with different access to RAM and caches.

When cache and memory contention between threads/processes is an issue, it is usually best to use a scatter distribution for your code.

Processes and threads are placed differently depending on the computing resources you request and on the compiler and MPI implementation used to compile your code. For the former, see the examples above to learn how to run a job on exclusive nodes. For the latter, this section summarizes the default behavior and how to modify placement.

OpenMP only

For all three compilers (Intel, oneAPI, GNU), purely threaded codes do not bind to particular CPU cores by default. In other words, it is possible that multiple threads are bound to the same CPU core.

The following table describes how to modify the default placements for pure threaded code:

DISTRIBUTION   COMPACT                                            SCATTER/CYCLIC
DESCRIPTION    Place threads as closely as possible on sockets    Distribute threads as evenly as possible across sockets
INTEL/ONEAPI   KMP_AFFINITY=compact                               KMP_AFFINITY=scatter
GNU            OMP_PLACES=sockets[1]                              OMP_PROC_BIND=true OMP_PLACES=cores

  1. Threads in the same socket might be bound to the same CPU core.
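
As an illustration, the following sketch requests an exclusive node and spreads the threads across sockets (omp_prog is a placeholder for your own executable, and the thread count assumes the 96 usable cores per node):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive

# Intel/oneAPI builds: distribute threads as evenly as possible across sockets
export KMP_AFFINITY=scatter
# For GNU builds, use instead:
# export OMP_PROC_BIND=true
# export OMP_PLACES=cores

export OMP_NUM_THREADS=96
./omp_prog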

MPI Only

For MPI-only codes, MVAPICH first binds as many processes as possible on one socket, then allocates the remaining processes on the second socket so that consecutive tasks are near each other. Intel MPI and OpenMPI alternately bind processes to socket 1, socket 2, socket 1, socket 2, etc. (a cyclic distribution).

For process distribution across nodes, all MPI implementations first bind as many processes as possible on one node, then allocate the remaining processes on the second node.

The following table describes how to modify the default placement on a single node for MPI-only code run with the srun command:

DISTRIBUTION     COMPACT                                               SCATTER/CYCLIC
(single node)
DESCRIPTION      Place processes as closely as possible on sockets    Distribute processes as evenly as possible across sockets
MVAPICH[1]       Default                                               MVP_CPU_BINDING_POLICY=scatter
INTEL MPI        SLURM_DISTRIBUTION=block:block                        SLURM_DISTRIBUTION=block:cyclic
                 srun -B "2:*:1" ./mpi_prog                            srun -B "2:*:1" ./mpi_prog
OPENMPI          SLURM_DISTRIBUTION=block:block                        SLURM_DISTRIBUTION=block:cyclic
                 srun -B "2:*:1" ./mpi_prog                            srun -B "2:*:1" ./mpi_prog

  1. MVP_CPU_BINDING_POLICY will not work if MVP_ENABLE_AFFINITY=0 is set.
  2. To distribute processes evenly across nodes, please set SLURM_DISTRIBUTION=cyclic.
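
For example, a minimal sketch of an Intel MPI or OpenMPI job switched from the default block placement to a cyclic placement within the node (mpi_prog is a placeholder for your own executable):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

# Alternate consecutive ranks between the two sockets
export SLURM_DISTRIBUTION=block:cyclic
srun -B "2:*:1" ./mpi_prog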

Hybrid (MPI + OpenMP)

For hybrid codes, each MPI process is allocated a number of cores defined by OMP_NUM_THREADS, and the threads of each process are bound to those cores. All MPI processes, along with the threads bound to them, behave similarly to what was described in the previous sections.

The following table describes how to modify the default placement on a single node for hybrid code run with the srun command:

DISTRIBUTION     COMPACT                                               SCATTER/CYCLIC
(single node)
DESCRIPTION      Place processes as closely as possible on sockets    Distribute processes as evenly as possible across sockets
MVAPICH[1]       Default                                               MVP_HYBRID_BINDING_POLICY=scatter
INTEL MPI[2]     SLURM_DISTRIBUTION=block:block                        SLURM_DISTRIBUTION=block:cyclic
OPENMPI[2]       SLURM_DISTRIBUTION=block:block                        SLURM_DISTRIBUTION=block:cyclic
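
For instance, a sketch of a hybrid job built with MVAPICH that spreads the threads of each MPI process across sockets (the task and thread counts are only an example that fills the 96 usable cores of one node):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive

# 8 MPI processes with 12 OpenMP threads each, using the MVAPICH scatter policy
export OMP_NUM_THREADS=12
export MVP_HYBRID_BINDING_POLICY=scatter
srun --ntasks=8 --cpus-per-task=12 ./hybrid_prog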

Summary

The above tables list the most commonly used settings for process/thread placement. Some compilers and Intel libraries may have additional options for process and thread placement beyond those mentioned on this page. For more information on a specific compiler/library, check the more detailed documentation for that library.

Using HBM

326 dense compute nodes are available with 512 GB of DDR5 memory and 128 GB of high-bandwidth memory (HBM). Memory-bound applications in particular are expected to benefit from HBM, but other codes may also show some benefit from using it.

All nodes in the cpu partition have the HBM configured in flat mode, meaning that HBM is visible to your application as addressable memory. By default, your code will use DDR memory only. To enable your application to use HBM, first load the numactl/2.0.18 module and then prepend the appropriate numactl command to your run command as shown in the table below.

Execution Model   DDR             HBM
Serial            ./a.out         numactl --preferred-many=8-15 ./a.out
MPI               srun ./a.out    srun numactl --preferred-many=8-15 ./a.out
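
Putting this together, a batch script that runs an MPI code out of HBM might look like the following sketch (mpi_prog is a placeholder for your own executable):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96

# Make the numactl command available
module load numactl/2.0.18

# Prefer the HBM NUMA nodes (8-15) for memory allocations
srun numactl --preferred-many=8-15 ./mpi_prog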

Please visit our HBM documentation for more information.

GPU Programming

128 NVIDIA H100 GPUs are available on Cardinal. Please visit our GPU documentation for more information.



Cardinal Software Environment

The Cardinal cluster is now running on Red Hat Enterprise Linux (RHEL) 9, introducing several software-related changes compared to the RHEL 7 environment used on the Owens and Pitzer clusters. These updates provide access to modern tools and libraries but may also require adjustments to your workflows. Key software changes and available software are outlined in the following sections.

Updated Compilers and Toolchains

The system GCC (GNU Compiler Collection) is now at version 11. Additionally, newer versions of GCC and other compiler suites, including the Intel Compiler Classic and Intel oneAPI, are available and can be accessed through the modules system. These new compiler versions may impact code compilation, optimization, and performance. We encourage users to test and validate their applications in this new environment to ensure compatibility and performance.

Python Upgrades

The system Python has been upgraded to version 3.9, and the system Python 2 is no longer available on Cardinal. Additionally, newer versions of Python 3 are available through the modules system. This change may impact scripts and packages that rely on older versions of Python. We recommend users review and update their code to ensure compatibility or create custom environments as needed.
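
For example, one way to set up an isolated environment with a newer Python is shown below (the module name and package list are only illustrative; run module avail python to see what is actually installed):

module load python                     # load a newer Python 3 from the modules system
python3 -m venv ~/envs/myproject       # create a virtual environment in your home directory
source ~/envs/myproject/bin/activate   # activate it
pip install --upgrade pip numpy        # install packages into the environment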

Available Software

To view the software currently installed on the Cardinal cluster, visit Browse Software and select "Cardinal" under "System". If the software required for your research is not available, please contact OSC Help to request the software.

Revised Software Modules

Some modules have been updated, renamed, or removed to align with the standards of the package management system. For more details, please refer to the software page of the specific software you are interested in. Notable changes include:

Package            Owens/Pitzer      Cardinal
Default MPI        mvapich2/2.3.3    mvapich/3.0
GCC                gnu               gcc
Intel MKL          intel, mkl        intel-oneapi-mkl
Intel VTune        intel             intel-oneapi-vtune
Intel TBB          intel             intel-oneapi-tbb
Intel MPI          intelmpi          intel-oneapi-mpi
NetCDF             netcdf            netcdf-c, netcdf-cxx4, netcdf-fortran
BLAST+             blast             blast-plus
Java               java              openjdk
Quantum Espresso   espresso          quantum-espresso
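
In practice this means that module load lines in existing job scripts need to be updated. A sketch of the change for the default MPI stack (the versions are those listed above):

# On Owens/Pitzer
module load mvapich2/2.3.3

# On Cardinal
module load mvapich/3.0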

Licensed Software

Several licensed software packages have been installed on Cardinal and will transition from Owens: Abaqus, ANSYS, COMSOL, Schrödinger, STAR-CCM+, Stata, and LS-DYNA. Schrödinger, COMSOL, and Stata were fully migrated to Cardinal on November 4. The remaining licensed software (Abaqus, ANSYS, STAR-CCM+, and LS-DYNA) will be removed from Owens in December 2024. More details will be communicated.

Known Issues

We are actively identifying and addressing issues in the new environment. Please report any problems to the support team by contacting OSC Help to ensure a smooth transition. Notable issues include:

Software    Versions
STAR-CCM+   All
OpenMPI     All
GCC         13.2.0
MVAPICH     3.0

Additional known issues can be found on our Known Issues page. To view issues related to the Cardinal cluster, select "Cardinal" under the "Category".


Batch Limit Rules

The PBS compatibility layer is disabled on Cardinal, so PBS batch scripts WON'T work on Cardinal, though they do work on the Owens and Pitzer clusters. You also need to use the sbatch command (instead of qsub) to submit jobs. Refer to the Slurm migration page to understand how to use Slurm.

Memory limit

We strongly suggest that users consider their memory use relative to the available per-core memory when requesting OSC resources for their jobs.

Summary

Node type         Partition   Default memory per core   Max usable memory per node (96 usable cores/node)
regular compute   cpu         6144 MB (6 GB)            589,824 MB (576 GB)
regular compute   cache       4956 MB (4.84 GB)         475,776 MB (464.6 GB)
gpu               gpu         9216 MB (9 GB)            884,736 MB (864 GB)
huge memory       hugemem     21104 MB (20.6 GB)        2,025,984 MB (1978.5 GB)

It is recommended to let the default memory apply unless more control over memory is needed.

Note that if an entire node is requested, then the job is automatically granted the entire node's memory. On the other hand, if a partial node is requested, then memory is granted based on the default memory per core.

See a more detailed explanation below.

Regular Dense Compute Node

Default memory limits

A job can request resources and allow the default memory to apply. If a job requires 180 GB for example:

#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=30

This requests 30 cores, and each core will automatically be allocated 6 GB of memory (30 cores * 6 GB = 180 GB of memory).

Explicit memory requests

If needed, an explicit memory request can be added:

#SBATCH --ntasks-per-node=1
#SBATCH --mem=180G

Job charging is determined either by number of cores or amount of memory.
See Job and storage charging for details.

Multi-node job request

On Cardinal, it is allowed to request partial nodes for a multi-node job (nodes > 1). This is an example of a job requesting 2 nodes with 1 core per node:

#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2

Here, job charging is determined by number of cores requested in the job script. 

Whole-node request

To request whole nodes, regardless of the number of nodes, either request the maximum number of usable cores per node (96) or add --exclusive, as in:

#SBATCH --ntasks-per-node=96

or 

#SBATCH --exclusive

Here, the job is allocated and charged for the whole node.

Huge Memory Node

To request a partial or whole huge memory node, specify a memory request between 864 GB and 1978 GB, i.e., 864 GB < mem <= 1978 GB. Note: only integer values can be used for the memory request.
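
For example, a sketch of a partial huge memory node request (the core and memory values are only illustrative):

#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=1024G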

 

GPU Jobs

There are 4 GPUs per GPU node on Cardinal. Jobs may request only part of a GPU node.

Request two GPUs for one task:

#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --gpus-per-task=2

Request two GPUs, one for each task:

#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --gpus-per-task=1

Of course, jobs can request all the GPUs of a GPU node as well. To request an entire GPU node:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=96
#SBATCH --gpus-per-node=4

See this GPU computing page for more information. 

Partition time and job size limits

Here are the walltime and node limits per job for the different queues/partitions available on Cardinal:

NAME      MAX TIME LIMIT   MIN JOB SIZE    MAX JOB SIZE   NOTES
          (dd-hh:mm:ss)
cpu       7-00:00:00       1 core          12 nodes       This partition can not request GPUs. 322 nodes in total.
                                                          HBM configured in flat mode. See this HBM page for more info.
cache     7-00:00:00       1 core          4 nodes        This partition can not request GPUs. 4 nodes in total.
                                                          HBM configured in cache mode. See this HBM page for more info.
                                                          Must add the flag --partition=cache
gpu       7-00:00:00       1 core, 1 gpu   12 nodes
debug     1:00:00          1 core          2 nodes        For small interactive and test jobs (both CPU and GPU)
hugemem   7-00:00:00       1 core          1 node

Usually, you do not need to specify the partition for a job; the scheduler will assign the right partition based on the requested resources. To specify a partition for a job, either add the flag --partition=<partition-name> to the sbatch command at submission time or add this line to the job script:
#SBATCH --partition=<partition-name>

    Job/Core Limits

                      MAX RUNNING JOB LIMIT                                        MAX CORE/PROCESSOR LIMIT   MAX NODE LIMIT
                      For all types   GPU jobs   Regular debug   GPU debug        For all types              GPU   hugemem
    Individual User   384             n/a        4               4                5184                       32    12
    Project/Group     576             n/a        n/a             n/a              5184                       32    12

    An individual user can have up to the maximum number of concurrently running jobs and/or up to the maximum number of processors/cores in use. Likewise, all of the users in a particular group/project combined can have up to the maximum number of concurrently running jobs and/or up to the maximum number of processors/cores in use.

    A user may have no more than 1000 jobs submitted to the parallel queue and no more than 1000 jobs submitted to the serial queue.

    Cardinal SSH key fingerprints

    These are the public key fingerprints for Cardinal:

    cardinal: ssh_host_rsa_key.pub = 73:f2:07:6c:76:b4:68:49:86:ed:ef:a3:55:90:58:1b
    cardinal: ssh_host_ed25519_key.pub = 93:76:68:f0:be:f1:4a:89:30:e2:86:27:1e:64:9c:09
    cardinal: ssh_host_ecdsa_key.pub = e0:83:14:8f:d4:c3:c5:6c:c6:b6:0a:f7:df:bc:e9:2e

    These are the SHA256 hashes:
    cardinal: ssh_host_rsa_key.pub = SHA256:RznzsAFLAqiOIwNCZ/0ZlXqU4/t2nznsRkM1lrcqBPI 
    cardinal: ssh_host_ed25519_key.pub = SHA256:AQ/cDcms8EPV3bd9x8w2SVrl6sJMDSdITBEbNCQ5w+A
    cardinal: ssh_host_ecdsa_key.pub = SHA256:TeiEzjue7Il36e9ftfytCE1OvvaVVRwKB2/+geJyQhA
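
    One way to verify these from your own machine is to fetch the host keys and hash them locally, then compare the output against the values above, for example:

    ssh-keyscan cardinal.osc.edu 2>/dev/null | ssh-keygen -lf -            # SHA256 fingerprints
    ssh-keyscan cardinal.osc.edu 2>/dev/null | ssh-keygen -lf - -E md5     # MD5 fingerprints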


    Migrating jobs from other clusters

    We have prepared a "Getting Started with Cardinal" course on the ScarletCanvas platform. This course offers essential guidance for migrating jobs from other clusters to the Cardinal cluster at the Ohio Supercomputer Center (OSC). It covers topics such as hardware, software, programming environments, job scheduling, and the HBM feature to ensure a seamless transition and efficient job execution on the Cardinal cluster.

    Hardware Specification

    Below is a summary of the hardware information:

    • 326 "dense compute" nodes (96 usable cores, 128 GB HBM2e and 512 GB DDR5 memory)
    • 32 GPU nodes (96 usable cores, 1 TB DDR5 memory, 4 NVIDIA H100 GPUs each with 94 GB HBM2e memory and NVIDIA NVLink)
    • 16 large memory nodes (96 usable cores, 128 GB HBM2e and 2 TB DDR5 memory)

    See the Cardinal page and Technical Specifications page for more information. 

    File Systems

    Cardinal accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory, project space, and scratch space as on the other clusters.

    Software Environment

    The Cardinal cluster runs on Red Hat Enterprise Linux (RHEL) 9, introducing several software-related changes compared to the RHEL 7 environment used on the Owens and Pitzer clusters. These updates provide access to modern tools and libraries but may also require adjustments to your workflows. Please refer to the Cardinal Software Environment page for key software changes and available software.

    Cardinal uses the same module system as the other clusters. 

    Use module load <package> to add a software package to your environment. Use module list to see what modules are currently loaded, and module avail to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use module spider.
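
    For example (the package name and version below are only illustrative):

    module load gcc/13.2.0     # add a software package to your environment
    module list                # see what modules are currently loaded
    module avail               # see the modules available to load
    module spider gcc          # search all modules, including ones hidden by dependencies or conflicts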

    You can keep up to date on the software packages that have been made available on Cardinal by viewing the Software by System page and selecting the Cardinal system.

    Programming Environment

    The Cardinal cluster supports programming in C, C++, and Fortran. The available compiler suites include Intel, oneAPI, and GCC. Additionally, users have access to high-bandwidth memory (HBM), which is expected to enhance the performance of memory-bound applications. Other codes may also benefit from HBM, depending on their workload characteristics.

    Please refer to the Cardinal Programming Environment page for details on compiler commands, parallel and GPU computing, and instructions on how to effectively utilize HBM. 

    Batch Specifics  

    The PBS compatibility layer is disabled on Cardinal, so PBS batch scripts WON'T work on Cardinal, though they do work on the Owens and Pitzer clusters. In addition, you need to use the sbatch command (instead of qsub) to submit jobs. Refer to the Slurm migration page to understand how to use Slurm and the batch limit page for Cardinal's scheduling policies.

    Some specifics you will need to know to create well-formed batch scripts:

    • Follow the Slurm job script page to convert the PBS batch scripts to Slurm scripts if you have not done so
    • Refer to the job management page on how to manage and monitor jobs. 
    • Jobs may request partial nodes, including both single-node (nodes=1) and multi-node (nodes>1) jobs.
    • Most dense compute nodes have the HBM configured in flat mode, but 4 nodes are configured in cache mode. Please refer to the HBM page on detailed discussions about flat and cache modes and the batch limit page on how to request different modes.
