Overview
Estimating GPU memory (VRAM) usage for training or running inference with large deep learning models is critical both for requesting appropriate resources for your computation and for optimizing your job once it is set up. You can avoid out-of-memory (OOM) errors by requesting appropriate resources and by using the memory profiling tools described here to understand how memory is used during the job.
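A quick back-of-the-envelope estimate can help before you request resources. The sketch below multiplies the parameter count by an assumed per-parameter cost. The byte counts are common rules of thumb, not OSC figures: 2 bytes per parameter for fp16/bf16 inference weights, and about 16 bytes per parameter for mixed-precision training with the Adam optimizer. The estimate covers model state only. Activations, temporary buffers, the CUDA context and framework overhead come on top, which is why profiling the real job is still necessary. The names `BYTES_PER_PARAM` and `estimate_model_state_gib` are hypothetical.

```python
# Rough rule-of-thumb per-parameter costs (assumptions, not OSC-published values).
# They cover model state only; activations, temporary buffers, the CUDA context
# and framework overhead are NOT included and must be measured by profiling.
BYTES_PER_PARAM = {
    "inference_fp16": 2,        # fp16/bf16 weights only
    "inference_fp32": 4,        # fp32 weights only
    # fp16 weights + fp16 gradients (2 + 2) plus fp32 master weights
    # and the two fp32 Adam moment buffers (4 + 4 + 4)
    "training_mixed_adam": 16,
}

GIB = 2**30  # bytes in one GiB


def estimate_model_state_gib(num_params: int, mode: str) -> float:
    """Return the estimated model-state memory in GiB for the given mode."""
    if mode not in BYTES_PER_PARAM:
        raise KeyError(f"unknown mode {mode!r}; choose from {sorted(BYTES_PER_PARAM)}")
    return num_params * BYTES_PER_PARAM[mode] / GIB


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7-billion-parameter model
    for mode in BYTES_PER_PARAM:
        print(f"{mode:>20}: {estimate_model_state_gib(n, mode):7.1f} GiB")
```

For a hypothetical 7-billion-parameter model this gives roughly 13 GiB for fp16 inference and roughly 104 GiB for mixed-precision Adam training, before any activation memory is counted.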
This page outlines how to use the Jupyter interactive app on OnDemand.
Launching Jupyter App
Log in to https://ondemand.osc.edu/ with your OSC credentials. Choose Jupyter under the Interactive Apps menu.
Rust is a general-purpose programming language with an emphasis on performance, type safety, and concurrency. It enforces memory safety without a traditional garbage collector, using its "borrow checker" to prevent data races and other memory safety errors at compile time. The Rust module provides rustc and cargo.
Availability and Restrictions
Versions
The following versions of Rust are available on OSC clusters:
Hardware Specification
Below is a summary of the hardware information:
The Cardinal cluster is now running on Red Hat Enterprise Linux (RHEL) 9, introducing several software-related changes compared to the RHEL 7 environment used on the Owens and Pitzer clusters. These updates provide access to modern tools and libraries but may also require adjustments to your workflows. Key software changes and available software are outlined in the following sections.
Overview of the High Bandwidth Memory on Cardinal's Dense compute nodes
Compilers
The Cardinal cluster supports C, C++, and Fortran programming languages. The available compiler suites include Intel, oneAPI, and GCC. By default, the Intel development toolchain is loaded. The table below lists the compiler commands and recommended options for compiling serial programs. For more details and best practices, please refer to our compilation guide.
These are the public key fingerprints for Cardinal:
cardinal: ssh_host_rsa_key.pub = 73:f2:07:6c:76:b4:68:49:86:ed:ef:a3:55:90:58:1b
cardinal: ssh_host_ed25519_key.pub = 93:76:68:f0:be:f1:4a:89:30:e2:86:27:1e:64:9c:09
cardinal: ssh_host_ecdsa_key.pub = e0:83:14:8f:d4:c3:c5:6c:c6:b6:0a:f7:df:bc:e9:2e
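Before accepting a host key on your first connection, compare its fingerprint with the list above. The fingerprints are MD5 digests of the decoded public key blob, printed as colon-separated hex pairs. OpenSSH can display this form directly with `ssh-keygen -l -E md5 -f <key file>`. The Python sketch below performs the same check, assuming the input is a single line in OpenSSH `.pub` format (`<type> <base64 blob> [comment]`). The helper names are hypothetical.

```python
import base64
import hashlib

# MD5 fingerprints published for Cardinal (copied from the list above).
CARDINAL_FINGERPRINTS = {
    "ssh_host_rsa_key.pub": "73:f2:07:6c:76:b4:68:49:86:ed:ef:a3:55:90:58:1b",
    "ssh_host_ed25519_key.pub": "93:76:68:f0:be:f1:4a:89:30:e2:86:27:1e:64:9c:09",
    "ssh_host_ecdsa_key.pub": "e0:83:14:8f:d4:c3:c5:6c:c6:b6:0a:f7:df:bc:e9:2e",
}


def md5_fingerprint(public_key_line: str) -> str:
    """MD5 fingerprint of an OpenSSH .pub line, as colon-separated hex pairs."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64 blob> [comment]'")
    blob = base64.b64decode(fields[1])  # the raw key blob is what gets hashed
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def is_published_cardinal_key(public_key_line: str) -> bool:
    """True if the key's fingerprint matches one of the published values."""
    return md5_fingerprint(public_key_line) in CARDINAL_FINGERPRINTS.values()
```

If `is_published_cardinal_key` returns `False` for a key your SSH client presents, do not accept it.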
PyTorch Fully Sharded Data Parallel (FSDP) is used to speed up model training by parallelizing over the training data while also sharding model parameters, optimizer states, and gradients across multiple PyTorch processes.
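To see why sharding matters for memory, compare per-GPU model-state memory with and without it. The sketch below assumes full sharding spreads parameters, gradients and optimizer states evenly across all ranks, while plain data parallelism keeps a full replica on every rank. The default of 16 bytes per parameter is a rule-of-thumb assumption for mixed-precision Adam training. The estimate ignores activations, communication buffers, and the full parameters that FSDP temporarily gathers for the layer currently being computed, so real usage will be higher. The function name is hypothetical.

```python
GIB = 2**30  # bytes in one GiB


def per_gpu_model_state_gib(num_params: int, num_gpus: int,
                            bytes_per_param: int = 16,
                            sharded: bool = True) -> float:
    """Estimated model-state memory per GPU, in GiB.

    sharded=True models full sharding: each rank keeps 1/num_gpus of the
    parameters, gradients and optimizer states.
    sharded=False models plain data parallelism: every rank keeps a full copy.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_bytes = num_params * bytes_per_param
    per_gpu_bytes = total_bytes / num_gpus if sharded else total_bytes
    return per_gpu_bytes / GIB


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7-billion-parameter model
    for gpus in (1, 2, 4, 8):
        full = per_gpu_model_state_gib(n, gpus, sharded=False)
        shard = per_gpu_model_state_gib(n, gpus, sharded=True)
        print(f"{gpus} GPU(s): replicated {full:6.1f} GiB, sharded {shard:6.1f} GiB")
```

Under these assumptions, sharding a hypothetical 7-billion-parameter model over 8 GPUs cuts model-state memory from about 104 GiB to about 13 GiB per GPU.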