CUDA

CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by Nvidia that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Availability and Compatibility

CUDA is available on the Oakley and Glenn clusters. The versions currently available at OSC are:

Version   Glenn   Oakley
2.3       X
3.0       X
3.1       X
4.0       X
4.1.28            X
4.2.9             X
5.0.35            X
5.5       X       X

Usage

Access

CUDA is available for use by all OSC users.

Setup

Use module avail to view the modules available on a given machine. To load a CUDA module, type module load module-name.
For example, to select CUDA version 4.1.28 on Oakley, type: module load cuda/4.1.28

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. On Oakley, the SDK has been installed in $CUDA_HOME (an environment variable set when you load the module).

Programming in CUDA

To learn CUDA programming, visit http://developer.nvidia.com/cuda-education-training, which also includes tutorials on optimizing CUDA code for greater speedup. For additional CUDA optimization techniques, see http://www.cs.berkeley.edu/~volkov/.

Compiling CUDA Code

Type module show cuda/version-number to view the environment variables set by a CUDA module.
To compile CUDA code contained in a file, say mycudaApp.cu, load the appropriate CUDA module and run:
nvcc -o mycudaApp mycudaApp.cu
This creates an executable named mycudaApp.
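As a sketch, a minimal mycudaApp.cu that the command above would compile might look like the following. The vector-addition kernel and array size are illustrative assumptions, not OSC-provided code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: element-wise vector addition, one thread per element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // assumed problem size
    size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device arrays and copy inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```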

Important: The devices are configured in exclusive mode. This means that cudaSetDevice should NOT be used when requesting a single GPU resource; on the first CUDA call, the system determines which device to use. If both GPUs on a node are in use by a single application, use cudaSetDevice.
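For the case where one application does use both GPUs on a node, explicit device selection can be sketched as follows. The loop structure and work-splitting comment are illustrative assumptions:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("visible devices: %d\n", count);

    // Only call cudaSetDevice when the job actually holds both GPUs
    // (the devices run in exclusive mode).
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);   // bind subsequent CUDA calls to this device
        // ... allocate memory and launch this device's share of the work ...
    }
    return 0;
}
```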

Debugging CUDA code

cuda-gdb can be used to debug CUDA code; it becomes available after module load cuda. For more information on using CUDA-GDB, visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on using CUDA-MEMCHECK, visit http://developer.nvidia.com/cuda-memcheck.
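For example, assuming an executable named mycudaApp built as described above, a run under CUDA-MEMCHECK might look like this (run it on a GPU node, not a login node):

```shell
# Load CUDA first so the cuda-memcheck tool is on your PATH.
module load cuda
# Run the application under cuda-memcheck; out-of-bounds and misaligned
# accesses are reported per kernel as they occur.
cuda-memcheck ./mycudaApp
```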

Batch Usage

Below are sample batch scripts for requesting GPU nodes on Glenn and Oakley. Note that only the second line differs between the two scripts; on Oakley, you can also specify the number of GPUs required.

Sample Batch Script (Glenn)

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=8:gpu
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

Sample Batch Script (Oakley)

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

For an interactive batch session, run one of the following commands:
On Glenn
qsub -I -l nodes=1:ppn=8:gpu -l walltime=00:20:00

On Oakley
qsub -I -l nodes=1:ppn=1:gpus=1 -l walltime=00:20:00

Please note that on Oakley, you can request any mix of ppn and gpus you need; please see the Job Scripts page in our batch guide for more information.

Further Reading

Online documentation is available at http://developer.nvidia.com/nvidia-gpu-computing-documentation
