"Torch is a deep learning framework with wide support for machine learning algorithms. It's open-source, simple to use, and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C / CUDA implementation. Torch offers popular neural network and optimization libraries that are easy to use, yet provide maximum flexibility to build complex neural network topologies. It also runs up to 70% faster on the latest NVIDIA Pascal™ GPUs, so you can now train networks in hours, instead of days."
Quote from Torch documentation.
The following version of Torch is available on OSC systems:
Version | Owens |
---|---|
7 | X* |
You can use module spider torch to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
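For example, querying a specific version shows how to load it and any modules it depends on. The torch/7 name below matches the version in the table above; the exact module name may differ slightly, so check the output of module spider torch first.

# List Torch modules, then show load details for version 7 (name assumed from the table above)
module spider torch
module spider torch/7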
The current version of Torch on Owens requires cuda/8.0.44 and cuDNN v5 for GPU calculations.
Torch is available to all OSC users. If you have any questions, please contact OSC Help.
Soumith Chintala, Ronan Collobert, Koray Kavukcuoglu, Clement Farabet / Open source
To configure the Owens cluster for the use of Torch, load the module with the following command:
module load torch
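To confirm that the GPU backend works before submitting a full job, you can run a quick check on a GPU node. The line below is a minimal sketch; it assumes the cutorch package is included in the OSC Torch build.

# Sketch: count the GPUs visible to Torch (run on a GPU node; assumes cutorch is installed)
th -e "require 'cutorch'; print('GPU devices visible to Torch: ' .. cutorch.getDeviceCount())"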
Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations for Owens, and Scheduling Policies and Limits for more info. In particular, Torch should be run on a GPU-enabled compute node.
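For interactive testing on a GPU node, a session can be requested with standard Slurm options similar to those used in the batch script below; the account value is a placeholder for your own project code.

# Sketch: request a short interactive session on one GPU node (placeholder account)
salloc --nodes=1 --ntasks-per-node=28 --gpus=1 --time=00:30:00 --account=<project-account>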
Below is an example batch script (job.txt) for using Torch. Please see https://github.com/szagoruyko/cifar.torch for more details.
#!/bin/bash
#SBATCH --job-name=Torch
#SBATCH --nodes=1 --ntasks-per-node=28 --gpus=1
#SBATCH --time=00:30:00
#SBATCH --account=<project-account>

# Load the torch module
module load torch

# Move to the job's temporary directory
cd $TMPDIR

# Clone the sample data and scripts
git clone https://github.com/szagoruyko/cifar.torch.git .

# Run the image preprocessing (not necessary for subsequent runs; just re-use provider.t7)
OMP_NUM_THREADS=28 th -i provider.lua <<Input
provider = Provider()
provider:normalize()
torch.save('provider.t7',provider)
exit
y
Input

# Run the Torch training
th train.lua --backend cudnn

# Copy results back from the job temp directory
cp -a * $SLURM_SUBMIT_DIR
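The actual training is performed by train.lua from the cifar.torch repository. For readers new to Torch, the following is a minimal, illustrative sketch of a single GPU training step; it is not the cifar.torch code and assumes the cutorch and cunn packages are available in the OSC build.

# Illustrative sketch only: one Torch training step on the GPU (not the cifar.torch code)
th <<'EOF'
require 'nn'
require 'cutorch'
require 'cunn'
-- tiny fully-connected model and loss, moved to the GPU
model = nn.Sequential():add(nn.Linear(10, 2)):cuda()
criterion = nn.MSECriterion():cuda()
-- random placeholder data standing in for a real training batch
input = torch.randn(4, 10):cuda()
target = torch.randn(4, 2):cuda()
-- one forward/backward pass followed by an SGD-style parameter update
output = model:forward(input)
loss = criterion:forward(output, target)
model:zeroGradParameters()
model:backward(input, criterion:backward(output, target))
model:updateParameters(0.01)  -- learning rate
print('loss after one step: ' .. loss)
-- leave the Torch session (as in the preprocessing step above)
exit
y
EOF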
In order to run it via the batch system, submit the job.txt file with the following command:
sbatch job.txt
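Once submitted, the job can be monitored with standard Slurm commands, for example:

# List your queued and running jobs
squeue -u $USER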