"TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code."
Quote from the TensorFlow GitHub documentation.
Availability and Restrictions
Versions
The following versions of TensorFlow are available on OSC clusters:
Version | Owens | Pitzer | Note | CUDA version compatibility
---|---|---|---|---
1.3.0 | X | | python/3.6 | 8 or later
1.9.0 | X* | X* | python/3.6-conda5.2 | 9 or later
2.0.0 | X | X | python/3.7-2019.10 | 10.0 or later

* Current default version
TensorFlow is a Python package and therefore requires loading the corresponding python module (see Note). The version of TensorFlow may change with updates to Anaconda Python on Owens, so you can check the currently installed version with conda list tensorflow. The available versions of TensorFlow on Owens and Pitzer require CUDA for GPU calculations. You can find and load a compatible cuda module via
module load python/3.6-conda5.2
module spider cuda
module load cuda/9.2.88
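Once the modules are loaded, you can confirm on a GPU node that TensorFlow detects the GPU. The snippet below is a minimal sanity check using the TensorFlow 1.x API; the file name check_gpu.py is only illustrative:

# check_gpu.py - minimal sanity check (TensorFlow 1.x API)
import tensorflow as tf
from tensorflow.python.client import device_lib

# List all devices visible to TensorFlow; GPUs appear as "/device:GPU:0", etc.
print(device_lib.list_local_devices())

# True if TensorFlow can use a CUDA-enabled GPU
print("GPU available:", tf.test.is_gpu_available())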
If you would like to use a different version of TensorFlow, please follow the installation guide below, which describes how to install Python packages locally.
https://www.osc.edu/resources/getting_started/howto/howto_install_tensorflow_locally
Newer versions of TensorFlow might require a newer version of CUDA. Please refer to https://www.tensorflow.org/install/source#gpu for an up-to-date compatibility chart.
Feel free to contact OSC Help if you have any issues with installation.
Access
TensorFlow is available to all OSC users. If you have any questions, please contact OSC Help.
Publisher/Vendor/Repository and License Type
https://www.tensorflow.org, Open source
Usage on Owens
Setup on Owens
The TensorFlow package is installed as part of the Anaconda Python distribution. To configure the Owens cluster for the use of TensorFlow, use the following commands:
module load python/3.6 cuda/8.0.44
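To verify that the module environment is set up correctly, you can start Python and import TensorFlow; the exact version printed depends on the Anaconda installation:

python -c "import tensorflow as tf; print(tf.__version__)"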
Batch Usage on Owens
Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations for Owens, and Scheduling Policies and Limits for more info. In particular, TensorFlow should be run on a GPU-enabled compute node.
An Example of Using TensorFlow with MNIST model and Logistic Regression
Below is an example batch script (job.txt) and Python script (logistic_regression_on_mnist.py) for using TensorFlow.
Contents of job.txt
#!/bin/bash
#SBATCH --job-name ExampleJob
#SBATCH --nodes=2 --ntasks-per-node=28 --gpus-per-node=1
#SBATCH --time=01:00:00

cd $SLURM_SUBMIT_DIR
module load python/3.6 cuda/8.0.44
python logistic_regression_on_mnist.py
Contents of logistic_regression_on_mnist.py
# logistic_regression_on_mnist.py
# Python script based on:
# https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/0_Prerequisite/mnist_dataset_intro.ipynb
# https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/2_BasicModels/logistic_regression.ipynb
import tensorflow as tf

# Import MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10])   # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))
In order to run it via the batch system, submit the job.txt
file with the following command:
sbatch job.txt
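Note that the script above uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session), which was removed in TensorFlow 2.x. If you load python/3.7-2019.10 (TensorFlow 2.0.0), a roughly equivalent logistic regression can be written with the Keras API. The following is a minimal sketch, not part of the original example; the file name is illustrative:

# logistic_regression_on_mnist_tf2.py - illustrative TensorFlow 2.x version
import tensorflow as tf

# Load MNIST via tf.keras.datasets (the input_data helper is not available in TF 2.x)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# A single dense layer with softmax is equivalent to multinomial logistic regression
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=100, epochs=25)
print("Test accuracy:", model.evaluate(x_test, y_test)[1])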
Distributed TensorFlow
TensorFlow can be configured to run in parallel using the Horovod package from Uber.
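Below is a minimal sketch of a Horovod training script for TensorFlow 1.x. It assumes Horovod has been installed into your Python environment; the toy model and file name are purely illustrative:

# train_horovod.py - illustrative Horovod + TensorFlow 1.x sketch
import numpy as np
import tensorflow as tf
import horovod.tensorflow as hvd

# Initialize Horovod (one process per GPU)
hvd.init()

# Pin each process to a single GPU based on its local rank
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Toy data: learn y = 2x with a single weight
x = tf.constant(np.random.rand(100, 1), dtype=tf.float32)
y = 2.0 * x
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Scale the learning rate by the number of workers and wrap the optimizer
# so gradients are averaged across processes with allreduce
opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
global_step = tf.train.get_or_create_global_step()
train_op = opt.minimize(loss, global_step=global_step)

hooks = [
    # Broadcast initial variable states from rank 0 to all other processes
    hvd.BroadcastGlobalVariablesHook(0),
    # Stop after a fixed number of steps
    tf.train.StopAtStepHook(last_step=200),
]

with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
    while not sess.should_stop():
        _, current_loss = sess.run([train_op, loss])

if hvd.rank() == 0:
    print("final loss:", current_loss)

Horovod processes are typically launched with mpirun (or horovodrun in newer Horovod versions), for example mpirun -np 2 python train_horovod.py; see the Horovod documentation for the launch options appropriate to your installation.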