Using the MATLAB Parallel Computing Toolbox at OSC

The following is a tutorial on using the MATLAB® Parallel Computing Toolbox. It assumes that you have the MATLAB Parallel Computing Toolbox installed on your desktop computer and have configured it for use at OSC. For instructions on configuring the MATLAB Parallel Computing Toolbox for use at OSC, see Configuring Toolbox.

What is the MATLAB Parallel Computing Toolbox?
As described by The MathWorks, “Parallel Computing Toolbox lets you solve computationally and data-intensive problems using MATLAB® and Simulink® on multicore and multiprocessor computers. Parallel processing constructs such as parallel for-loops and code blocks, distributed arrays, parallel numerical algorithms, and message-passing functions let you implement task- and data-parallel algorithms in MATLAB at a high level without programming for specific hardware and network architectures.” For The MathWorks documentation, see http://www.mathworks.com/products/parallel-computing.

When should I use the MATLAB Parallel Computing Toolbox?
The MATLAB Parallel Computing Toolbox is useful for problems that require solutions to many separate but time-consuming tasks. It is also useful for problems that involve very large amounts of data. The first type of problem is referred to as “task parallel,” and the second is referred to as “data parallel.” For both types of problems, the MATLAB Parallel Computing Toolbox distributes workload among several MATLAB workers or “labs,” for instance, processors on a desktop machine or computing cluster.

For task parallel problems, there are several constructs available. For example, parfor loops are often mentioned in the documentation as a way to run independent iterations of a for loop in parallel. However, the OSC interface does not currently support matlabpool jobs, which are required for parfor loops. Alternatives include for-drange() loops, which run the iterations of a loop in parallel, and dfeval() and dfevalasync(), which evaluate a function over multiple sets of inputs in parallel.

For data parallel problems, the MATLAB Parallel Computing Toolbox uses “codistributed arrays,” or arrays that are distributed across multiple labs. Many MATLAB functions are overloaded so that they also operate on codistributed arrays. For instance, one can add (+), multiply (*), subtract (-), or divide (\) codistributed arrays. Functions such as eig(), svd(), fft(), min(), max(), and find() are also available for codistributed arrays, as well as many others. See “Using MATLAB Functions on Codistributed Arrays” in the MATLAB documentation for a full list of functions that are compatible with codistributed arrays.

The following shows how to use OSC’s interface to the MATLAB Parallel Computing Toolbox, with examples of simple task parallel and data parallel problems.

Setting up the OSC interface
For instructions on setting up the interface between the MATLAB client running on your desktop and the MATLAB workers running on OSC’s parallel computing resources, see Configuring Toolbox. In particular, the Parallel Computing Toolbox is configured to run on OSC’s Opteron cluster “Glenn,” at glenn.osc.edu. The following examples will assume that you have completed the required setup and that the resulting configuration is called ‘OSC Opteron.’ You can check your parallel configurations under MATLAB’s ‘Parallel’ menu:

[Screenshot: the ‘Parallel’ menu, with the ‘Manage Configurations’ item selected]

This will open a ‘Manage Configurations’ menu:

[Screenshot: the ‘Manage Configurations’ dialog]

This menu should show a configuration called ‘OSC Opteron.’ If it does not, follow the directions at Configuring Toolbox. You can switch between this configuration and the ‘local’ configuration, which is useful for developing and debugging code locally on your desktop.

You must also be connected to glenn.osc.edu. To connect to Glenn, you can issue the command:

conn = ssh('your_OSC_username','glenn.osc.edu')

at the MATLAB prompt. This should return the lines:

Host: glenn.osc.edu
User: your_OSC_username
Connected: 1

A simple task parallel example using dfeval() and dfevalasync()
The functions dfeval() and dfevalasync() can be used to run a single function over multiple sets of inputs in parallel. The function dfeval() is synchronous, meaning it blocks your MATLAB session until all instances of the function have been evaluated. This makes it easy to tell when dfeval() is finished. The function dfevalasync() is asynchronous, meaning it does not block your MATLAB session while it runs, so you can continue using MATLAB for other work while your function is being evaluated. However, this means the status of the dfevalasync() command must be checked manually.

Take as an example the function meshgrid(). This function generates X and Y arrays for 3-D plots. Let’s say we want to generate arrays using the following pairs of vectors:

x1 = [1 2 3 4 5]; y1 = [1 2 3 4 5];
x2 = [2 4 6 8 10]; y2 = [2 4 6 8 10];
x3 = [0 5 10 15 20]; y3 = [.1 .2 .3 .4 .5];

We can do this in parallel using the function dfeval():

[X, Y] = dfeval(@meshgrid, {{x1 y1}, {x2 y2}, {x3 y3}}, 'Configuration', 'OSC Opteron');

For dfeval(), we store the outputs in two variables, [X, Y]. For the inputs, we specify a function handle for meshgrid(); a cell array of cells, each containing an input argument pair for meshgrid(); and we set ‘Configuration’ to ‘OSC Opteron.’ Note that once we execute this command, the Command Window will enter a ‘Busy’ state while dfeval() runs. When it finishes, the outputs should be cell arrays containing the results. In this case, pairs of outputs can be accessed as:

X{1}; Y{1};
X{2}; Y{2};
X{3}; Y{3};

We can perform the same operation using the function dfevalasync():

job = dfevalasync(@meshgrid, 2, {{x1 y1}, {x2 y2}, {x3 y3}}, 'Configuration', 'OSC Opteron');

Here, we specify as the output a job object that lets us check the state of the job. For the inputs, we specify a function handle for meshgrid(), the number of outputs this function returns for each set of inputs, a cell array of cells listing the input argument pairs for meshgrid(), and we set ‘Configuration’ to ‘OSC Opteron.’ Since dfevalasync() does not put the Command Window into a ‘Busy’ state, we need to check to see when the job is finished. This can be done by periodically running the command:

get(job,'State')

This function returns either ‘queued,’ ‘running,’ or ‘finished.’ Once the function returns ‘finished,’ the outputs can be retrieved using the command:

outputs = getAllOutputArguments(job);

In this case, outputs will be a cell array in which each row corresponds to a given set of inputs, e.g.:

X1 = outputs{1,1};
Y1 = outputs{1,2};
X2 = outputs{2,1};
Y2 = outputs{2,2};

Once the outputs have been retrieved, extra job files that were created by dfevalasync() can be deleted by issuing the command:

destroy(job)
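
Rather than re-running get(job,'State') by hand, the check-retrieve-clean-up sequence can be automated. Below is a minimal sketch that polls every ten seconds; waitForState(job) is an alternative if your toolbox version provides it:

% Poll the job state until the workers report that it has finished.
while ~strcmp(get(job,'State'), 'finished')
    pause(10);                            % wait before checking again
end
outputs = getAllOutputArguments(job);     % fetch the results
destroy(job)                              % remove the extra job files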

Using pmode
“pmode” is an interactive parallel mode available in the MATLAB client. This mode is useful for developing more complicated parallel code. Normally, parallel code should be developed and debugged locally on your own desktop before attempting to run it on a larger computing resource. To run pmode on your desktop, issue the command:

pmode start local 4

at the MATLAB prompt. This will open a “Parallel Command Window” with four parallel processes. Note that you do not have to have four processors on your desktop in order to run four parallel processes. The resulting Parallel Command Window is shown below:

[Screenshot: the Parallel Command Window with four labs]

Here, we have issued two simple commands at the P>> parallel command prompt at the bottom of the Parallel Command Window:

a = 1;
b = rand(1);

A list of issued commands is shown on the left-hand side of the Parallel Command Window. Both commands were evaluated on each of the four processes, referred to as Labs 1 through 4. Results for each individual lab are shown in each window. Note that rand() produces a different random number on each lab.
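
If you need reproducible results, you can seed each lab’s generator yourself. Here is a minimal sketch, assuming the older rand('twister', seed) seeding syntax used by MATLAB versions of this era:

% Seed each lab's generator with its lab index so runs are repeatable,
% while each lab still draws a different sequence.
rand('twister', labindex);
b = rand(1)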

The following sections provide an introduction to basic parallel programming in pmode. This includes communication, the use of codistributed arrays, and task parallelism using a for-drange() loop. Finally, we show how a program developed locally in pmode can be modified to run on OSC’s systems.

Communication in pmode
The MATLAB Parallel Computing Toolbox has several functions for moving data between labs. These include labSend(), labReceive(), labProbe(), labBroadcast(), and labBarrier(). The supporting functions numlabs() and labindex() are also frequently needed.

The labSend() function is used to send data from one lab to another, and labReceive() is used to receive this data. The labSend() function is non-blocking, i.e. the lab sending the data can continue to execute commands even if the receiving lab has not yet received the data. The labReceive() command is blocking, i.e. it blocks execution on the receiving lab until the data is received. An optional integer-valued tag can be used with these commands to assign a number to each message. The labProbe() function can be used to check the status of a message from a particular lab or with a particular tag.
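
For instance, the following sketch shows tagged messaging: lab 1 sends a value with tag 99, and lab 2 polls with labProbe() until a message with that tag has arrived before receiving it:

% Lab 1 sends data to lab 2 with tag 99; lab 2 checks for the tagged
% message with labProbe() and only then calls labReceive().
if(labindex == 1)
    labSend(rand(1), 2, 99)       % data, destination lab, tag
elseif(labindex == 2)
    while ~labProbe(1, 99)        % has the tagged message arrived yet?
        % ... other work could be done here while waiting ...
    end
    d = labReceive(1, 99)         % receive the message with tag 99
end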

The labBroadcast() function is used to send data from one lab to all labs. Each lab is blocked until it receives the data. There is also a labBarrier() function that can be used to block execution of each lab until all labs reach the labBarrier() statement. This function is not often necessary, but can be useful in certain situations.
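
As a small sketch of labBarrier(), the following makes every lab finish its local work before any lab proceeds:

% Each lab does local work that may take a different amount of time,
% then waits until every lab has reached the barrier.
x = rand(500);      % local work
labBarrier();       % no lab proceeds until all labs arrive here
disp('all labs synchronized')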

Use of these functions may require knowing the total number of labs and each lab’s individual number. The numlabs() function returns the total number of labs. The labindex() function returns the lab index.

Here, we will show how to use the labSend(), labReceive(), labBroadcast(), numlabs(), and labindex() functions. A sample script, “Comm_test.m”, contains the commands we will be using. First, let us do a simple communication test between labs 1 and 2. To do this, we can paste the following code into the Parallel Command Prompt:

% Generate a random number 'a' on lab 1 and send it to lab 2.
if(labindex == 1)
    a = rand(1)
    labSend(a,2)
end
if(labindex == 2)
    a = labReceive(1)
end

This code is run on all the labs. The first part of this code uses the labindex() function as part of an if statement to run commands on lab 1 only. This portion of the code generates a random value a on lab 1 and uses labSend() to send a to lab 2. The second part of this code uses labindex() as part of an if statement to run commands on lab 2 only. This portion of the code uses a labReceive() function to receive a message from lab 1 on lab 2 and store the result in a. Note that all of these commands must be issued at the Parallel Command Prompt at the same time. Otherwise, the labSend() command fails because there is no corresponding labReceive(). The resulting Parallel Command Window is shown below. As the image shows, labs 1 and 2 both have the same value for a after the labSend() and labReceive() functions have completed. Labs 3 and 4 did not execute either of the commands.

[Screenshot: the Parallel Command Window after the labSend()/labReceive() exchange]

Next, we show how to use the labSend() function in a for loop to send a random value stored in b from lab 1 to labs 2 through 4. To do this, we can paste the following code into the Parallel Command Prompt:

% Generate a random number 'b' on lab 1 and send it to all other labs using a 'for' loop
if(labindex == 1)
    b = rand(1)
    for ii = 2:numlabs
        labSend(b,ii)
    end
else
    b = labReceive(1)
end

The result is shown below. As the image shows, all four labs have the same random value stored in b once the for loop finishes.

[Screenshot: the Parallel Command Window after the for-loop send]

Finally, we will show how to use the labBroadcast() command to broadcast a random value c1 from lab 1 to all other labs. The result is stored on each lab in the variable c. To do this, we can paste the following code into the Parallel Command Prompt:

% Generate a random number 'c1' on lab 1 and broadcast it to all other labs using labBroadcast(); store the result in 'c'
if(labindex == 1)
    c1 = rand(1);
    c = labBroadcast(1,c1)
else
    c = labBroadcast(1)
end

The result is shown below. As the image shows, all four labs have the same random value stored in c.

[Screenshot: the Parallel Command Window after the labBroadcast()]

For more information on these and similar functions, see “Interlab Communication Within a Parallel Job” in the MATLAB documentation.

Global operations in pmode
The function res = gop(@F, x) performs a global operation, or reduction, applying the function F across the values of x stored on each lab. The function F(x,y) should accept two arguments of the same type and produce a result of the same type so that it can be used iteratively, i.e. F(F(x1,x2),F(x3,x4)) is a valid expression. The function F(x,y) should also be associative, i.e. F(F(x1, x2), x3) = F(x1, F(x2, x3)).
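
Any function with these properties will work, including ones you define yourself. For instance, horzcat() is associative, so the following sketch collects the value of x from every lab into a single row vector, returned on all labs:

% horzcat() is associative, so gop() can concatenate one value per lab
% into a single row vector that every lab receives.
vals = gop(@horzcat, labindex)   % e.g. [1 2 3 4] with four labs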

As an example, suppose we generate a random value a on each lab using the command:

a = rand

Each lab will have a different value for a. We can find the minimum value across all labs and return the result to each lab using the command:

res1 = gop(@min, a)

We can add all the values of a across the labs and return the result to lab 2 using the command:

res2 = gop(@plus, a, 2)

The corresponding Parallel Command Window is shown below:

[Screenshot: the Parallel Command Window showing the gop() results]

Codistributed arrays in pmode
Codistributed arrays are arrays that are stored across more than one lab. These arrays are generated using a codistributor() object that describes how the array should be distributed. Three ways to use codistributor() are:

dist = codistributor()
dist = codistributor('1d', dim)
dist = codistributor('2d', [m n], p)

The first is the default distribution scheme, in which an array is distributed by columns across all labs (unless the array is a column vector, in which case it is distributed by rows). The second allows the user to specify the dimension along which the array is distributed. The last distributes a two-dimensional array in p-by-p blocks laid out across an m-by-n grid of labs. There are other options for specifying the desired distribution more precisely. See the documentation for codistributor() for more information.

There are several ways to create a codistributed array. For instance, zeros(), rand(), and randn() all work with a codistributor object:

A = zeros(4,8,codistributor())
B = rand(4,3,codistributor('1d',1))

The resulting Parallel Command Window is shown below:

[Screenshot: the Parallel Command Window showing the codistributed arrays A and B]

Codistributed arrays can also be generated from local arrays on each lab using the codistributed() command. For instance, take the commands:

localA = rand(2,3);
A = codistributed(localA, codistributor('1d', 1))
localB = magic(3);
B = codistributed(localB, codistributor('2d', [2, 2], 3))

The first two commands create an 8-by-3 array distributed by rows. The second two commands create a 6-by-6 array distributed by both rows and columns. The resulting Parallel Command Window is shown below:

[Screenshot: the Parallel Command Window showing the arrays built with codistributed()]

Task parallelism in pmode using drange()
In pmode, for loops can be parallelized using the drange() function, assuming that each iteration of the loop is independent, i.e. no iteration depends on results from a previous iteration. As a simple example, say we make a 50-by-100 codistributed matrix A of random integers from 0 to 100:

A = round(100*rand(50,100,codistributor()));

Let us suppose that we want to make a matrix B with each entry equal to the largest prime factor of the corresponding entry of A. By default, A is distributed by columns. Thus, we will have each lab process its portion of the entries, moving down its own columns of A using a for-drange() loop. We also need to initialize B before beginning the process:

B = zeros(50,100,codistributor());
for jj=drange(1:size(A,2))
    for ii=1:size(A,1)
        B(ii,jj) = max(factor(A(ii,jj)));
    end
end

We can see a portion of the results by checking a portion of the local parts of A and B:

LA = localPart(A);
LB = localPart(B);
LA(1:2,1:4)
LB(1:2,1:4)

The resulting Parallel Command Window should look something like the one shown below:

[Screenshot: the Parallel Command Window showing the local parts of A and B]

Note that communication between labs is not allowed in a for-drange() loop, and during the loop, each lab only has access to its own portion of any codistributed arrays. For more information, see “Using a for-Loop Over a Distributed Range (for-drange)” in the MATLAB documentation.
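
Communication is allowed again once the loop has finished, so global results can be computed afterward. For example, the following sketch finds the largest prime factor appearing anywhere in B, using the gop() function introduced earlier:

% After the for-drange() loop, reduce each lab's local maximum to a
% single global maximum across all labs.
localMax = max(max(localPart(B)));   % largest entry in this lab's columns
globalMax = gop(@max, localMax)      % largest entry across all labs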

Submitting jobs to Glenn
We can take the code above and use it to test submissions to Glenn. First, Parallel Computing Toolbox submissions must be written as functions, so we should rewrite the above code as a function. See primeFactors.m, where we have put a modified version of the for-drange() loop code into the body of the function:

function [OutputA, OutputB] = primeFactors()

The body of the function is:

A = round(100*rand(50,100,codistributor()));
 
B = zeros(50,100,codistributor());
for jj=drange(1:size(A,2))
    for ii=1:size(A,1)
        B(ii,jj) = max(factor(A(ii,jj)));
    end
end
 
OutputA = gather(A);
OutputB = gather(B);
 
if(labindex ~= 1)
    OutputA = 0;
    OutputB = 0;
end

Here, we have changed the outputs slightly. When we submit this code as a job, it will run on some specified number of labs, and each lab will return a set of outputs. To avoid returning multiple copies of the same data, only lab 1 returns the actual copies of A and B; the other labs return scalar zeros.

We could submit this using dfeval() or dfevalasync(). We could also do this by creating a scheduler object and submitting a job. First, we must create a scheduler using the command:

sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron');

Next, we must create a job and set its properties:

job = createParallelJob(sched);
set(job,'FileDependencies',{'C:\Work\primeFactors.m'})
set(job,'PathDependencies',{'C:\Work'})
set(job,'MaximumNumberOfWorkers',4)
set(job,'MinimumNumberOfWorkers',4)

The first line creates a parallel job object. The next line specifies the location, on your desktop, of all files needed to run the function. The third line specifies the path, on your desktop, where files needed to run the function are located. The last two lines specify the maximum and minimum number of workers, or labs, to use. Here, both have been set to 4, so we should get exactly 4 labs. Finally, we create a task that specifies the function we wish to run:

task = createTask(job, @primeFactors, 2, {});

The function createTask() takes at least four arguments: the first is a job object, the second is a handle to the function to run, the third is the number of output arguments this function returns, and the last is a list of input arguments. Note that this list could be made of multiple cell arrays, as with dfeval(). However, here we only want to run one instance of the function primeFactors(), and it does not take any inputs.
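
For instance, for a hypothetical two-input, one-output function solveCase(A, tol), the task might be created as in the following sketch (solveCase, A, and tol are illustrative names, not part of this tutorial’s code):

% Sketch with a hypothetical two-input, one-output function
% solveCase(A, tol); the final cell array supplies its input arguments.
task = createTask(job, @solveCase, 1, {A, tol});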

To submit the job, we use the command:

submit(job)

To check the state of the job, use the command:

get(job,'State')

Once this command returns the state ‘finished,’ we can retrieve the outputs using the command:

outputs = getAllOutputArguments(job);

Once we have stored the output, we can destroy any extra job files using the command:

destroy(job);

Each of the four labs returns outputs for the function, so outputs is a 4-by-2 cell array. However, only lab 1 returned the actual matrices, so we can retrieve the results using the commands:

OutputA = outputs{1,1};
OutputB = outputs{1,2};
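
As a quick sanity check, we can verify an entry of the parallel result against a serial computation on the desktop:

% Spot-check one entry: recompute its largest prime factor serially
% and compare it with the parallel result.
serialCheck = max(factor(OutputA(1,1)));
isequal(serialCheck, OutputB(1,1))   % should display 1 (true)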

Debugging Jobs
Sometimes, a job may fail to run correctly and getAllOutputArguments() returns an empty cell array. This could be due to a configuration problem or a bug in the code. To get more information on why the job failed, use the following commands to retrieve any error messages:

errmsgs = get(job.Tasks, {'ErrorMessage'});
nonempty = ~cellfun(@isempty, errmsgs);
celldisp(errmsgs(nonempty));

You can also check any logs from the scheduler:

sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron');
failedjob = findJob(sched, 'State', 'failed');
if(~isempty(failedjob))
    message = getDebugLog(sched, failedjob(1))
end

Retrieving Data from Glenn
If you are working on problems involving large data sets, you may not want to continually transfer data between Glenn and your desktop. Instead, you can save the data on Glenn and retrieve it when needed. For instance, let us write a new function, primeFactorsRemote.m, that saves the data on Glenn:

function status = primeFactorsRemote()
 
status = -1;
 
A = round(100*rand(50,100,codistributor()));
 
B = zeros(50,100,codistributor());
for jj=drange(1:size(A,2))
    for ii=1:size(A,1)
        B(ii,jj) = max(factor(A(ii,jj)));
    end
end
 
localA = localPart(A);
localB = localPart(B);
 
myFileName = ['localPart_' ...
    num2str(labindex) '.mat'];
 
save(myFileName,'localA','localB');
 
status = 0;
 
end

We can run this code and get the results using commands similar to those used before:

sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron');
job = createParallelJob(sched);
set(job,'FileDependencies',{'C:\Work\primeFactorsRemote.m'})
set(job,'PathDependencies',{'C:\Work'})
set(job,'MaximumNumberOfWorkers',4)
set(job,'MinimumNumberOfWorkers',4)
task = createTask(job, @primeFactorsRemote, 1, {});
submit(job)
get(job,'State')
outputs = getAllOutputArguments(job);
destroy(job);

Once the job has completed successfully, outputs will be a 4-by-1 cell array of zeros, the success status returned by each lab. The actual data will be stored on Glenn. To transfer files from Glenn, you can use a file transfer utility such as the FileZilla client (http://filezilla-project.org). See Configuring Toolbox for instructions on setting up FileZilla for Glenn. Once you have connected with FileZilla, you can find the files in your home directory (the default) on Glenn.
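
Once the localPart_*.mat files have been transferred to your desktop, one way to reassemble the full matrices is the following sketch, assuming four labs and the default codistributor(), so that each localA and localB holds a consecutive block of 25 columns:

% Rebuild the full 50-by-100 matrices by concatenating each lab's
% column block in lab order.
A = []; B = [];
for k = 1:4
    s = load(['localPart_' num2str(k) '.mat']);
    A = [A s.localA];   % append this lab's columns of A
    B = [B s.localB];   % append this lab's columns of B
end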

You can also log onto Glenn using an SSH client and manipulate the data there using a variety of available programs. See http://www.osc.edu/supercomputing/computing/opt/index.shtml for further information on general use of Glenn.

Clearing Old Job Data from Glenn and Your Desktop
You may occasionally forget to destroy jobs after they are done. This can eventually cause a large number of files to accumulate on Glenn and your local machine. On both systems, extra job files are stored in a /matlab/jobs folder, for instance, /home/your_OSC_username/matlab/jobs on Glenn. These files can be safely deleted from your desktop, and they can also be deleted from Glenn using Filezilla as discussed previously. There will also be batch files generated in the current MATLAB directory, e.g. “batch_script_17-Aug-2009_13-23-37.” These can also be deleted once the job is finished.
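
If old jobs have accumulated, you can also destroy them from within MATLAB rather than deleting the files by hand. Here is a sketch using findJob(), which the debugging section above also used:

% Retrieve all jobs known to the scheduler and destroy their files.
sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron');
oldjobs = findJob(sched);
for k = 1:length(oldjobs)
    destroy(oldjobs(k));   % removes the job's data on both systems
end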

Note that if you remember to destroy the job after you are done, you should not need to delete any files on Glenn.