HOWTO: Configure the MATLAB Parallel Computing Toolbox

Introduction

The MATLAB Parallel Computing Toolbox (PCT) and MATLAB Distributed Computing Server (MDCS) let users create and launch parallel MATLAB jobs on a cluster of compute nodes.  Together they also let users connect remotely to OSC resources, either to run parallel jobs in MATLAB or to use toolboxes for which they hold their own licenses.  This guide explains the basics of configuring the Parallel Computing Toolbox for OSC systems.

Versions

The following versions of the MATLAB Parallel Computing Toolbox are supported at OSC:

VERSION   GLENN   OAKLEY
R2013a            X
R2013b            X
R2014a            X

Usage Overview

When you use the MATLAB Parallel Computing Toolbox, you have a MATLAB client session and one or more MATLAB workers.  The client session may run on your laptop/desktop computer ("remote client") or (for OSU users only) it may run on an OSC login, OnDemand, or compute node.  The MATLAB workers always run on the OSC cluster as part of a batch job.  In the client session you will run MATLAB commands to set up and run a batch job, and MATLAB submits the job for you.

This document describes how to perform a computation on a single worker (one processor) or on multiple workers using a script with a "parfor" loop.

Licensing issues

You must have a license for MATLAB and the MATLAB PCT to allow you to run the MATLAB client session.  OSC has licenses for the MATLAB Distributed Computing Server, which covers the MATLAB workers running on the cluster.

OSC is included in the OSU site license for MATLAB, so OSU users have the option of running the client session on an OSC machine.  You can view and manage your MATLAB licenses using the MathWorks License Center.

Remote client

If you run your MATLAB client session on your local computer, it is considered a remote client.  You will be able to use any toolboxes that you have a license for by submitting batch jobs from your MATLAB client.  The batch jobs you submit from your MATLAB client through the PCT will be able to utilize functions such as "parfor" and "spmd" to run your code in parallel, in addition to being able to run normal MATLAB code.
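As a minimal illustration of "spmd", the hypothetical script below (not one of the OSC files) has every worker sum a different slice of 1:1000 and then combines the partial sums with "gplus".  It uses the R2013-era names "labindex" (this worker's index) and "numlabs" (the number of workers); when it runs with a pool, the spmd body runs once on each worker.

```matlab
% spmdSketch.m: hypothetical example script, not part of OSCMatlabPCT.
% Inside spmd, each worker runs the same code; labindex is this worker's
% index (1..numlabs) and numlabs is the number of workers in the pool.
spmd
    idx = labindex:numlabs:1000;   % this worker's share of 1:1000
    partial = sum(idx);            % local partial sum
    total = gplus(partial);        % add the partial sums across all workers
end
% After the spmd block, total is a Composite; total{1} is worker 1's copy.
fprintf('Sum of 1:1000 = %d\n', total{1});
```

Because the slices partition 1:1000 and "gplus" adds them across workers, the printed total is the same for any pool size.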

Client running on Oakley

Because of licensing terms, this option applies only to OSU users.  The MATLAB installations on Oakley let you take advantage of the interactive capabilities of the MATLAB PCT as well as submit batch jobs from the MATLAB client session.  OnDemand or VNC is required for this (the MATLAB PCT is GUI-only at this time), but it lets you debug your parallel jobs in real time with various tools.  Jobs submitted from client sessions on Oakley behave the same as jobs submitted with qsub, but can use PCT functions such as "parfor" and "spmd".

Performance limitations

Parallel MATLAB on a cluster has substantial overhead: workers must be started through the batch system, and data must be transferred between the client and the workers.  As a result, it may not give you the speedup you expect, especially for small problems.  Also note that the MATLAB workers are single-threaded, so they do not take advantage of the multithreading built into many MATLAB functions.
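One way to check whether the overhead is worth it for your problem is to time the same loop with "for" and with "parfor".  The script below is a hypothetical sketch; the iteration count and matrix size are arbitrary, and if no pool is open yet, pool startup time may be included in the "parfor" timing.

```matlab
% overheadCheck.m: hypothetical sketch comparing for and parfor timings.
n = 200;                      % number of iterations (deliberately small)
serialOut = zeros(1, n);
parallelOut = zeros(1, n);

tic;
for k = 1:n
    serialOut(k) = max(abs(eig(rand(50))));    % small, cheap iteration
end
tSerial = toc;

tic;
parfor k = 1:n
    parallelOut(k) = max(abs(eig(rand(50))));  % same work, spread over workers
end
tParallel = toc;

fprintf('for: %.2f s   parfor: %.2f s\n', tSerial, tParallel);
```

If "parfor" is not clearly faster on a representative problem size, the serial version is usually the better choice.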

Download and Install the Configuration Files

The first step is to download the necessary configuration files to the computer where you will be running the MATLAB client session.

    Click the link below to download the files.

    OSCMatlabPCT

    OSCMatlabPCT Configuration Resources -- Directory Structure

    The OSC MATLAB PCT Configuration package contains the following files and directories:

    OSCMatlabPCT (top-level directory)

    • config - A directory containing the cluster profiles and configuration functions for using the MATLAB PCT at OSC.  Note: the only function users should edit is "addSubmitArgs", which lets you set additional batch options such as walltime, email, and a custom job name.  All the other functions have been specially prepared by OSC staff to make MATLAB work with the HPC system on Oakley.
    • launch - A directory containing two scripts, "client_session_script.m" for launching jobs and "reconnect_client_session.m" for reconnecting to a job.  These scripts illustrate key concepts of using the PCT when accessing our system remotely via the MATLAB client on your personal computer.  They also apply to submitting batch jobs from a MATLAB client running in OnDemand.  Both scripts are heavily commented and give usage details beyond what this document can provide.  Note that these scripts contain commands to be run in your client session; do not submit them using the batch command.
    • PCTtestfiles - A directory containing parfor use cases.  The first, "eigtest", is an example of how to program an extremely simple entry function using parfor; it computes the eigenvalues of multiple large, random matrices.  The second is an example of how to run a parallel Simulink simulation using parfor.  Its entry function is "paralleltestv2", which calls the function "parsim" inside a parfor loop.  The "parsim" script contains commands that initialize Simulink and run the simulation on each of the parfor workers.
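For readers without the package at hand, the sketch below shows the general shape of an entry function like "eigtest".  It is a hypothetical function named "eigSketch", written from the description above rather than copied from the OSC file; the default loop count and matrix size are arbitrary.

```matlab
function maxEigs = eigSketch(numMatrices, matrixSize)
% eigSketch: hypothetical entry function modeled on the description of
% eigtest. Computes the eigenvalues of several random matrices in a parfor
% loop and returns the largest eigenvalue magnitude of each matrix.
if nargin < 1, numMatrices = 8; end
if nargin < 2, matrixSize = 500; end

maxEigs = zeros(numMatrices, 1);
parfor k = 1:numMatrices
    A = rand(matrixSize);            % independent random matrix per iteration
    maxEigs(k) = max(abs(eig(A)));   % sliced output: one element per iteration
end
end
```

Returning a value as an output argument (rather than leaving it in the workspace) is what lets "fetchOutputs" retrieve it later.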

    Import and Configure the Cluster Profile

    These configuration steps need to be done once before you use PCT.  (If you upgrade to a new version of MATLAB you'll have to reconfigure.)  There are also command line options for performing these tasks.

    1. After launching MATLAB, open the "Parallel" dropdown menu in the "Environment" section and select "Manage Cluster Profiles".  A new window opens displaying the Cluster Profile Manager.


    Image of MATLAB Parallel Menu


    2. In the Cluster Profile Manager window, click the "Import" button and navigate to the OSCMatlabPCT/config/clusterProfiles directory.

    Cluster profiles are named according to filesystem configuration and version.  Select the file which corresponds to your current version of MATLAB and your filesystem configuration.  If you're running remotely select a "NonShared" version.  If you're running on Oakley, select a "Shared" version.  Click "Open" to proceed.  




    3. If you are using a shared filesystem configuration: No further modifications will need to be made to the cluster profile, and you can close the Cluster Profile Manager and continue with configuring your job. 

    If you are using a non-shared filesystem configuration: You will need to make some changes to your cluster profile before exiting the Cluster Profile Manager.  Click the "Edit" button to enable editing.




    4. In the editing window under "Submit Functions", you should see two entries -- IndependentSubmitFcn and CommunicatingSubmitFcnIntel.  These fields contain cell arrays with three values: a handle to the submit function, the hostname of the cluster, and the remote job storage location where results files and logs will be stored.  You only need to change the remote job storage location.  Use an absolute path to a location in your OSC home directory where you'd like the output of your job (or jobs) to be retained.  


    Image of MATLAB Cluster Profile Properties

    Here is an example of the full syntax (keep the function handle and hostname already in your profile; change only the path):

    {@independentSubmitFcn, 'oakley.osc.edu', '/my/home/dir/MATLAB'}

    {@communicatingSubmitFcn, 'oakley.osc.edu', '/my/home/dir/MATLAB'}

    Note:  If you try to validate your configuration, the last test will always fail for remote clients, even if the configuration is correct.


    5. Include the configuration files in your MATLAB path.  Click on "Set Path", then "Add Folder".  Locate and select the OSCMatlabPCT\config directory.  Click "Select Folder" and "Save".  (Alternatively you can use the "addpath" command in the MATLAB command window.)
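The same configuration can also be done from the command line.  The sketch below assumes the standard PCT functions "parallel.importProfile", "parcluster", and "saveProfile"; the package path, the .settings file name, and the storage path are placeholders to replace with your own.  The submit-function edits apply to NonShared profiles only.

```matlab
% Command-line sketch of steps 2-5 (placeholders in CAPS; adjust paths).
pctDir = '/path/to/OSCMatlabPCT';          % where you unpacked the package

% Step 2: import the profile that matches your MATLAB version and filesystem.
settingsFile = fullfile(pctDir, 'config', 'clusterProfiles', 'PROFILE_FILE.settings');
profileName = parallel.importProfile(settingsFile);

% Steps 3-4 (NonShared profiles only): change only the third element of
% each submit-function cell array, the remote job storage location.
c = parcluster(profileName);
storage = '/my/home/dir/MATLAB';           % absolute path in your OSC home
indep = c.IndependentSubmitFcn;
indep{3} = storage;
c.IndependentSubmitFcn = indep;
comm = c.CommunicatingSubmitFcn;
comm{3} = storage;
c.CommunicatingSubmitFcn = comm;
saveProfile(c);                            % store the edits in the profile

% Step 5: add the configuration functions to the path for future sessions.
addpath(fullfile(pctDir, 'config'));
savepath;
```

Copying each cell array out, editing it, and assigning it back keeps the existing function handle and hostname untouched.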


    Configure and Run a Batch Job

    The primary way to utilize the MATLAB PCT at OSC is to submit batch jobs from the MATLAB client, either on your local PC or on Oakley via OnDemand.  These instructions assume that you have already configured your MATLAB client as described above.

    IMPORTANT: MATLAB treats the output directory specified in the submit functions as scratch space and will clean up this directory after you have successfully retrieved your data.  In practice, however, MATLAB has sometimes been observed to clean up this directory even when retrieval fails (for example, during a large file transfer).  To avoid data loss, make sure your entry function or script copies any important output files to an alternate location for redundancy, or simply saves your data to a directory of your choice.
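As a minimal sketch of that advice, an entry script can end by writing its important variables to a directory you control.  The directory name "matlab_results" and the variable "results" are hypothetical placeholders.

```matlab
% End of a hypothetical entry script: keep a copy outside job storage.
results = rand(1000);                                   % stand-in for real output
outDir = fullfile(getenv('HOME'), 'matlab_results');    % assumed location
if ~exist(outDir, 'dir')
    mkdir(outDir);
end
% The -v7.3 format also handles variables larger than 2 GB.
save(fullfile(outDir, 'results.mat'), 'results', '-v7.3');
```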
    1. Write your script to be run on the cluster.  This can be just a serial script or it may contain parallel commands such as "parfor".  Some example scripts are provided in the directory OSCMatlabPCT/PCTtestfiles.  The example below uses "eigtest.m" as the script to be run.
    2. In your client session:  Using "client_session_script.m" as a guide, run the commands to connect to the cluster and launch a batch job.  We don't recommend that you run the script as written, at least at first.  Until you're familiar with using the PCT, simply copy commands out of the script and paste them into your MATLAB command window as needed.  There are two important functions involved in launching a job.  The command "parcluster" creates a cluster object in the MATLAB workspace, which is used to store information sent between the remote cluster and your MATLAB client.  The command "batch" begins an automated process that can connect to the Oakley cluster, submit a job to PBS, and initialize the MATLAB Distributed Computing Server (MDCS).  For more on the batch command, see the "client_session_script.m" file, and the MATLAB help.
    3. When your job has been successfully configured, you will be prompted for your OSC username and password.  After you enter them, your job will be submitted to the system and assigned an OSC job ID.  Please note: as explained in the "client_session_script" example, MATLAB keeps track of your jobs on the cluster by its own job index number, not by the OSC job ID.  Also, if you cannot see your job with qstat immediately after it is submitted, don't worry!  There appears to be some latency between the MATLAB MDCS and the batch system in exchanging job status information, but this should not affect your ability to track your job.  In fact, as shown in the example, there are several ways to get information about your jobs directly from your MATLAB client.  After submission you can check your job's progress or retrieve results at any time (even after closing the MATLAB client), and you can continue using the MATLAB client for other work if you wish.
    4. Wait for your job to complete.  You may use the "wait" command in your client session or simply monitor your job's status with "qstat" on an Oakley login node to know when your job has completed.
    5. Reconnect after closing your MATLAB client session.  The script "reconnect_client_session.m" illustrates how to find your job information again if you restart your MATLAB client session.
    6. Retrieve your results.  The "load" or "fetchOutputs" command retrieves the results of your computations from your MATLAB working directory on Oakley after the job has finished.  Both commands also report errors if your run failed.  Use "load" to retrieve the workspace of an entry script, and "fetchOutputs" to retrieve the output arguments of an entry function.  For large data there is an important caveat: MATLAB does not use sftp when fetching your files, so large files can take a long time to retrieve.  In addition, if your output arguments or workspace exceed a certain size (around 2 GB), "load" and "fetchOutputs" fail with an "index out of range" error because the MDCS cannot save the output to file.  This is caused by a size limit in the default version of MATLAB's MAT-file format.  You can work around it by saving your workspace or output arguments manually in your entry script, using the '-v7.3' option of the "save" command.
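Steps 2 through 6 can be condensed into the client-session sketch below.  It is not a copy of "client_session_script.m"; the profile name, the entry function "myEntryFunction", and its arguments are hypothetical placeholders.  The 'Pool' option of "batch" exists in R2013b and later; R2013a uses 'Matlabpool' instead.

```matlab
% Client-session sketch (profile and function names are placeholders).
c = parcluster('OSC_PROFILE_NAME');          % cluster object for the OSC profile

% Run myEntryFunction(8, 500) with 1 output. 'Pool', 4 gives the job
% 4 workers for parfor in addition to the worker that runs the function.
job = batch(c, @myEntryFunction, 1, {8, 500}, 'Pool', 4);
fprintf('MATLAB job index: %d\n', job.ID);   % not the OSC (PBS) job ID

wait(job);                                   % block until the job is done
disp(job.State);                             % e.g. 'finished' or 'failed'

out = fetchOutputs(job);                     % cell array, one cell per output
result = out{1};

% For a batch script (job = batch(c, 'myScript')), load its workspace instead:
%   load(job);            % or load(job, 'someVariable')

% After restarting MATLAB, recover the job object from the same profile:
c = parcluster('OSC_PROFILE_NAME');
jobIndex = 1;                                % replace with the index printed above
job = findJob(c, 'ID', jobIndex);
```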

    Specifying Additional PBS Options for Your Batch Job

    The submit functions are MATLAB functions that serve two purposes: they create the connections to the cluster where your job will run, and they generate and submit a shell script to the queue on Oakley.  The number of nodes and cores (ppn) is determined by the number of MATLAB workers you request in your submit script.  To change or add other PBS arguments or resource requests, add the PBS options to the string in the function "addSubmitArgs" (located in config).  By default this string requests a walltime of 1:00:00, sets the job name to "default", and sends email when the job aborts, begins, and ends.  Because "addSubmitArgs" is called from the submit functions, advanced users who need several configurations can change how the submit functions call it to streamline their workflow.
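The exact contents of "addSubmitArgs" are defined in the OSC package, so open config/addSubmitArgs.m before editing it.  As a rough, hypothetical illustration, the function below builds a PBS option string equivalent to the defaults described above, using the standard qsub options -l walltime, -N, and -m.  Its name and zero-argument signature are assumptions, not those of the OSC function.

```matlab
function args = addSubmitArgsSketch()
% Hypothetical sketch only: builds a qsub option string similar to the
% defaults described for addSubmitArgs. The real OSC function may differ.
walltime = '1:00:00';    % -l walltime=HH:MM:SS
jobName  = 'default';    % -N job name
mailWhen = 'abe';        % -m: mail on abort (a), begin (b), end (e)
args = sprintf('-l walltime=%s -N %s -m %s', walltime, jobName, mailWhen);
end
```

Calling addSubmitArgsSketch() returns the string '-l walltime=1:00:00 -N default -m abe'.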

    Configure and Run an Interactive Job

    There are a couple of useful ways for OSU users to use the interactive features of the MATLAB PCT.  Because of our license terms, this option is available only to OSU faculty, staff, and students.

    Using OnDemand with a Shared Cluster Profile

    OSU users may take advantage of the MATLAB licenses on Oakley to submit interactive "parpool" (R2013b or later) or "matlabpool" (R2013a) jobs through the batch system.  These MATLAB PCT functions let you run parallel constructs such as "parfor" or "spmd" directly from the command line or a script, and let you debug or interact with parallel jobs in real time.  OSC offers this functionality in MATLAB clients running through the OnDemand portal.  To use it, start an Oakley desktop session in OnDemand, open a terminal window, load the correct module, and start MATLAB.  Make sure the OSCMatlabPCT/config directory is in your path.  Then import the shared cluster profile for your version of MATLAB and use the "parpool" (R2013b or later) or "matlabpool" (R2013a) command to start a parallel pool.  Setup can take a while, because the pool request must go through the batch system like any other job, so be patient!  After a few minutes or more (depending on the resource request), the pool connects to the MATLAB workers and you can begin running parallel commands.  This lets you simulate how jobs will behave when submitted to the MDCS as batch jobs.
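A minimal interactive session might look like the sketch below (R2013b or later; in R2013a, "matlabpool" plays the role of "parpool").  The package path, the profile name, and the worker count are placeholders; the "parpool" call returns only after the batch system has started the workers.

```matlab
% Run inside MATLAB in an OnDemand desktop on Oakley (placeholders in CAPS).
addpath('/path/to/OSCMatlabPCT/config');   % skip if already on your path
c = parcluster('SHARED_PROFILE_NAME');     % the Shared profile you imported
pool = parpool(c, 8);                      % waits for the batch job to start

n = 16;
r = zeros(1, n);
parfor k = 1:n
    r(k) = max(abs(eig(rand(300))));       % iterations run on pool workers
end
disp(r);

delete(pool);                              % shut down the pool and its workers
```

Deleting the pool when you are done releases the workers instead of leaving them idle until the walltime expires.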

    Using VNC with the "local" Cluster Profile

    Another way to run an interactive parallel job with the MATLAB PCT is to submit an interactive batch job using qsub and run MATLAB on a compute node, using VNC to display the GUI.  You can then use the "local" cluster profile, which runs the workers directly on that node.  Only one node can be used, but the workers will run on multiple processors on the node.  This approach is useful when OnDemand is not a good fit, but follow the directions carefully so that you do not leave VNC processes running on the node.  Directions for using VNC in a batch job are located here: https://osc.edu/documentation/howto/use-vnc-in-a-batch-job.
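Inside the VNC session, a pool on the "local" profile can be started as sketched below.  Sizing the pool from the profile's "NumWorkers" value is an assumption made for convenience; you may prefer to match the pool size to the number of cores (ppn) you requested.

```matlab
% Run inside MATLAB in a VNC desktop on a compute node.
c = parcluster('local');                 % workers run on this node only
pool = parpool(c, c.NumWorkers);         % use the local profile's worker limit

spmd
    fprintf('Worker %d of %d is running\n', labindex, numlabs);
end

delete(pool);                            % release the workers before exiting
```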

    Additional Documentation

    MathWorks (http://www.mathworks.com/index.html?s_tid=gn_logo) has written extensive documentation covering all of the public functions in the Parallel Computing Toolbox.  The PCT homepage is located at http://www.mathworks.com/products/parallel-computing/.

    For more information about how to construct and run an independent job, see the MathWorks documentation page "Program Independent Jobs" (http://www.mathworks.com/help/distcomp/program-independent-jobs.html).
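As a brief, hypothetical example of an independent job, the sketch below creates three tasks that run with no communication between them and collects one output from each.  The profile name is a placeholder; "createJob", "createTask", "submit", "wait", and "fetchOutputs" are standard PCT functions.

```matlab
% Independent job: each task runs separately on its own worker.
c = parcluster('OSC_PROFILE_NAME');        % placeholder profile name
job = createJob(c);
sizes = [100 200 300];
for n = sizes
    % Each task computes the largest eigenvalue magnitude of an n-by-n matrix.
    createTask(job, @(m) max(abs(eig(rand(m)))), 1, {n});
end
submit(job);
wait(job);
out = fetchOutputs(job);                   % 3-by-1 cell array, one row per task
```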

    For more information about how to construct and run a communicating job, see the MathWorks documentation page "Program Communicating Jobs" (http://www.mathworks.com/help/distcomp/introduction.html).
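Similarly, a hypothetical communicating job can be sketched as follows; this is the kind of job the communicating submit function in the cluster profile handles.  It runs one task on four workers that exchange data with "gplus"; the profile name and worker count are placeholders.

```matlab
% Communicating (SPMD) job: all workers run the same task and can exchange data.
c = parcluster('OSC_PROFILE_NAME');        % placeholder profile name
job = createCommunicatingJob(c, 'Type', 'spmd');
job.NumWorkersRange = [4 4];               % exactly 4 workers
% gplus(labindex) sums the worker indices 1+2+3+4 across all workers.
createTask(job, @() gplus(labindex), 1, {});
submit(job);
wait(job);
out = fetchOutputs(job);                   % 4-by-1 cell array, each equal to 10
```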

    Errors

    If you have any errors related to the OSC MATLAB PCT configuration files, or have any other questions about using the MATLAB PCT at OSC, please contact OSC Help with your user ID, any relevant error messages, the job ID(s) if applicable, and the version of MATLAB you are using.
