It is often necessary to run the same serial program many times with slightly different input. Such parametric runs typically either execute sequentially within a single batch job, or are submitted as one batch job per parameter value (or some mix of the two). An alternative is to allocate a number of nodes/processors to running a large number of serial processes for some period of time. The command parallel-command-processor (PCP) allows the execution of a large number of independent serial processes in parallel.
parallel-command-processor works as follows: in a parallel job with N processors allocated, the PCP manager process reads the first N-1 commands in the command stream and distributes them to the other N-1 processors. As each process completes, the PCP manager reads the next command in the stream and sends it to an idle processor core. Once the PCP manager runs out of commands to run, it waits for the remaining running processes to complete before shutting itself down.
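The dispatch model described above can be imitated on a single machine with plain bash, which may help when reasoning about how a command file will be consumed. The following is a minimal sketch, not how parallel-command-processor itself is implemented: run_pool is a hypothetical helper, the shell stands in for the manager, background jobs stand in for the N-1 worker processors, and bash 4.3 or newer is assumed for wait -n.

```shell
#!/bin/bash
# Sketch only: run_pool is a hypothetical helper, not part of pcp.
# run_pool WORKERS FILE: run each line of FILE as a shell command,
# keeping at most WORKERS commands running at the same time.
run_pool() {
    local workers=$1 cmdfile=$2 running=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue          # skip blank lines
        if [ "$running" -ge "$workers" ]; then
            # All workers are busy: block until any one command finishes.
            # Its exit status is ignored in this sketch.
            wait -n || true
            running=$((running - 1))
        fi
        # Hand the next command in the stream to the freed worker slot.
        bash -c "$line" < /dev/null &
        running=$((running + 1))
    done < "$cmdfile"
    # Out of commands: wait for the remaining running commands to complete.
    wait
}

# Demo: five commands, two worker slots, run in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
    'echo a > 1' \
    'echo b > 2' \
    'sleep 1; echo c > 3' \
    'uname -a > 4' \
    'echo e > 5' > pconf
run_pool 2 pconf
ls
```

In this demo, run_pool 2 plays the role of a job with N = 3 processors: one manager and two workers. Commands are started in file order, but they may finish in any order, so each command should write to its own output file.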
Availability and Restrictions
Parallel-Command-Processor is available for all OSC users.
Publisher/Vendor/Repository and License Type
Ohio Supercomputer Center, Open source
Usage
Here is an interactive batch session that demonstrates the use of parallel-command-processor with a config file, pconf. pconf contains several simple commands, one per line. The output of each command was redirected to an individual file.
-bash-3.2$ sinteractive -A <project-account> -N 2 -n 8
-bash-3.2$ cp pconf $TMPDIR
-bash-3.2$ cd $TMPDIR
-bash-3.2$ cat pconf
ls / > 1
ls $TMPDIR > 2
ls $HOME > 3
ls /usr/local/ > 4
ls /tmp > 5
ls /usr/src > 6
ls /usr/local/src > 7
ls /usr/local/etc > 8
hostname > 9
uname -a > 10
df > 11
-bash-3.2$ module load pcp
-bash-3.2$ srun parallel-command-processor pconf
-bash-3.2$ pwd
/tmp/pbstmp.1371894
-bash-3.2$ srun --ntasks=2 ls -l $TMPDIR
854 total 16
-rw------- 1 yzhang G-3040 1082 Feb 18 16:26 11
-rw------- 1 yzhang G-3040 1770 Feb 18 16:26 4
-rw------- 1 yzhang G-3040 67 Feb 18 16:26 5
-rw------- 1 yzhang G-3040 32 Feb 18 16:26 6
-rw------- 1 yzhang G-3040 0 Feb 18 16:26 7
855 total 28
-rw------- 1 yzhang G-3040 199 Feb 18 16:26 1
-rw------- 1 yzhang G-3040 111 Feb 18 16:26 10
-rw------- 1 yzhang G-3040 12 Feb 18 16:26 2
-rw------- 1 yzhang G-3040 87 Feb 18 16:26 3
-rw------- 1 yzhang G-3040 38 Feb 18 16:26 8
-rw------- 1 yzhang G-3040 20 Feb 18 16:26 9
-rw------- 1 yzhang G-3040 163 Feb 18 16:25 pconf
-bash-3.2$ exit
As the command "srun --ntasks=2 ls -l $TMPDIR" shows, the output files are distributed across the two nodes. In a batch script, pbsdcp can be used to distribute-copy the input files to $TMPDIR on all nodes of the job, and sgather can be used to gather the output files once execution has completed. This step is important because executing many processes in parallel can place a heavy load on the user home directories.
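The pconf file in the session above was written by hand. For a parametric run it is usually easier to generate the command file with a loop. The sketch below assumes a hypothetical program my_sim with a hypothetical --alpha option; neither is a real tool, and both stand in for whatever serial program is being swept. Each generated line is a complete command that redirects its output to its own file.

```shell
#!/bin/bash
# Sketch only: my_sim and its --alpha option are placeholders,
# not a real program or flag.
cmdfile=$(mktemp)

for alpha in 0.1 0.2 0.5 1.0; do
    # One complete command per line, each writing to a distinct output file.
    echo "./my_sim --alpha $alpha > out.alpha_$alpha" >> "$cmdfile"
done

# Show the generated command stream.
cat "$cmdfile"
```

Inside a job, the generated file would be passed to parallel-command-processor in the same way as pconf in the session above.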
Here is a slightly more complex example showing the usage of parallel-command-processor and pbsdcp/sgather:
#!/bin/bash
#SBATCH --nodes=13 --ntasks-per-node=4
#SBATCH --time=1:00:00
#SBATCH -A <project-account>
date
module load biosoftw
module load blast
set -x
pbsdcp -s query/query.fsa.* $TMPDIR
pbsdcp -s db/rice.* $TMPDIR
cd $TMPDIR
for i in $(seq 1 49)
do
    cmd="blastall -p blastn -d rice -i query.fsa.$i -o out.$i"
    echo ${cmd} >> runblast
done
module load pcp
srun parallel-command-processor runblast
mkdir $SLURM_SUBMIT_DIR/output
sgather -r $TMPDIR $SLURM_SUBMIT_DIR/output
date
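After a run like the one above, it can be useful to check which of the expected out.$i files were actually produced and to build a smaller command file for only the missing ones. The following is a minimal sketch under stated assumptions: list_missing is a hypothetical helper, not part of pcp, and an empty output file is treated as a failed command, which may not be appropriate for every program. The demo uses a scratch directory with five expected outputs rather than the 49 in the batch script.

```shell
#!/bin/bash
# Sketch only: list_missing is a hypothetical helper, not part of pcp.
# list_missing DIR COUNT: print the index i of every out.<i> (1..COUNT)
# that is absent or empty in DIR.
list_missing() {
    local dir=$1 count=$2 i
    for i in $(seq 1 "$count"); do
        [ -s "$dir/out.$i" ] || echo "$i"
    done
}

# Demo: out.3 is empty and out.4 does not exist.
demo=$(mktemp -d)
echo "result" > "$demo/out.1"
echo "result" > "$demo/out.2"
: > "$demo/out.3"
echo "result" > "$demo/out.5"

# Build a rerun command file in the same form as runblast.
rerun="$demo/rerun"
: > "$rerun"
for i in $(list_missing "$demo" 5); do
    echo "blastall -p blastn -d rice -i query.fsa.$i -o out.$i" >> "$rerun"
done
cat "$rerun"
```

The resulting rerun file has the same one-command-per-line form as runblast, so it could be given to parallel-command-processor in a follow-up job.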
Further Reading
The parallel-command-processor command is documented as a man page: man parallel-command-processor.