Description
This tutorial guides you through the process of creating and submitting a batch script on one of our compute clusters. This is a Linux tutorial which uses batch scripting as an example, not a tutorial on writing batch scripts. The primary goal is not to teach you about batch scripting but to help you become familiar with certain Linux commands. There are other pages on the OSC web site that go into the details of submitting a job with a batch script.
Prerequisites
- Familiarity with a text editor (emacs, nano, vim)
- Basic understanding of the unix command line
- Linux Command Line Fundamentals tutorial
Goals
- Create subdirectories to organize information
- Create a batch script with a text editor
- Submit a job
- Check on the progress of the job
- Change the permissions of the output files
- Get familiar with some common unix commands
Step 1 - Organize your directories
When you first log in to our clusters, you are in your home directory. For the purposes of this illustration, we will pretend you are user osu0001 and your project code is PRJ0001, but when you try out commands you must use your own username and project code.
$ pwd
/users/PRJ0001/osu0001
$ touch foo1
$ touch foo2
$ ls
$ ls -l
$ ls -lt
$ ls -ltr
The "touch" command just creates an empty file with the name you give it.
What is the point of the "-l", "-lt" or "-ltr"? You probably noticed the difference in the output between just the "ls" command and the "ls -l" command. Options are given to a command as a "-" (minus sign) followed by a single letter, and several option letters can follow a single minus sign. "ls -ltr" is actually specifying three options to the ls command:
l: I want to see the output in long format -- one file per line with some interesting information about each file
t: sort the display of files by when they were last modified, most-recently modified first
r: reverse the order of display (combined with -t this displays the most-recently modified file last: foo2 at this point, and BatchTutorial after you run the mkdir command below.)
I use "ls -ltr" because I find it convenient to see the most recently modified file at the end of the list.
$ mkdir BatchTutorial
$ ls -ltr
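To check how -t and -r work together without adding files to your home directory, you can repeat the experiment in a throwaway directory. This is a sketch that assumes the mktemp and touch -t tools (both standard on Linux); the timestamps given to touch -t are arbitrary values chosen so that foo1 is older than foo2. Type cd on its own afterwards to return to your home directory.

```shell
# Work in a throwaway directory so your home directory is left alone
scratch=$(mktemp -d)
cd "$scratch"

# Give foo1 and foo2 explicit (arbitrary) modification times: CCYYMMDDhhmm
touch -t 202401010900 foo1
touch -t 202401011000 foo2
# BatchTutorial is created now, so it is the most recently modified entry
mkdir BatchTutorial

# -t lists the newest entry first; adding -r moves it to the end
first_with_t=$(ls -t | head -n 1)
last_with_tr=$(ls -tr | tail -n 1)
oldest=$(ls -tr | head -n 1)
echo "ls -t  lists first: $first_with_t"
echo "ls -tr lists last:  $last_with_tr"
echo "ls -tr lists first: $oldest"
```

Because BatchTutorial is the newest entry, it is the first line of ls -t and the last line of ls -tr, which is why ls -ltr leaves the file you just worked on at the bottom of the listing.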
The "mkdir" command makes a new directory with the name you give it. This is a subfolder of the current working directory. The current working directory is where your current focus is in the hierarchy of directories. The "pwd" command shows that you are in your home directory:
$ pwd
/users/PRJ0001/osu0001
Now move into the new directory:
$ cd BatchTutorial
$ pwd
What does "pwd" show you now? "cd" is short for "change directory" -- think of it as moving you into a different place in the hierarchy of directories. Now do
$ cd ..
$ pwd
".." always refers to the directory one level above the current one, so this takes you back to your home directory.
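The effect of cd and cd .. on pwd can also be captured in variables and compared. This sketch again works in a throwaway directory made with mktemp -d rather than your home directory, so the paths it prints will differ from the /users/PRJ0001/osu0001 example.

```shell
scratch=$(mktemp -d)
cd "$scratch"
start=$(pwd)          # where we begin
mkdir BatchTutorial

cd BatchTutorial
inside=$(pwd)         # one level down: .../BatchTutorial
cd ..                 # ".." is the parent directory
back=$(pwd)           # back where we started

echo "start:  $start"
echo "inside: $inside"
echo "back:   $back"
```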
Step 2 -- Get familiar with some more unix commands
Try the following:
$ echo where am I?
$ echo I am in `pwd`
$ echo my home directory is $HOME
$ echo HOME
$ echo this directory contains `ls -l`
These examples show what the echo command does and how to do some interesting things with it. Enclosing a command in backquotes, as in `pwd`, replaces it with the output of that command. HOME is an example of an environment variable. These are names that stand for other strings. HOME is defined when you log in to a unix system, and $HOME means the string the variable HOME stands for. Notice that "echo HOME" does not do the substitution. Also notice that the last example shows things don't always get formatted the way you would like: because `ls -l` is not inside quotes, the shell splits its output into words and echo prints them separated by single spaces, so the line breaks of the listing are lost.
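The substitutions described above can be made visible by saving echo's output in variables. A minimal sketch, run in a throwaway directory from mktemp -d that holds two empty files so ls has something to list; the double-quoted form at the end goes slightly beyond the examples above and shows one way to keep the line breaks.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch foo1 foo2

# Backquotes run pwd and put its output into the command line
where=$(echo I am in `pwd`)
echo "$where"

# Without the $ there is no substitution: this is just the word HOME
plain=$(echo HOME)
echo "$plain"

# Unquoted, the newline between foo1 and foo2 turns into a space...
listing=`ls`
flat=$(echo $listing)
# ...but inside double quotes the line break is kept
kept=$(echo "$listing")
echo "$flat"
echo "$kept"
```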
Some more commands to try:
$ cal
$ cal > foo3
$ cat foo3
$ whoami
$ date
Using ">" after a command puts the output of the command into a file with the name you specify; if that file already exists, its old contents are replaced. The "cat" command prints the contents of a file to the screen.
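A quick way to see that ">" replaces a file's previous contents is to write to the same file twice. This sketch uses echo instead of cal (cal is not installed on every system) and works in a throwaway directory.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo first version > foo3
echo second version > foo3   # ">" empties foo3 before writing to it
contents=$(cat foo3)
echo "$contents"             # only the second version is left
```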
Two very important UNIX commands are the cp and mv commands. Assume you have a file called foo3 in your current directory created by the "cal > foo3" command. Suppose you want to make a copy of foo3 called foo4. You would do this with the following command:
$ cp foo3 foo4
$ ls -ltr
Now suppose you want to rename the file 'foo4' to 'foo5'. You do this with:
$ mv foo4 foo5
$ ls -ltr
'mv' is short for 'move', and it is used for renaming files. It can also be used to move a file to a different directory.
$ mkdir CalDir
$ mv foo5 CalDir
$ ls
$ ls CalDir
Notice that if you give a directory name to the "ls" command, it shows you what is in that directory rather than the current working directory.
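The difference between cp and mv can be checked file by file. This sketch repeats the cp, mv and mkdir steps in a throwaway directory, with echo standing in for the cal output in foo3.

```shell
scratch=$(mktemp -d)
cd "$scratch"
echo "pretend this is a calendar" > foo3

cp foo3 foo4        # copy: foo3 and foo4 both exist
mv foo4 foo5        # rename: foo4 is gone, foo5 takes its place
mkdir CalDir
mv foo5 CalDir      # move into a directory: the name foo5 is kept

ls                  # shows CalDir and foo3
ls CalDir           # shows foo5
```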
Now try the following:
$ ls CalDir
$ cd CalDir
$ ls
$ cd ..
$ cp foo3 CalDir
$ ls CalDir
Notice that you can use the "cp" command to copy a file to a different directory -- the copy will have the same name as the original file. What if you forget to do the mkdir first?
$ cp foo3 FooDir
Now what happens when you do the following:
$ ls FooDir
$ cd FooDir
$ cat CalDir
$ cat FooDir
$ ls -ltr
CalDir is a directory, but FooDir is a regular file. You can tell this from the first character of each line of the "ls -ltr" output: "d" for a directory, "-" for a regular file. That's what happens when you try to cp or mv a file to a directory that doesn't exist -- a file gets created with the target name. You can imagine a scenario in which you run a program and want to copy the resulting files to a directory called Output but you forget to create the directory first -- this is a fairly common mistake.
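One way to guard against this mistake is to create the target directory before copying. This sketch reproduces the FooDir accident in a throwaway directory and then uses mkdir -p (the -p option makes mkdir succeed quietly if the directory already exists) before copying into Output. It also reads the first character of each ls -l line to tell files from directories.

```shell
scratch=$(mktemp -d)
cd "$scratch"
echo "pretend this is a calendar" > foo3

cp foo3 FooDir      # FooDir does not exist, so a regular file is created
mkdir -p Output     # create Output; no error if it is already there
cp foo3 Output      # Output is a directory, so this makes Output/foo3

# First character of an ls -l line: "-" regular file, "d" directory
ls -ld FooDir Output
foodir_type=$(ls -ld FooDir | cut -c1)
output_type=$(ls -ld Output | cut -c1)
echo "FooDir: $foodir_type  Output: $output_type"
```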
Step 3 -- Environment Variables
Before we move on to creating a batch script, you need to know more about environment variables. An environment variable is a word that stands for some other text. We have already seen an example of this with the variable HOME. Try this:
$ MY_ENV_VAR="something I would rather not type over and over"
$ echo MY_ENV_VAR
$ echo $MY_ENV_VAR
$ echo "MY_ENV_VAR stands for $MY_ENV_VAR"
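You define an environment variable by assigning some text to it with the equals sign, with no spaces on either side of the "=". That's what the first line above does. When you use '$' followed by the name of your variable in a command line, the shell makes the substitution. If you forget the '$', the substitution will not be made. Strictly speaking, a variable created this way is a shell variable: programs you start from the shell only see it after you export it (for example, with export MY_ENV_VAR).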
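The sketch below captures what echo prints with and without the '$', and uses a child sh (a stand-in for any program you start) to show that a variable only reaches other programs once it is exported. The fallback text "not set" is just a placeholder printed when the child cannot see the variable.

```shell
MY_ENV_VAR="something I would rather not type over and over"

no_dollar=$(echo MY_ENV_VAR)       # no $: the name itself is printed
with_dollar=$(echo $MY_ENV_VAR)    # $: the shell substitutes the text

# A program started from this shell (here a child sh) does not see the
# variable until it is exported
before=$(sh -c 'echo "${MY_ENV_VAR:-not set}"')
export MY_ENV_VAR
after=$(sh -c 'echo "${MY_ENV_VAR:-not set}"')

echo "$no_dollar"
echo "$with_dollar"
echo "child before export: $before"
echo "child after export:  $after"
```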
There are some environment variables that come pre-defined when you log in. Try using 'echo' to see the values of the following variables: HOME, HOSTNAME, SHELL, TERM, PATH.
Now you are ready to use some of this unix knowledge to create and run a script.
Step 4 -- Create and run a script
Before we create a batch script and submit it to a compute node, we will do something a bit simpler. We will create a regular script file that will be run on the login node. A script is just a file that consists of unix commands that will run when you execute the script file. It is a way of gathering together a bunch of commands that you want to execute all at once. You can do some very powerful things with scripting to automate tasks that are tedious to do by hand, but we are just going to create a script that contains a few commands we could easily type in. This is to help you understand what is happening when you submit a batch script to run on a compute node.
Use a text editor to create a file named "tutorial.sh" which contains the following text (note that with emacs or nano you can use the mouse to select text and then paste it into the editor with the middle mouse button):
$ nano tutorial.sh
#!/bin/bash
echo ----
echo Job started at `date`
echo ----
echo This job is working on node `hostname`
SH_WORKDIR=`pwd`
echo working directory is $SH_WORKDIR
echo ----
echo The contents of $SH_WORKDIR
ls -ltr
echo
echo ----
echo
echo creating a file in SH_WORKDIR
whoami > whoami-sh-workdir
SH_TMPDIR=${SH_WORKDIR}/sh-temp
mkdir -p $SH_TMPDIR
cd $SH_TMPDIR
echo ----
echo SH_TMPDIR IS `pwd`
echo ----
echo wait for 12 seconds
sleep 12
echo ----
echo creating a file in SH_TMPDIR
whoami > whoami-sh-tmpdir
# copy the file back to the output subdirectory (create it first if needed)
mkdir -p ${SH_WORKDIR}/output
cp ${SH_TMPDIR}/whoami-sh-tmpdir ${SH_WORKDIR}/output
cd $SH_WORKDIR
echo ----
echo Job ended at `date`
To run it:
$ chmod u+x tutorial.sh
$ ./tutorial.sh
Look at the output created on the screen and the changes in your directory to see what the script did.
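The chmod u+x step is what makes ./tutorial.sh runnable: u+x adds execute (x) permission for the file's owner (u). This sketch uses a tiny stand-in script instead of the real tutorial.sh (which sleeps for 12 seconds) and shows the owner's permission letters before and after. It assumes the throwaway directory from mktemp -d allows programs to run; if /tmp is mounted noexec on your system, try it in a directory under your home instead.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# A tiny stand-in for tutorial.sh
printf '#!/bin/sh\necho hello from tutorial.sh\n' > tutorial.sh

ls -l tutorial.sh        # e.g. -rw-r--r--: no x, so ./tutorial.sh would fail
chmod u+x tutorial.sh    # add execute permission for the owner
ls -l tutorial.sh        # the owner's permissions now read rwx
owner_perms=$(ls -l tutorial.sh | cut -c2-4)
output=$(./tutorial.sh)
echo "$output"
```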
Step 5 -- Create and run a batch job
Use your favorite text editor to create a file called tutorial.pbs in the BatchTutorial directory which has the following contents (remember, you can use the mouse to cut and paste text):
#PBS -l walltime=00:02:00
#PBS -l nodes=1:ppn=1
#PBS -N foobar
#PBS -j oe
#PBS -r n
echo ----
echo Job started at `date`
echo ----
echo This job is working on compute node `cat $PBS_NODEFILE`
cd $PBS_O_WORKDIR
echo show what PBS_O_WORKDIR is
echo PBS_O_WORKDIR IS `pwd`
echo ----
echo The contents of PBS_O_WORKDIR:
ls -ltr
echo
echo ----
echo
echo creating a file in PBS_O_WORKDIR
whoami > whoami-pbs-o-workdir
cd $TMPDIR
echo ----
echo TMPDIR IS `pwd`
echo ----
echo wait for 42 seconds
sleep 42
echo ----
echo creating a file in TMPDIR
whoami > whoami-tmpdir
# copy the file back to the output subdirectory (create it first if needed)
mkdir -p $PBS_O_WORKDIR/output
pbsdcp -g $TMPDIR/whoami-tmpdir $PBS_O_WORKDIR/output
echo ----
echo Job ended at `date`
Submit the job to the batch system with qsub:
$ qsub tutorial.pbs
Use the command
qstat -u [username]
to check on the progress of your job. If you see something like this
$ qstat -u osu0001
                                                                             Req'd  Req'd   Elap
Job ID             Username    Queue    Jobname          SessID NDS   TSK    Memory Time  S Time
------------------ ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
458842.oak-batch   osu0001     serial   foobar           --     1     1      --     00:02 Q --
the job is waiting in the queue and has not started yet (Q in the S, or state, column). If you see something like this:
                                                                             Req'd  Req'd   Elap
Job ID             Username    Queue    Jobname          SessID NDS   TSK    Memory Time  S Time
------------------ ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
458842.oak-batch   osu0001     serial   foobar           26276  1     1      --     00:02 R --
the job is running (R in the S column).
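If you would rather check the state from a script than by eye, the S column can be pulled out with awk. This is a sketch, not an OSC-provided tool: the helper name job_state is hypothetical, and because qstat only exists on the cluster, the listing is the running-job sample from this page stored in a variable (on the cluster you would use listing=$(qstat -u osu0001) with your own username). It assumes the first column holds the full job ID, which qstat may not print for long IDs.

```shell
# Running-job sample from this page (the Req'd/Elap line is omitted)
listing="Job ID             Username    Queue    Jobname          SessID NDS   TSK    Memory Time  S Time
------------------ ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
458842.oak-batch   osu0001     serial   foobar           26276  1     1      --     00:02 R --"

# job_state LISTING JOBID (hypothetical helper): print the S column of
# JOBID's row; S is the next-to-last field of a data row
job_state() {
    printf '%s\n' "$1" | awk -v id="$2" '$1 == id { print $(NF-1) }'
}

state=$(job_state "$listing" 458842.oak-batch)
case $state in
    Q)  echo "job 458842.oak-batch is queued" ;;
    R)  echo "job 458842.oak-batch is running" ;;
    "") echo "job 458842.oak-batch is no longer listed" ;;
    *)  echo "job 458842.oak-batch is in state $state" ;;
esac
```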
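When the output of the qstat command is empty, the job is done. Now look at what it produced:
$ ls -ltr
$ cat foobar.oNNNNNN
where NNNNNN is your job id number. The name of the output file is made up of two parts: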
- The name you give the job in the script file with the header line #PBS -N foobar
- The job id number assigned to the job.
The name of the script file (tutorial.pbs) has nothing to do with the name of the output file.
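The output file name can be assembled from those two parts. This sketch uses the job name from the #PBS -N line and the job ID from the qstat sample on this page; it assumes the job ID starts with the job number followed by a ".", so stripping everything from the first "." onward leaves the number (the same works if your ID has a longer host suffix).

```shell
job_name=foobar              # from the "#PBS -N foobar" header line
job_id=458842.oak-batch      # Job ID as shown in the qstat sample
# On the cluster, qsub prints the new job's ID, so you could instead use:
#   job_id=$(qsub tutorial.pbs)
job_number=${job_id%%.*}     # drop everything from the first "." onward
out_file=${job_name}.o${job_number}
echo "$out_file"             # foobar.o458842
```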
Examine the contents of the output file foobar.oNNNNNN carefully. You should be able to see the results of some of the commands you put in tutorial.pbs. It also shows you what the variables PBS_NODEFILE, PBS_O_WORKDIR and TMPDIR referred to while the job ran. These variables are set by the batch system only in your job's environment, so they do not exist in your login session. Try
$ echo $PBS_O_WORKDIR
and you will see that it is not defined in your login session (echo just prints an empty line). $PBS_NODEFILE is a file that lists the compute nodes assigned to your job; a node's name appears once for each processor core (ppn) your job has on it. Because this script has the line
#PBS -l nodes=1:ppn=1
the file $PBS_NODEFILE contains the name of a single compute node.
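Counting lines and distinct names in a node file gives the number of cores and nodes a job received. PBS_NODEFILE only exists inside a job, so this sketch writes a hypothetical node file for a request like nodes=2:ppn=2, assuming the one-line-per-core layout; the names node01 and node02 are made up. In a real job script you would read "$PBS_NODEFILE" instead of the temporary file.

```shell
# Hypothetical node file for nodes=2:ppn=2 (node names are made up)
nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$nodefile"

cores=$(wc -l < "$nodefile" | tr -d ' ')          # one line per core
nodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')  # distinct node names
echo "cores: $cores, nodes: $nodes"               # cores: 4, nodes: 2
rm -f "$nodefile"
```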
Notice that $TMPDIR is /tmp/pbstmp.NNNNNN (again, NNNNNN is the id number for this job). Try
$ ls /tmp/pbstmp.NNNNNN
Why doesn't this directory exist? Because it is a directory on the compute node, not on the login node. Each machine in the cluster has its own /tmp directory, and these directories do not contain the same files and subdirectories. The /users directories are shared by all the nodes (login or compute), but each node has its own /tmp directory (as well as other unshared directories).