Software List

Ohio Supercomputer Center (OSC) has a variety of software applications to support all aspects of scientific research. We are actively updating this documentation to ensure it matches the state of the supercomputers. This page is currently missing some content; use module avail on each system for a comprehensive list of available software.


ABAQUS

ABAQUS is a finite element analysis program owned and supported by SIMULIA, the Dassault Systèmes brand for Realistic Simulation.

Availability & Restrictions

ABAQUS is available on the Oakley and Glenn clusters.

The versions currently available at OSC are

Version    Glenn   Oakley
6.8-2      X
6.8-4      X       X
6.9-2      X
6.11-1     X       X
6.11-2             X
6.12-pr3           X

The available programs are ABAQUS/CAE, ABAQUS/Standard and ABAQUS/Explicit. Currently, 6.12-pr3 is not available for academic users.

Usage

Access

Use of ABAQUS for academic purposes requires validation. To obtain validation, please complete and return the "Academic Agreement to Use ABAQUS," located on the Academic Agreement Forms page.

Setup

Use module avail abaqus to view available ABAQUS modules for a given machine. To select a particular software version, type module load software-name. For example, to select ABAQUS version 6.11-2 on Oakley, type module load abaqus/6.11-2.

Using ABAQUS

Example input data files are provided with the ABAQUS release. The abaqus fetch utility is used to extract these input files for use. For example, to fetch the four input files for one of the sample problems, type:

abaqus fetch job=knee_bolster
abaqus fetch job=knee_bolster_ef1
abaqus fetch job=knee_bolster_ef2
abaqus fetch job=knee_bolster_ef3

Token Usage

ABAQUS software usage is monitored through a token-based license manager. This means that every time you run an ABAQUS job, tokens are checked out from our pool for your job's use. To ensure your job starts only when its required ABAQUS tokens are available, it is important to include a software flag within your job script's PBS directives. A minimum of 5 tokens is required per job, so a 1-node, 1-processor ABAQUS job would need the following PBS software flag: #PBS -l software=abaqus+5. Jobs requiring more cores will need to request more tokens, calculated with the formula M = int(5 x N^0.422), where N is the total number of cores. For common requests, you can refer to the following table:

Cores (nodes x ppn):  1  2  3  4  6   8   12  16  24  32  48
Tokens needed:        5  6  7  8  10  12  14  16  19  21  25
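
For example, the table entries can be reproduced on the command line with a short shell loop (a quick sketch; any POSIX awk will do):

# Tokens needed for an N-core ABAQUS job: M = int(5 x N^0.422)
for N in 1 2 3 4 6 8 12 16 24 32 48; do
    awk -v n=$N 'BEGIN { printf "%2d cores -> %d tokens\n", n, int(5 * n^0.422) }'
done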

Batch Usage

When you log into oakley.osc.edu or glenn.osc.edu you are actually logged into a linux box referred to as the login node. To gain access to the 4000+ processors in the computing environment, you must submit your ABAQUS analysis to the batch system for execution.

Continuing with the above example, assume that you have fetched the four input files above into your work directory (where you submit your job, represented by $PBS_O_WORKDIR). A batch script can be created and submitted for a serial or parallel run.

Sample Batch Script for Serial Jobs

#PBS -N knee
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=1
#PBS -l software=abaqus+5
#PBS -j oe
#
# The following lines set up the ABAQUS environment
#
module load abaqus
#
# Move to the directory where the job was submitted
#
cd $PBS_O_WORKDIR
cp *.inp $TMPDIR
cd $TMPDIR
#
# Run ABAQUS
#
abaqus job=knee_bolster interactive
#
# Now, copy data (or move) back once the simulation has completed
#
cp * $PBS_O_WORKDIR

NOTE:

  • To select a particular ABAQUS version, use module load version-name.
  • Make sure to copy all the files needed (input files, restart files, user subroutines, python scripts etc.) from your work directory ($PBS_O_WORKDIR) to $TMPDIR, and copy your results back at the end of your script. Running your job on $TMPDIR ensures maximum efficiency.
  • The keyword interactive is required in the execution line abaqus job=knee_bolster interactive for the following reason: If left off, ABAQUS will background the simulation process. Backgrounding a process in the OSC environment will place it outside of the batch job and it will receive the default 1 hour of CPU time and corresponding default memory limits. The keyword interactive in this case simply tells ABAQUS not to return until the simulation has completed.
  • The name of the input file is sometimes omitted in the execution line, which may work fine if you've copied only the input files for one specific model. However, it is better practice to designate the main input file explicitly by adding input=<my_input_file_name>.inp to the execution line so that it looks like abaqus job=knee_bolster input=<my_input_file_name>.inp interactive
  • Define nodes=1 (1<=ppn<=8 for Glenn and ppn=12 for Oakley) for a serial run.
  • If ppn>1, add cpus=<n> to the execution line, where n=ppn. This is how it should look: abaqus job=test input=<my_input_file_name1>.inp cpus=<n> interactive

Sample Batch Script for Parallel Jobs

The following is an example for running ABAQUS in parallel:

#PBS -l walltime=1:00:00
#PBS -l nodes=2:ppn=12
#PBS -N my_abaqus_job
#PBS -l software=abaqus+19
#PBS -j oe
#
# The following lines set up the ABAQUS environment
#
module load abaqus
#
# Move to the directory where the job was submitted
#
cd $PBS_O_WORKDIR
cp *.inp $TMPDIR/
cd $TMPDIR
#
# Run ABAQUS, note that in this case we have provided the names of the input files explicitly
#
abaqus job=test input=<my_input_file_name1>.inp cpus=24 interactive
#
# Now, move data back once the simulation has completed
#
mv * $PBS_O_WORKDIR

NOTE:

  • Define nodes>1 for a parallel run. 1<=ppn<=8 for Glenn and ppn=12 for Oakley.
  • Specify cpus=<n> in the execution line, where n=nodes*ppn.
  • Everything else is similar to the serial script above.

Further Reading

Online documentation is available at: http://www.osc.edu/supercomputing/manuals.

See Also


ACML - AMD Core Math Library

The AMD Core Math Library (ACML) is a set of numerical routines tuned specifically for AMD64 platform processors (including Opteron and Athlon64). The routines, which are available via both FORTRAN 77 and C interfaces, include BLAS, LAPACK, FFT, RND, and many others.

Availability & Restrictions

OSC supports use of ACML on the Glenn cluster.

VERSION   GLENN   OAKLEY
3.6.0     X
3.6.1     X
4.0.1     X
4.3.0     X
4.4.0     X

There are no restrictions on the use of the ACML library.

Usage

Set-up

To configure your environment for use of ACML load the “acml” module appropriate for your compiler:

Compiler                     Module Command
gFORTRAN                     module load acml-gfortran
Intel                        module load acml-intel
Portland Group               module load acml-pgi
Portland Group, Multi-Core   module load acml-pgimp

This step is required for both building and running ACML applications. The default version is 4.0.1.

Building With ACML

The Glenn ACML modules will automatically configure your environment to locate appropriate include files and libraries. When the ACML modules are loaded the following environment variables will be set:

Variable Use
$OMP_NUM_THREADS Number of OpenMP threads to be used. Set to 1 in all ACML modules except acml-pgimp, which sets it to 4. Should be set by the user AFTER loading the module if a different value is needed.
$ACML_CFLAGS Compiler flags: include path to be used
$ACML Linker flags: libraries to be used

Usage with the Portland Group Compiler, C code

To build a sample C-code that uses the ACML library with the Portland Group Compiler on Glenn, follow the example steps below after logging into the system:

$ cp ~support/examples/ACML/example.c .
$ module load acml-pgi
$ pgcc $ACML_CFLAGS $ACML -lm -lpgftnrtl -lrt example.c
$ ./a.out

Usage with the Portland Group Compiler, FORTRAN code

To build a sample FORTRAN-code that uses the ACML library with the Portland Group Compiler on Glenn, follow the example steps below after logging into the system:

$ cp ~support/examples/ACML/example.f .
$ module load acml-pgi
$ pgf77 example.f $ACML
$ ./a.out

Further Reading

See Also

  • Intel MKL
  • Armstrong Libraries and Compilers Group

AMBER

The Assisted Model Building with Energy Refinement (AMBER) package contains many molecular simulation programs targeted at biomolecular systems. A wide variety of modelling techniques are available. It generally scales well on modest numbers of processors, and the GPU enabled programs are very efficient.

Availability & Restrictions

Amber is available to all OSC users once they have completed a copy of the Academic Agreement form.

Version   Glenn   Oakley
9         X
10        X
11        X       X
12        X       X

Usage

Set-up

Initializing Amber on both Glenn and Oakley is done by loading an amber module:

module load amber

To see other available versions, run the following command:

module avail

Using AMBER

To execute a serial Amber program interactively, simply run it on the command line, e.g.:

tleap

Parallel Amber programs should be run in a batch environment with mpiexec, e.g.:

mpiexec pmemd.MPI

Batch Usage

Sample batch scripts and Amber input files are available here:

/nfs/10/srb/workshops/compchem/amber/

This simple batch script for either Glenn or Oakley demonstrates some important points:

# AMBER Example Batch Script for the Basic Tutorial in the Amber manual
#PBS -N 6pti
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:20:00

module load amber
# Use TMPDIR for best performance.
cd $TMPDIR
# PBS_O_WORKDIR refers to the directory from which the job was submitted.
cp -p $PBS_O_WORKDIR/6pti.prmtop .
cp -p $PBS_O_WORKDIR/6pti.prmcrd .
# Running minimization for BPTI
cat << eof > min.in
# 200 steps of minimization, generalized Born solvent model
&cntrl
maxcyc=200, imin=1, cut=12.0, igb=1, ntb=0, ntpr=10,
/
eof
sander -i min.in -o 6pti.min1.out -p 6pti.prmtop -c 6pti.prmcrd -r 6pti.min1.xyz
cp -p min.in 6pti.min1.out 6pti.min1.xyz $PBS_O_WORKDIR

Further Reading

 


ANSYS

ANSYS is an engineering package and support routine for general-purpose finite-element analysis: statics, heat transfer, mode frequency, stability analysis, magnetostatics, coupled field analysis, etc. Support is provided by ANSYS, Inc.

Availability & Restrictions

OSC has an "Academic Teaching Advanced" license for ANSYS. This allows for academic use of the software by Ohio faculty and students, with some restrictions. To view current ANSYS node restrictions, please see ANSYS's Terms of Use.

For academic users not on the OSU campus, OSC must pay an additional fee to Ansys to validate their use. For this reason, such new users must have their department or PI confirm their funding for ANSYS licenses.

The following versions of ANSYS are available at OSC:
Version       Glenn   Oakley
11.0          X
11.0 update   X
13.0          X       X
14.0                  X*
14.5          X       X

* Commercial users should use the "ndem" module.

Usage

Set-up

Use of ANSYS for academic purposes requires validation. To obtain validation, please complete and return the "Academic Agreement to Use ANSYS." This can be obtained from your site consultant or from the files ansys.pdf, ansys.ps, or ansys.txt located in the Academic Agreement Forms.

ANSYS supports X11 Windows and 3-D devices. PostScript and HP-GL output are available through the DISPLAY program.

Using ANSYS

The ANSYS commands and utility programs (see below) are located in your execution path. ANSYS is normally started with a module load command for the specific version, e.g. ansys110; or use only ansys for the default version (ansys110 on Glenn).

module load ansys

Following a successful loading of the ANSYS module, you can access the ansys commands:

ansys <switch options> <file>

The ansys command takes a number of Unix-style switches and parameters.

The -j Switch

The ansys command accepts a -j switch. It specifies the "job id," which determines the naming of output files. The default is the name of the input file.

The -d Switch

The ansys command accepts a -d switch. It specifies the device type. The value can be X11, x11, X11C, x11c, or 3D.

The -m Switch

The ansys command accepts a -m switch. It specifies the amount of working storage obtained from the system. The units are megawords.

The memory requirement for the entire execution will be approximately 5300000 words more than the -m specification. This is calculated for you if you use ansnqs to construct an NQS request.

The -b [nolist] Switch

The ansys command accepts a -b switch. It specifies that no user input is expected (batch execution).

The -s [noread] Switch

The ansys command accepts a -s switch. By default, the start-up file is read during an interactive session and not read during batch execution. These defaults may be changed with the -s command line argument. The noread option of the -s argument specifies that the start-up file is not to be read, even during an interactive session. Conversely, the -s argument with the -b batch argument forces the reading of the start-up file during batch execution.

The -g [off] Switch

The ansys command accepts a -g switch. It specifies that the ANSYS graphical user interface is started automatically.
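
As a rough illustration of how these switches combine on the command line (the file names and the memory value here are hypothetical), a batch-style run might look like:

module load ansys
# -j sets the job id, -m requests working storage in megawords, -b runs without user input
ansys -j mymodel -m 64 -b < mymodel.in > mymodel.out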

ANSYS parameters

ANSYS parameters may be assigned values on the ansys command. The parameter must be at least two characters long and must be a legal parameter name. The ANSYS parameter that is to be assigned a value should be given on the command line with a preceding dash (-), a space immediately after, and the value immediately after the space:

module load ansys
ansys -pval1 -10.2 -EEE .1e6
sets pval1 to -10.2 and EEE to 100000

  

OSC Batch Usage

ANSYS can be run on OSC systems in interactive mode or in batch mode. Interactive mode is similar to running ANSYS on a desktop machine in that the graphical user interface will be sent from OSC and displayed on the local machine. Batch mode means that you submit the ANSYS job with a batch script by providing all the ANSYS input file(s) needed to run the simulation when resources become available.

Interactive jobs are run on compute nodes of the cluster, by turning on X11 forwarding. You can submit a simple one processor interactive batch job with X11 forwarding with the command qsub -I -X. The intention is that users can run ANSYS interactively for the purpose of building their model and preparing their input file. Once developed this input file can then be run in batch mode.

Batch jobs can request multiple nodes/cores and long compute time, up to the limits of the OSC systems (refer to the computing pages). Batch jobs run on the compute nodes of the system and not on the login node. This is desirable for big problems since more resources can be used.

Interactive Example

To run ANSYS interactively, a batch job needs to be submitted from the login node to request the necessary compute resources, with X11 forwarding. For example, the following line requests one node (the default), one core, for a walltime of one hour, with one ANSYS license:

qsub -I -X -l walltime=1:00:00 -l software=ansys+1

This job will queue until resources become available. Once the job starts, you are automatically logged in on the compute node, and you can launch ANSYS and start the graphical interface with the following commands:

module load ansys
ansys -g

Batch Mode

For a given model, prepare the input file with ANSYS commands (named ansys.in, for example) for the batch run. Assume the solution will need 30 hours and 1 processor. The following batch script would be needed for the serial application:

#PBS -N ansys_test  
#PBS -l walltime=30:00:00  
#PBS -l nodes=1:ppn=1  
#PBS -j oe
cd $TMPDIR  
cp $PBS_O_WORKDIR/ansys.in .    
module load ansys  
ansys < ansys.in   
cp * $PBS_O_WORKDIR

To run this job on the OSC batch system, the above script (named submit_ansys.job) is submitted with the command:

qsub submit_ansys.job

To take advantage of the powerful compute resources at OSC, you may choose to run distributed ANSYS for large problems. Multiple nodes and cores can be requested to accelerate the solution time. Note that you'll need to change your batch script slightly for distributed runs.

For distributed ANSYS jobs using one node (nodes=1), the number of processors needs to be specified in the command line with options '-dis -np':

#PBS -N ansys_test 
#PBS -l walltime=3:00:00 
#PBS -l nodes=1:ppn=8
#PBS -W x=GRES:ansys+1%ansyspar+4
...
ansys -dis -np 8 < ansys.in  
...

Notice that in the script above, the ANSYS parallel license (ansyspar) is requested in addition to the base ANSYS license, in the format

#PBS -W x=GRES:ansys+1%ansyspar+n

where n=m-4, with m being the total number of cores requested for this job. This line is necessary when the total number of cores is greater than 4 (m>4), which also applies to the parallel example below.
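
For instance, applying this rule to a hypothetical request of nodes=2:ppn=12 (so m = 24 cores and n = 24 - 4 = 20), the directives would read:

#PBS -l nodes=2:ppn=12
#PBS -W x=GRES:ansys+1%ansyspar+20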

For distributed jobs requesting multiple nodes, you need to specify the number of processors for each node in the command line. This information can be obtained from $PBS_NODEFILE. The following shows changes in the batch script if 2 nodes on Glenn are requested for a parallel ansys job:

#PBS -N ansys_test 
#PBS -l walltime=3:00:00 
#PBS -l nodes=2:ppn=8
#PBS -W x=GRES:ansys+1%ansyspar+12
...
export MPI_WORKDIR=$PWD
machines=`uniq -c ${PBS_NODEFILE} | awk '{print $2 ":" $1}' | paste -s -d ':'`
ansys -dis -machines $machines < ansys.in  
...
pbsdcp -g '*' $PBS_O_WORKDIR

The 'pbsdcp -g' command in the last line in the script above makes sure that all result files generated by different compute nodes are copied back to the work directory.

Information on how to monitor the job can be found in the computing environments section.

 

Further Reading

 

See Also

 


Altair HyperWorks

HyperWorks is a high-performance, comprehensive toolbox of CAE software for engineering design and simulation.

Availability & Restrictions

HyperWorks is available to all OSC users without restriction.

The following versions of Altair HyperWorks are available in the following environments:

Version   Glenn   Oakley   Statewide
10.0      X                X
11.0              X        X

 

Note:  In order to use the statewide license for Altair Hyperworks, you must have version 10.1 or 11.0 installed.  Versions 12.0 and up are not supported.

Usage

Set-up

To use HyperWorks on the Glenn cluster, first ensure that X11 forwarding is enabled, as the HyperWorks workbench is a graphical application. Then, load the hyperworks module:

module load hyperworks

For information on downloading and installing a local copy through the state-wide license, follow the steps below. NOTE: To run Altair HyperWorks, your computer must have access to the internet. The software contacts the license server at OSC to check out a license when it starts and periodically during execution. The amount of data transferred is small, so network connections over modems are acceptable.

  1. Go to http://www.altairhyperworks.com/ 

  2. Click on "Login" in the upper right hand corner of the page. 

  3. If you have already registered with the Altair web site, enter the e-mail address that you registered with and your password and skip to step #5.

  4. If you have not registered yet, click the link that says "Click here to register now". You will be prompted for some contact information and an e-mail address which will be your unique identifier.

    • IMPORTANT: The e-mail address you give must be from your academic institution. Under the statewide license agreement, registration from Ohio universities is allowed on the Altair web site. Trying to log in with a yahoo or hotmail e-mail account will not work. If you enter your university e-mail and the system will not register you, please contact OSChelp at oschelp@osc.edu.

  5. Once you have logged in, click on "SUPPORT" and then "SOFTWARE DOWNLOADS"

  6. In addition to downloading the software, download the "Installation Guide and Release Notes" for instructions on how to install the software.

    • IMPORTANT: If you have any questions or problems, please contact OSC Help at oschelp@osc.edu rather than HyperWorks support. The software agreement outlines that problems should first be sent to OSC. If the OSC support line cannot answer or resolve the question, they have the ability to raise the problem to Altair support.

  7. Go to the web page http://www.osc.edu/supercomputing/software/general.shtml. Download the HyperWorks form from the software list, fill it out, and fax it to the number given on the form. License server information will be provided to you at this time.

Using HyperWorks

To use HyperWorks on Glenn after performing the initial setup, simply type: hw

Usage of HyperWorks on a local machine using the statewide license will vary from installation to installation.

Further Reading

For more information about HyperWorks, see the following:

See Also


Bioperl

Bioperl offers a set of Perl modules which can be used for sequence manipulation. Knowledge of Perl programming is required.

Availability & Restrictions

Bioperl is available without restriction to all OSC users.

The following versions of Bioperl are available:

Version   Glenn   Oakley
1.5.1     X
1.6.1     X       X

Usage

Using Bioperl

This is an example of how to use bioperl and access the sequence package on the Oakley Cluster.

use lib '/usr/local/bioperl/1.6.1/lib/perl5/';
use Bio::Seq;

This is an example of how to use bioperl and access the sequence package on the Glenn Cluster.

use lib '/usr/local/biosoftw/bioperl-1.6.1/lib/perl5/';
use Bio::Seq;

Further Reading

See Also


BLAST

The BLAST programs are widely used tools for searching DNA and protein databases for sequence similarity to identify homologs to a query sequence. While often referred to as just "BLAST", this can really be thought of as a set of programs: blastp, blastn, blastx, tblastn, and tblastx.

Availability & Restrictions

BLAST is available without restriction to all OSC users.

The following versions of BLAST are available on OSC systems:

Version   Glenn   Oakley
2.2.17    X
2.2.23+   X
2.2.24+   X       X
2.2.25+   X       X
2.2.26            X

 

If you need to use blastx, you will need to load one of the C++ implementation modules of BLAST (any version with a "+").

Usage

Set-up

Setting up BLAST for usage depends on the system you are using. On Glenn, load the biosoftware module followed by the BLAST specific module:

module load biosoftw
module load blast

Then create a resource file .ncbirc, and put it under your home directory.
If you are using the legacy blast program, the file contains at least the two variables DATA and BLASTDB:

[NCBI]
DATA="/usr/local/biosoftw/blast-2.2.17/data/"
[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"

If you are using the C++ implementation of BLAST, the file contains at least the variable BLASTDB:

[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"

On Oakley, just load the BLAST specific module:

module load blast

The resource file .ncbirc under your home directory should contain the following two lines:

[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"

Upon start, BLAST  will read this file to get the path information it needs during BLAST searches. Without this file, BLAST will search the working directory, or whichever directory the command is issued from.
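
For example, on Oakley this file could be created from the command line as follows (a short sketch using the database path given above):

cat > $HOME/.ncbirc << 'EOF'
[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"
EOF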

Using BLAST

The five flavors of BLAST mentioned above perform the following tasks:

blastp: compares an amino acid query sequence against a protein sequence database
blastn: compares a nucleotide query sequence against a nucleotide sequence database
blastx: compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database
tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. (Due to the nature of tblastx, gapped alignments are not available with this option)

We provide local access to nr and swissprot databases. Other databases are available upon request.
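
For example, an interactive protein search against the local nr database could look like the following (a sketch; the query and output file names are hypothetical, and one of the BLAST+ modules is assumed to be loaded):

blastp -db nr -query myprotein.fasta -out myprotein.out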

Batch Usage

A sample batch script is below:

#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00
#PBS -N Blast
#PBS -S /bin/bash
#PBS -j oe

module load blast
set -x

cd $PBS_O_WORKDIR
mkdir $PBS_JOBID

cp 100.fasta $TMPDIR
cd $TMPDIR
/usr/bin/time blastn -db nt -query 100.fasta  -out test.out

cp * $PBS_O_WORKDIR/$PBS_JOBID

Further Reading

See Also


BLAT

BLAT is a sequence analysis tool which performs rapid mRNA/DNA and cross-species protein alignments. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.

BLAT is not BLAST. DNA BLAT works by keeping an index of the entire genome (but not the genome itself) in memory. Since the index takes up a bit less than a gigabyte of RAM, BLAT can deliver high performance on a reasonably priced Linux box. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment. Protein BLAT works in a similar manner, except with 4-mers rather than 11-mers. The protein index takes a little more than 2 gigabytes.

Availability & Restrictions

BLAT is available without restriction to all OSC users.

The following versions of BLAT are available at OSC:

Version   Glenn   Oakley
34        X

Usage

Set-up

To initialize the Glenn system prior to using BLAT, run the following commands:

module load biosoftw
module load blat

Using BLAT

The main programs in the blat suite are:

gfServer – a server that maintains an index of the genome in memory and uses the index to quickly find regions with high levels of sequence similarity to a query sequence.
gfClient – a program that queries gfServer over the network, and then does a detailed alignment of the query sequence with regions found by gfServer.
blat –combines client and server into a single program, first building the index, then using the index, and then exiting. 
webBlat – a web based version of gfClient that presents the alignments in an interactive fashion. (not included on OSC server)

Building an index of the genome typically takes 10 or 15 minutes.  Typically for interactive applications one uses gfServer to build a whole genome index.  At that point gfClient or webBlat can align a single query within few seconds.  If one is aligning a lot of sequences in a batch mode then blat can be more efficient, particularly if run on a cluster of computers.  Each blat run is typically done against a single chromosome, but with a large number of query sequences.
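
A rough sketch of that interactive workflow (host name, port number, and file names are placeholders; run gfServer and gfClient with no arguments to see their exact usage summaries):

# Build a whole-genome index and keep the server running in the background
gfServer start localhost 17779 genome.2bit &
# Align a query against the running server; the third argument is the directory holding genome.2bit
gfClient localhost 17779 . query.fa output.psl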

Other programs in the blat suite are:

pslSort – combines and sorts the output of multiple blat runs.  (The blat default output format is .psl).
pslReps – selects the best alignments for a particular query sequence, using a ‘near best in genome’ approach.
pslPretty – converts alignments from the psl format, which is tab-delimited format and does not include the bases themselves, to a more readable alignment format.
faToTwoBit – convert Fasta format sequence files to a dense randomly-accessible .2bit format that gfClient can use.
twoBitToFa – convert from the .2bit format back to fasta
faToNib – convert from Fasta to a somewhat less dense randomly accessible format that predates .2bit.  Note each .nib file can only contain a single sequence.
nibFrag – convert portions of a nib file back to fasta.

The command-line options of each program are summarized when the command is run with no arguments.

Batch Usage

A sample batch script is shown below:

#PBS -N blat
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -S /bin/bash

cd $PBS_O_WORKDIR
blat -stepSize=5 -repMatch=2253 -minScore=0 -minIdentity=0 database.2bit query.fa output.psl 

 

Further Reading

See Also


Bowtie

"Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. It aligns 35-base-pair reads to the human genome at a rate of 25 million reads per hour on a typical workstation. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: for the human genome, the index is typically about 2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace alignment). Multiple processors can be used simultaneously to achieve greater alignment speed. Bowtie can also output alignments in the standard SAM format, allowing Bowtie to interoperate with other tools supporting SAM, including the SAMtools consensus, SNP, and indel callers. Bowtie runs on the command line." (http://bowtie-bio.sourceforge.net/manual.shtml)

Availability & Restrictions

Bowtie is available to all OSC users without restriction.

The following version of Bowtie is available on OSC systems:

Version   Glenn   Oakley
0.12.7    X

Usage

Set-up

To configure your environment for using Bowtie on the Glenn cluster, run the following commands:

module load biosoftw
module load bowtie

Using Bowtie

Once these modules are loaded, bowtie is added to your PATH and can be run with the command:

bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

Below are definitions for some of the main optional arguments:

<m1>    Comma-separated list of files containing upstream mates (or the sequences themselves, if -c is set) paired with mates in <m2>
<m2>    Comma-separated list of files containing downstream mates (or the sequences themselves if -c is set) paired with mates in <m1>
<r>     Comma-separated list of files containing Crossbow-style reads.  Can be a mixture of paired and unpaired.  Specify "-" for stdin.
<s>     Comma-separated list of files containing unpaired reads, or the sequences themselves, if -c is set.  Specify "-" for stdin.
<hit>   File to write hits to (default: stdout)

Options

Input:

-q                       query input files are FASTQ .fq/.fastq (default)
-f                       query input files are (multi-)FASTA .fa/.mfa
-r                       query input files are raw one-sequence-per-line
-c                       query sequences given on cmd line (as <mates>, <singles>)
-C                       reads and index are in colorspace
-Q/--quals <file>        QV file(s) corresponding to CSFASTA inputs; use with -f -C
--Q1/--Q2 <file>         same as -Q, but for mate files 1 and 2 respectively
-s/--skip <int>          skip the first <int> reads/pairs in the input
-u/--qupto <int>         stop after first <int> reads/pairs (excl. skipped reads)
-5/--trim5 <int>         trim <int> bases from 5' (left) end of reads
-3/--trim3 <int>         trim <int> bases from 3' (right) end of reads
--phred33-quals          input quals are Phred+33 (default)
--phred64-quals          input quals are Phred+64 (same as --solexa1.3-quals)
--solexa-quals           input quals are from GA Pipeline ver. < 1.3
--solexa1.3-quals        input quals are from GA Pipeline ver. >= 1.3
--integer-quals          qualities are given as space-separated integers (not ASCII)

Alignment:

-v <int>                 report end-to-end hits w/ <=v mismatches; ignore qualities or
-n/--seedmms <int>       max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int>        max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int>       seed length for -n (default: 28)
--nomaqround             disable Maq-like quality rounding for -n (nearest 10 <= 30)
-I/--minins <int>        minimum insert size for paired-end alignment (default: 0)
-X/--maxins <int>        maximum insert size for paired-end alignment (default: 250)
--fr/--rf/--ff           -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
--nofw/--norc            do not align to forward/reverse-complement reference strand
--maxbts <int>           max # backtracks for -n 2/3 (default: 125, 800 for --best)
--pairtries <int>        max # attempts to find mate for anchor hit (default: 100)
-y/--tryhard             try hard to find valid alignments, at the expense of speed
--chunkmbs <int>         max megabytes of RAM for best-first search frames (def: 64)

Reporting:

-k <int>                 report up to <int> good alignments per read (default: 1)
-a/--all                 report all alignments per read (much slower than low -k)
-m <int>                 suppress all alignments if > <int> exist (def: no limit)
-M <int>                 like -m, but reports 1 random hit (MAPQ=0); requires --best
--best                   hits guaranteed best stratum; ties broken by quality
--strata                 hits in sub-optimal strata aren't reported (requires --best)

Output:

-t/--time                print wall-clock time taken by search phases
-B/--offbase <int>       leftmost ref offset = <int> in bowtie output (default: 0)
--quiet                  print nothing but the alignments
--refout                 write alignments to files refXXXXX.map, 1 map per reference
--refidx                 refer to ref. seqs by 0-based index rather than name
--al <fname>             write aligned reads/pairs to file(s) <fname>
--un <fname>             write unaligned reads/pairs to file(s) <fname>
--max <fname>            write reads/pairs over -m limit to file(s) <fname>
--suppress <cols>        suppresses given columns (comma-delim'ed) in default output
--fullref                write entire ref name (default: only up to 1st space)

Colorspace:

--snpphred <int>         Phred penalty for SNP when decoding colorspace (def: 30) or
--snpfrac <dec>          approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
--col-cseq               print aligned colorspace seqs as colors, not decoded bases
--col-cqual              print original colorspace quals, not decoded quals
--col-keepends           keep nucleotides at extreme ends of decoded alignment

SAM:

-S/--sam                 write hits in SAM format
--mapq <int>             default mapping quality (MAPQ) to print for SAM alignments
--sam-nohead             supppress header lines (starting with @) for SAM output
--sam-nosq               supppress @SQ header lines for SAM output
--sam-RG <text>          add <text> (usually "lab=value") to @RG line of SAM header

Performance:

-o/--offrate <int>       override offrate of index; must be >= index's offrate
-p/--threads <int>       number of alignment threads to launch (default: 1)
--mm                     use memory-mapped I/O for index; many 'bowtie's can share
--shmem                  use shared mem for index; many 'bowtie's can share

Other:

--seed <int>             seed for random number generator
--verbose                verbose output (for debugging)
--version                print version information and quit
-h/--help                print this usage message

 

Batch Usage

The following is an example batch script file.

#PBS -N bowtie_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4

module load biosoftw
module load bowtie-0.12.7
cd $PBS_O_WORKDIR
cp /usr/local/biosoftw/bowtie-$BOWTIE_VERSION/genomes/NC_008253.fna .
bowtie-build NC_008253.fna e_coli
bowtie -p 4 e_coli -c ATGCATCATGCGCCAT

Errors

The following scripts fail due to an ftp error: make_e_coli.sh, make_a_thaliana_tair.sh, and make_c_elegans_ws200.sh. The following scripts fail to obtain all of the fasta format files prior to bowtie conversion: make_galGal3.sh, make_hg18.sh, make_h_sapiens_ncbi36.sh, make_h_sapiens_ncbi37.sh, make_mm9.sh, make_m_musculus_ncbi37.sh. The following script does not work properly on the Glenn Cluster: gen_dnamasks2colormask.pl.

Further Reading


Cambridge Structural Database

Introduction

The Cambridge Structural Database (CSD) contains complete structure information on hundreds of thousands of small molecule crystals. The Cambridge Crystallographic Data Centre (CCDC) provides a suite of programs for CSD search and analysis. The Cambridge Crystallographic Data Centre Home Page has additional information.

Version

Several versions are available at OSC. Search for csd in this list:

module avail

Availability

CSD is available on cambridge.osc.edu (opt-login02.osc.edu).

Usage

Users must complete and sign a license agreement and fax it back to OSC.

To run conquest:

module load csd
cq

Documentation

General documentation is available at the CCDC Home page and in the local machine directories.


COMSOL

COMSOL Multiphysics (formerly FEMLAB) is a finite element analysis and solver software package for various physics and engineering applications, especially coupled phenomena, or multiphysics. It is owned and supported by COMSOL, Inc.

Availability & Compatibility

COMSOL is available on the Oakley and Glenn clusters.

The versions currently available at OSC are

Version   GLENN   OAKLEY
3.4       X
3.5a      X
4.0       X
4.0a      X
4.1       X
4.2       X
4.2a      X       X
4.3       X       X
4.3a              X

Usage

Access

COMSOL is for academic use only.  To use COMSOL you will have to be added to the license server first.  Please contact OSC Help to be added.

Setup

Use module avail to view available modules for a given machine. To select a particular software version, type module load software-name. For example, to select COMSOL version 4.2a on Oakley, type module load comsol/42a.

Batch Usage

Sample Batch Script (single processor analysis)

When you log into oakley.osc.edu or glenn.osc.edu you are actually logged into a linux box referred to as the login node. To gain access to the 4000+ processors in the computing environment, you must submit your COMSOL analysis to the batch system for execution. For example, assume that you have a COMSOL script file mycomsol.m in the directory $PBS_O_WORKDIR. In this directory you should create a batch script containing the following information:

#PBS -N COMSOL
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=1
#PBS -j oe
#
# The following lines set up the COMSOL environment
#
module load comsol
#
# Move to the directory where the job was submitted
#
cd $PBS_O_WORKDIR
cp *.m $TMPDIR
cd $TMPDIR
#
# Run COMSOL
#
comsol batch mycomsol
#
# Now, copy data (or move) back once the simulation has completed
#
cp * $PBS_O_WORKDIR

Sample Batch Script (Parallel) for COMSOL 4.2a and earlier

The following is a sample batch script which can be submitted for running COMSOL software in Parallel

  • Set nodes to 2 and ppn to 12
  • Copy files from your directory to $TMPDIR
  • Run mpdboot
  • Provide the name of the input file and output file.
  • Set ppn=8 on Glenn
  • Set ppn=12 on Oakley
#PBS -l walltime=01:00:00
#PBS -l nodes=2:ppn=12
#PBS -N COMSOL
#PBS -j oe
#PBS -r n

cd ${PBS_O_WORKDIR}

module load comsol

echo "--- Copy Input Files to TMPDIR and Change Disk to TMPDIR"
pbsdcp *.m* $TMPDIR
cd $TMPDIR

np=12
echo "--- Running on ${np} processes (cores) on the following nodes:"
cat $PBS_NODEFILE | uniq

echo "---- mpd BOOT"
comsol -nn 2 mpd boot -f $PBS_NODEFILE

echo "--- mpd TRACE"
comsol mpd trace

echo "--- COMSOL run"
comsol -nn 2 -np ${np} batch -inputfile input_cluster.mph -outputfile output_cluster.mph

echo "--- mpd ALLEXIT"
comsol mpd allexit

echo "--- Copy files back"
pbsdcp *.m* ${PBS_O_WORKDIR}

echo "---Job finished at: 'date'"
echo "---------------------------------------------" 

Sample Batch Script (Parallel) for COMSOL 4.3

As of version 4.3, it is not necessary to start up MPD before launching a COMSOL job. The following is a sample batch script which can be submitted for running COMSOL 4.3 in parallel:

  • Set nodes to 2 and ppn to 12
  • Copy files from your directory to $TMPDIR
  • Provide the name of the input file and output file.
  • Set ppn=8 on Glenn
  • Set ppn=12 on Oakley
#PBS -l walltime=01:00:00
#PBS -l nodes=2:ppn=12
#PBS -N COMSOL
#PBS -j oe
#PBS -r n

cd ${PBS_O_WORKDIR}

module load comsol

echo "--- Copy Input Files to TMPDIR and Change Disk to TMPDIR"
pbsdcp *.m* $TMPDIR
cd $TMPDIR

np=12
echo "--- Running on ${np} processes (cores) on the following nodes:"
cat $PBS_NODEFILE | uniq > hostfile

echo "--- COMSOL run"
comsol -nn 2 -np ${np} batch -f hostfile -mpirsh rsh -inputfile input_cluster.mph -outputfile output_cluster.mph

echo "--- Copy files back"
pbsdcp *.m* ${PBS_O_WORKDIR}

echo "---Job finished at: 'date'"
echo "---------------------------------------------"

Documentation

Online documentation is available at: http://www.osc.edu/supercomputing/manuals.

See Also

 


CFX

CFX is a computational fluid dynamics (CFD) program for modeling fluid flow and heat transfer in a variety of applications.

Availability & Restrictions

CFX is available on both the Glenn and Oakley clusters.  The following versions are available:

VERSION   GLENN   OAKLEY
13        X
14                X
14.5      X
14.5.7            X

Academic License Limitations

Currently, we do not support CFX for academic users.

Commercial License Limitations

For commercial users, there are in total 20 base license tokens and 512 HPC tokens. The base license tokens are shared between FLUENT and CFX. The HPC tokens are shared among available ANSYS products (FLUENT, CFX, ICEMCFD, ANSYS Mechanical, etc.).

Usage

Access

Use of CFX requires validation. Please contact OSC Help for more information.

Set-up

CFX can only be run on the compute nodes of the Oakley and Glenn clusters. Therefore, all CFX jobs are run via the batch scheduling system, either as interactive or unattended jobs. In either case, the CFX module can be loaded only once a batch job has been started. The command that loads CFX differs between the Glenn and Oakley clusters.

For example, if you'd like to load CFX version 14.5 on Glenn, type:

module load fluent14.5_nimbis

cfx5

If you'd like to load CFX version 14 on Oakley, type:

module load fluent/14-nimbis

cfx5

For a list of available CFX versions and the format expected, type:

module avail fluent

Batch Usage

Sample Usage (interactive execution)

Using the CFX GUI interactively can be done with the following steps:

  1. Ensure that your SSH client software has X11 forwarding enabled
  2. Connect to either the Oakley or Glenn system
  3. Request an interactive job. The command below will request a one-core, one-hour job. Modify as per your own needs:
    qsub -I -X -l walltime=1:00:00 -l software=cfdnd+1
  4. Once the interactive job has started, run the following commands to setup and start the CFX GUI (Here, CFX version 14 on Oakley is launched):

    module load fluent/14-nimbis
    cfx5

Sample Batch Script (serial execution using 1 base token)

An example of running a CFX job for one hour with an input file named "test.def" on Oakley is provided below:

#PBS -N serialjob_cfx
#PBS -l walltime=1:00:00
#PBS -l software=cfdnd+1
#PBS -l nodes=1:ppn=12
#PBS -j oe
#PBS -S /bin/bash

#Set up CFX environment.
#Here, module "fluent/14.5.7-nimbis" is used as an example
module load fluent/14.5.7-nimbis

#'cd' directly to your working directory
cd $PBS_O_WORKDIR

#Copy CFX files like .def to $TMPDIR and move there to execute the program
cp test.def $TMPDIR/
cd $TMPDIR

#Run CFX in serial with test.def as input file
cfx5solve -batch -def test.def 

#Finally, copy files back to your home directory
cp  * $PBS_O_WORKDIR

Sample Batch Script (parallel execution using HPC token)

CFX can be run in parallel, but it is very important that you read the documentation in the CFX Manual on the details of how this works. You can find the CFX manuals on-line by following the "Further Reading" link at the bottom of this page.

An example of the batch script follows:

#PBS -N paralleljob_cfx
#PBS -l walltime=10:00:00
#PBS -l nodes=2:ppn=12
#PBS -l software=cfdnd+1
#PBS -j oe
#PBS -S /bin/bash

#Set up CFX environment.
#Here, module "fluent/14.5.7-nimbis" is used as an example
module load fluent/14.5.7-nimbis

#'cd' directly to your working directory
cd $PBS_O_WORKDIR

#Copy CFX files like .def to $TMPDIR and move there to execute the program
cp test.def $TMPDIR/
cd $TMPDIR

#Convert PBS_NODEFILE information into format for CFX host list
nodes=`cat $PBS_NODEFILE`
nodes=`echo $nodes | sed -e 's/ /,/g'`

#Run CFX in parallel with new.def as input file
cfx5solve -batch -def test.def  -par-dist $nodes -start-method "HP MPI Distributed Parallel"

#Finally, copy files back to your home directory
cp  * $PBS_O_WORKDIR

Further Reading


CUDA

CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by Nvidia that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Availability and Compatibility

CUDA is available on the Oakley and Glenn clusters. The versions currently available at OSC are:

Version   Glenn   Oakley
2.3       X
3.0       X
3.1       X
4.0       X
4.1.28            X
4.2.9             X
5.0.35            X
5.5       X       X

Usage

Access

CUDA is available for use by all OSC users.

Setup

Use module avail to view available modules for a given machine. To load the appropriate CUDA module, type: module load software-name.
For example: To select CUDA version 4.1.28 on Oakley, type: module load cuda/4.1.28

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. On Oakley, the SDK has been installed in $CUDA_HOME (an environment variable set when you load the module).

Programming in CUDA

Please visit the following link to learn programming in CUDA, http://developer.nvidia.com/cuda-education-training. The link also contains tutorials on Optimizing CUDA codes to obtain greater SpeedUp. One can also refer to the following webpage for some more CUDA optimization techniques, http://www.cs.berkeley.edu/~volkov/

Compiling CUDA Code

One can type module show cuda/version-number to view the list of environment variables.
To compile a CUDA code contained in a file, say mycudaApp.cu, the following could be done after loading the appropriate CUDA module:
nvcc -o mycudaApp mycudaApp.cu
This will create an executable named mycudaApp.

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used if requesting one GPU resource. Once the first call to CUDA is executed, the system will figure out which device it is using. If both cards in a node are in use by a single application, please use 'cudaSetDevice'.

Debugging CUDA code

cuda-gdb can be used to debug CUDA codes. module load cuda will make it available to you. For more information on how to use the CUDA-GDB please visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK could be used for detecting the source and cause of memory access errors in your program. For more information on how to use CUDA-MEMCHECK please visit http://developer.nvidia.com/cuda-memcheck.

Batch Usage

Following are sample batch scripts for requesting GPU nodes on Glenn and Oakley. Notice that only the second line differs between the two batch scripts. In the case of Oakley, one can specify the number of GPUs required.

Sample Batch Script (Glenn)

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=8:gpu
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

Sample Batch Script (Oakley)

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

For an interactive batch session one can run the following command:
On Glenn
qsub -I -l nodes=1:ppn=8:gpu -l walltime=00:20:00

On Oakley
qsub -I -l nodes=1:ppn=1:gpus=1 -l walltime=00:20:00

Please note that on Oakley, you can request any mix of ppn and gpus you need; please see the Job Scripts page in our batch guide for more information.
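
For example, a hypothetical interactive request for a full Oakley GPU node (all 12 cores and both GPU cards) would look like:

qsub -I -l nodes=1:ppn=12:gpus=2 -l walltime=00:20:00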

Further Reading

Online documentation is available at http://developer.nvidia.com/nvidia-gpu-computing-documentation

See Also


Clustal W

Clustal W is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.

Availability & Restrictions

Clustal W is available without restriction to all OSC users.

The following versions of Clustal W are available on OSC systems:

Version   Glenn   Oakley
1.8.3     X
2.1               X

Usage

Set-up

Setup prior to use of Clustal W is dependent on the system you are using. On the Glenn system, first load the biosoftware module, then load the clustalw module:

module load biosoftw
module load clustalw

On the Oakley system, just load the clustalw module directly:

module load clustalw

 

Using Clustal W

Once the clustalw module has been loaded, the commands are available for your use. On the Glenn system, the command is

clustalw

On the Oakley system, the command is

clustalw2

The options can be listed interactively by typing clustalw -help or clustalw -check on the command-line.

                DATA (sequences)
-INFILE=file.ext                             :input sequences.
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (old alignment).

                VERBS (do things)
-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line params.
-ALIGN              :do full multiple alignment 
-TREE               :calculate NJ tree.
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT            :output the input sequences in a different file format.

                PARAMETERS (set things)
***General settings:****
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-OUTPUT=     :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER=   :INPUT or ALIGNED
-CASE        :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)
-SEQNO_RANGE=:OFF or ON (NEW: for all output formats) 
-RANGE=m,n   :sequence range to write starting m to m+n. 

***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty
-SCORE       :PERCENT or ABSOLUTE

***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty        
-PWGAPEXT=f   :gap extension penalty

***Multiple Alignments:***
-NEWTREE=      :file for new guide tree
-USETREE=      :file for old guide tree
-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f     :gap opening penalty        
-GAPEXT=f      :gap extension penalty
-ENDGAPS       :no end gap separation pen. 
-GAPDIST=n     :gap separation pen. range
-NOPGAP        :residue-specific gaps off  
-NOHGAP        :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.    
-MAXDIV=n      :% ident. for delay
-TYPE=         :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting

***Profile Alignments:***
-PROFILE      :Merge two alignments by profile alignment
-NEWTREE1=    :file for new guide tree for profile1
-NEWTREE2=    :file for new guide tree for profile2
-USETREE1=    :file for old guide tree for profile1
-USETREE2=    :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES   :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=    :file for new guide tree
-USETREE=    :file for old guide tree

***Structure Alignments:***
-NOSECSTR1     :do not use secondary structure-gap penalty mask for profile 1 
-NOSECSTR2     :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE   :output in alignment file
-HELIXGAP=n    :gap penalty for helix core residues 
-STRANDGAP=n   :gap penalty for strand core residues
-LOOPGAP=n     :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal 

***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n        :seed number for bootstraps.
-KIMURA        :use Kimura's correction.   
-TOSSGAPS      :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display

Batch Usage

Sample batch script for the Oakley system:

#PBS -N clustalw
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=1
#PBS -j oe

cd $PBS_O_WORKDIR
module load clustalw
clustalw2 -INFILE=myfile.seqs -GAPOPEN=2 -GAPEXT=4

Further Reading

See Also


EMBOSS

EMBOSS is "The European Molecular Biology Open Software Suite". EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole.

Within EMBOSS you will find hundreds of programs (applications) covering areas such as:

  • Sequence alignment,
  • Rapid database searching with sequence patterns,
  • Protein motif identification, including domain analysis,
  • Nucleotide sequence pattern analysis---for example to identify CpG islands or repeats,
  • Codon usage analysis for small genomes,
  • Rapid identification of sequence patterns in large scale sequence sets,
  • Presentation tools for publication

Availability & Restrictions

The EMBOSS software package is available without restriction for all academic OSC users.

The following versions of EMBOSS are available on OSC systems:

Version   Glenn   Oakley
5.0.0     X
6.4.0             X

Usage

Set-up

Setup of EMBOSS depends on what system you are using. For the Glenn system, run the following commands:

module load biosoftw
module load emboss

For the Oakley system, run the following command:

module load emboss

Using EMBOSS

The EMBOSS programs are typically used to perform one or more tasks on a large number of sequences.  Once the emboss module is loaded, all emboss commands are available for your use.
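
As a minimal sketch (the input and output file names here are hypothetical), the seqret utility can be used to read a set of sequences and write them back out once the module is loaded:

module load emboss
seqret -sequence myseqs.fasta -outseq myseqs.out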

Batch Usage

Coming soon.

Further Reading

See Also


FFTW

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data. It is portable and performs well on a wide variety of platforms.

Availability & Restrictions

FFTW is available without restriction to all OSC users.

The following versions of FFTW are available on OSC systems:

Version   Glenn   Oakley
2.1               X**
2.1.5     X
3.1.2     X
3.3       X*      X**
3.3.1     X*
3.3.2     X**

*gnu versions only, **intel version only

Usage

Set-up

Initializing the system for use of the FFTW library depends on both the system you are using and the compiler you are using. Some of the installations on Glenn also depend on the version of MPI you are using. A successful build of your program will depend on an understanding of which module fits your circumstances. To load the default FFTW3 library, run the following command:

module load fftw3

Building With FFTW

The following environment variables are setup when the FFTW library is loaded:

Variable Use
$FFTW3_CFLAGS Use during your compilation step for C programs on Oakley.
$FFTW3_FFLAGS Use during your compilation step for Fortran programs on Oakley.
$FFTW3_LIBS Use during your link step on Oakley for the sequential version of the library.
$FFTW3_LIBS_OMP Use during your link step on Oakley for the OpenMP version of the library.
$FFTW3_LIBS_MPI Use during your link step on Oakley for the MPI version of the library.
$FFTW_CFLAGS Use during your compilation step for C programs on Glenn.
$FFTW_FFLAGS Use during your compilation step for Fortran programs on Glenn.
$FFTW_LIBS Use during your link step on Glenn for the sequential version of the library.
$FFTW_MPI_LIBS Use during your link step on Glenn for the MPI version of the library.

Below is a set of example commands used to build a C program (my-fftw.c) and a Fortran program (more-fftw.f) against FFTW3 on Oakley:

module load fftw3
icc $FFTW3_CFLAGS my-fftw.c -o my-fftw $FFTW3_LIBS 
ifort $FFTW3_FFLAGS more-fftw.f -o more-fftw $FFTW3_LIBS
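
As an illustration only, a minimal my-fftw.c (a hypothetical file name) that computes a one-dimensional complex transform with the FFTW3 API might look like the following sketch:

#include <stdio.h>
#include <fftw3.h>

int main(void)
{
    int n = 16;
    int i;
    fftw_complex *in, *out;
    fftw_plan p;

    /* Allocate input and output arrays with FFTW's aligned allocator */
    in  = fftw_malloc(sizeof(fftw_complex) * n);
    out = fftw_malloc(sizeof(fftw_complex) * n);
    for (i = 0; i < n; i++) { in[i][0] = i; in[i][1] = 0.0; }

    /* Plan and execute a forward complex-to-complex DFT */
    p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);
    printf("out[1] = %g + %gi\n", out[1][0], out[1][1]);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}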

Further Reading

See Also

Service: 

FLUENT

FLUENT is a state-of-the-art computer program for modeling fluid flow and heat transfer in complex geometries.


Availability & Restrictions

FLUENT is available on both the Glenn and Oakley clusters.  The following versions are available:

Version Glenn Oakley
13 X X
14   X
14.5 X  

A base license token will allow Fluent to use up to 4 cores without any additional tokens. If you want to use more than 4 cores, you will need an additional "HPC" token per core. For instance, a serial FLUENT job with 1 core will need 1 base license token while a parallel FLUENT job with 8 cores will need 1 base license token and 4 HPC tokens.

Academic License Limitations

Currently, there are a total of 25 base license tokens and 68 HPC tokens for academic users. These HPC tokens are shared between FLUENT and ANSYS. A job using a base license token can be submitted to either the Glenn or Oakley cluster. A parallel job using HPC tokens (with the "ansyspar" flag), however, can only be submitted to the Glenn cluster due to a scheduler issue.

NDEMC License Limitations

For NDEMC users, there are a total of 10 base license tokens and 512 HPC tokens. These HPC tokens are shared among available ANSYS products (FLUENT, CFX, ICEMCFD, ANSYS Mechanical, etc.). All jobs should be submitted to the Oakley cluster.

Usage

Access

Use of FLUENT for academic purposes requires validation. To obtain validation, please complete and return the "Academic Agreement to Use FLUENT." This can be obtained from your site consultant or from the file fluent.pdf located in the Academic Agreement Forms.

Set-up

FLUENT can only be run on the compute nodes of the Oakley and Glenn clusters. Therefore, all FLUENT jobs are run via the batch scheduling system, either as interactive or unattended jobs. In either case, once a batch job has started, the FLUENT module can be loaded with the following command:

module load fluent

Batch Usage

Sample Usage (interactive execution)

Using the FLUENT GUI interactively can be done with the following steps:

  1. Ensure that your SSH client software has X11 forwarding enabled
  2. Connect to either the Oakley or Glenn system
  3. Request an interactive job. The command below will request a single-core, one hour job. Modify as per your own needs:
    qsub -I -X -l walltime=1:00:00 -l software=fluent+1
  4. Once the interactive job has started, run the following commands to setup and start the FLUENT GUI:

    module load fluent
    fluent

Sample Batch Script (serial execution using 1 base token)

Running FLUENT for five-hours on an input file "run.input" on either Glenn or Oakley:

#PBS -N serial_fluent
#PBS -l walltime=5:00:00 
#PBS -l nodes=1:ppn=1
#PBS -l software=fluent+1
#PBS -j oe
#
# The following lines set up the FLUENT environment
#
module load fluent
#
# Move to the directory where the job was submitted from
# You could also 'cd' directly to your working directory
cd $PBS_O_WORKDIR
#
# Copy files to $TMPDIR and move there to execute the program
#
cp test_input_file.cas test_input_file.dat run.input $TMPDIR
cd $TMPDIR
#
# Run fluent
fluent 3d -g < run.input  
#
# Where the file 'run.input' contains the commands you would normally
# type in at the Fluent command prompt.
# Finally, copy files back to your home directory
cp *   $PBS_O_WORKDIR  

As an example, your run.input file might contain:

========================================  
file/read-case-data test_input_file.cas
solve/iterate 100
file/write-case-data test_result.cas
file/confirm-overwrite yes    

exit  
yes  
========================================

Sample Batch Script (parallel execution using HPC token)

FLUENT can be run in parallel, but it is very important that you read the documentation in the FLUENT Manual on the details of how this works. You can find the Fluent manuals on-line by following the "Further Reading" link at the bottom of this page, or clicking the "Manuals" link in the left panel of any of the software pages.

In addition to requesting the FLUENT base license token (-l software=fluent+1), you need to request copies of the ansyspar license, i.e., HPC tokens. However, the scheduler cannot handle two "software" flags simultaneously, so the syntax changes. The new option is -W x=GRES:fluent+1%ansyspar+[n], where [n] is equal to the number of cores you requested minus 4.

Parallel jobs have to be submitted on Glenn via the batch system. An example of the batch script follows:

========================================  
#PBS -N parallel_fluent   
#PBS -l walltime=1:00:00   
#PBS -l nodes=2:ppn=8
#PBS -j oe
#PBS -W x=GRES:fluent+1%ansyspar+12
#PBS -S /bin/bash
set echo on   
hostname   
#   
# The following lines set up the FLUENT environment   
#   
module load fluent
#   
# Move to the directory where the job was submitted from and   
# create the config file for socket communication library   
#   
cd $PBS_O_WORKDIR   
#   
# Create list of nodes to launch job on   
rm -f pnodes   
cat  $PBS_NODEFILE | sort > pnodes   
export ncpus=`cat pnodes | wc -l`   
#   
#   Run fluent   
fluent 3d -t$ncpus -pinfiniband.ofed -cnf=pnodes -g < run.input 

Further Reading

See Also

  • The FLUENT users ARMSTRONG group
Supercomputer: 
Service: 

Fitmodel

"'Fitmodel' estimates the parameters of various codon-based models of substitution, including those described in Guindon, Rodrigo, Dyer and Huelsenbeck (2004).  These models are especially useful as they accommodate site-specific switches between selection regimes without a priori knowledge of the positions in the tree where changes of selection regimes occurred.

The program will ask for two input files: a tree file and a sequence file.  The tree should be unrooted and in NEWICK format.  The sequences should be in PHYLIP interleaved or sequential format.  If you are planning to use codon-based models, the sequence length should be a multiple of 3.  The program provides four types of codon models: M1, M2, M2a, and M3 (see PAML manual).  Moreover, M2, M2a and M3 can be combined with 'switching' models (option 'M').  Two switching models are implemented: S1 and S2.  S1 constrains the rates of change between dN/dS values to be uniform (e.g., the rate of change between negative and positive selection is constrained to be the same as the rate of change between neutrality and positive selection) while S2 allows for different rates of change between the different classes of dN/dS values.

If you are using a 'switching' model, 'fitmodel' will output files with the following names: your_sequence_file_trees_w1, your_sequence_file_trees_w2, your_sequence_file_trees_w3 and your_sequence_file_trees_wbest.  The w1, w2 and w3 files give the estimated tree with probabilities of w1, w2, and w3 (three maximum likelihood dN/dS ratio estimates) calculated on each edge of the tree and for each site.  Hence, the first tree in one of these files reports the probabilities calculated at the first site of the alignment.  Instead of probabilities, the wbest file allows you to identify which of the three dN/dS values is the most probable on any given edge, at any given site.  A branch with label 0.0 means that w1 is the most probable class, 0.5 indicates that w2 is the most probable, and 1.0 means that w3 has the highest posterior probability." (README.txt)

Availability & Restrictions

Fitmodel is available to all OSC users without restriction.

The following versions of fitmodel are available on OSC systems:

Version Glenn Oakley
0.5.3 X  

Usage

Set-up

On the Glenn Cluster fitmodel is accessed by executing the following commands:

module load biosoftw
module load fitmodel

Using fitmodel

fitmodel will be added to the user's PATH and can then be run with the following command:

fitmodel -treefile treefilename -seqfile seqfilename [options]

Options

-type nt or aa (default=nt)
-freq empirical or ml or uniform or F3X4 (defaults=empirical or F3X4)
-codon no or yes (defaults=no)
-model JC69, K80, F81, HKY85, F84, TN93, GTR, Dayhoff, JTT, MtREV, WAG, DCMut, M2 or M3 (default=HKY85)
-pinvar [0.0;1.0]
-optpinvar no or yes (default=no)
-kappa [0.01;100.0]
-optkappa no or yes (default=no)
-ncatg integer > 0
-alpha [0.01;100.0]
-optalpha no or yes (default=no)
-code 1,2,3,4,5,6,9,10,11,12,13,14,15,16,21,22,23 (see NCBI Taxonomy webpage) (default=yes)
-p1 [0.0;1.0]
-p2 [0.0;1.0]
-p3 [0.0;1.0]
-w1 [1E-7;1E+7]
-w2 [1E-7;1E+7]
-w3 [1E-7;1E+7]
-switches no or S1 or S2 (default=no)
-optpw yes or no (default=yes)
-multiple integer > 0
-interleaved yes or no (default=yes)
-optall yes or no (default=yes)

Batch Usage

This example uses PAML's example brown.trees and brown.nuc files, modified to be in NEWICK and PHYLIP formats, respectively.

#PBS -N fitmodel_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4
#PBS -j oe

module load biosoftw
module load fitmodel
cd $PBS_O_WORKDIR
echo "y" | fitmodel -treefile brown.newick -seqfilename brown.phylip

Further Reading

Supercomputer: 
Service: 

GPU-Blast

GPU-BLAST is an accelerated version of the popular NCBI-BLAST (www.ncbi.nlm.nih.gov) that uses a general-purpose graphics processing unit (GPU). In comparison to the sequential NCBI-BLAST, GPU-BLAST is nearly four times faster, while producing identical results.

Availability & Restrictions

GPU-BLAST is available without restriction to all OSC users.

The following versions of GPU-BLAST are available on OSC systems:

VERSION NCBI-Blast Version GLENN OAKLEY
1.0 2.2.24+   X
1.1 2.2.25+   X
1.1 2.2.26+   X

Usage

Set-up

To set up the environment for gpu-blast on Oakley cluster, do the following:

module load cuda/4.1.28
module load gpu-blast

Batch Usage

A sample batch script is below:

#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l walltime=10:00
#PBS -N GPU-Blast
#PBS -S /bin/bash
#PBS -j oe

module load blast
module load cuda/4.1.28
set -x

cd $PBS_O_WORKDIR
mkdir $PBS_JOBID

cp 100.fasta $TMPDIR
cd $TMPDIR
/usr/bin/time blastn -db nt -query 100.fasta  -out test.out

cp * $PBS_O_WORKDIR/$PBS_JOBID

Further Reading

See Also

GROMACS

GROMACS is a versatile package of molecular dynamics simulation programs. It is primarily designed for biochemical molecules, but it has also been used on non-biological systems.  GROMACS generally scales well on OSC platforms. Versions after 4.6 include GPU acceleration.

Availability & Restrictions

GROMACS is available to all OSC users without restriction.

The following versions of GROMACS are available on OSC systems:

Version Glenn Oakley
3.3.1 X  
3.3.3 X  
4.0.3 X  
4.5.4 X  
4.5.5 X X
4.6.3   X

Usage

Set-up

Initializing GROMACS on both Glenn and Oakley is done by loading a gromacs module:

module load gromacs

To see other available versions, run the following command:

module avail

Using GROMACS

To execute a serial GROMACS program interactively, simply run it on the command line, e.g.:

pdb2gmx

Parallel GROMACS programs should be run in a batch environment with mpiexec, e.g.:

mpiexec mdrun_mpi_d

Note that '_mpi' indicates a parallel executable and '_d' indicates a program built with double precision.

Batch Usage

Sample batch scripts and GROMACS input files are available here:

/nfs/10/srb/workshops/compchem/gromacs/

This simple batch script for Oakley demonstrates some important points:

# GROMACS Tutorial for Solvation Study of Spider Toxin Peptide
# see fwspider_tutor.pdf
#PBS -N fwsinvacuo.oakley
#PBS -l nodes=2:ppn=12
module load gromacs
# PBS_O_WORKDIR refers to the directory from which the job was submitted.
cd $PBS_O_WORKDIR
pbsdcp -p 1OMB.pdb em.mdp $TMPDIR
# Use TMPDIR for best performance.
cd $TMPDIR
pdb2gmx -ignh -ff gromos43a1 -f 1OMB.pdb -o fws.gro -p fws.top -water none
editconf -f fws.gro -d 0.7
editconf -f out.gro -o fws_ctr.gro -center 2.0715 1.6745 1.914
grompp -f em.mdp -c fws_ctr.gro -p fws.top -o fws_em.tpr
mpiexec mdrun_mpi -s fws_em.tpr -o fws_em.trr -c fws_ctr.gro -g em.log -e em.edr
cat em.log
cp -p * $PBS_O_WORKDIR/

Further Reading

 

Supercomputer: 
Service: 

Gaussian

Gaussian is the most popular general purpose electronic structure program. Its latest version, g09, can perform density functional theory, Hartree-Fock, Møller-Plesset, coupled-cluster, and configuration interaction calculations among others. Geometry optimizations, vibrational frequencies, magnetic properties, and solution modeling are available. It performs well as black-box software on closed-shell ground state systems.

Availability & Restrictions

Gaussian is available to all users that sign a software license agreement, found here, and return the signed form to OSC via fax.

The following versions of Gaussian are available on OSC systems:

Version Glenn Oakley
g03d01 X*  
g03e01 X*  
g09a01 X  
g09b01 X X
g09c01 X X
g09d01 X X

* Users are highly encouraged to switch to the g09 version as g03 is no longer supported by Gaussian and has significant limitations on Glenn.

Usage

Set-up

To initialize your environment for use of Gaussian 09, run the following command on Glenn:

module load g09

On Oakley, the command is

module load gaussian

To see other available versions, run the following command:

module avail

Using Gaussian

To execute Gaussian, simply run the Gaussian binary with the input file on the command line:

g09 < input.com

When the input file is redirected as above ( < ), the output will be standard output; in this form the output can be seen via 'qpeek jobid' when the job is running in a batch queue.  Alternatively, Gaussian can be invoked without file redirection:

g09 input.com

in which case the output file will be named 'input.log'; in this form the output cannot be seen via 'qpeek jobid' when the job is running in a batch queue.
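
For reference, a minimal Gaussian input file consists of a route section, a blank line, a title, another blank line, and then the charge and multiplicity followed by Cartesian coordinates. The hypothetical input.com below optimizes a water molecule at the HF/6-31G(d) level; it is a sketch for illustration, not a recommended model chemistry:

%chk=water.chk
#P HF/6-31G(d) Opt

Water geometry optimization

0 1
O   0.000000   0.000000   0.000000
H   0.000000   0.757000   0.586000
H   0.000000  -0.757000   0.586000

Note that Gaussian requires a blank line at the end of the input file.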

Batch Usage

Sample batch scripts and Gaussian input files are available here:

/nfs/10/srb/workshops/compchem/gaussian/

This simple batch script demonstrates the important points:

#PBS -N GaussianJob
#PBS -l nodes=1:ppn=1

# PBS_O_WORKDIR refers to the directory from which the job was submitted.
cd $PBS_O_WORKDIR
cp input.com $TMPDIR
# Use TMPDIR for best performance.
cd $TMPDIR

module load g09
g09 < input.com
cp -p input.log *.chk $PBS_O_WORKDIR

Note: OSC does not have a functional distributed parallel version (LINDA) of Gaussian. Parallelism of Gaussian at OSC is only via shared memory. Consequently, do not request more than one node for Gaussian jobs on OSC's clusters.

Further Reading

 

Supercomputer: 
Service: 

GLPK

GLPK (GNU Linear Programming Kit) is a set of open source LP (linear programming) and MIP (mixed integer problem) routines written in ANSI C, which can be called from within C programs. 

Availability & Restrictions

GLPK is available to all OSC users without restriction. 

The following versions are available on OSC systems:

Version Glenn Oakley
4.48   X

Usage

Setup

To set up your environment for using GLPK on Oakley, run the following command:

module load glpk

Compiling and Linking

To compile your C code using GLPK API routines, use the environment variable $GLPK_CFLAGS provided by the module:

gcc $GLPK_CFLAGS -c my_prog.c

To link your code, use the variable $GLPK_LIBS:

gcc my_prog.o $GLPK_LIBS -o my_prog
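
As an illustration only, a minimal my_prog.c (hypothetical) that builds and solves a small two-variable linear program with the GLPK API might look like the following sketch:

#include <stdio.h>
#include <glpk.h>

int main(void)
{
    /* maximize z = x + 2y  subject to  x + y <= 10,  0 <= x, y <= 8 */
    glp_prob *lp;
    int ia[1 + 2] = {0, 1, 1};
    int ja[1 + 2] = {0, 1, 2};
    double ar[1 + 2] = {0.0, 1.0, 1.0};

    lp = glp_create_prob();
    glp_set_obj_dir(lp, GLP_MAX);

    glp_add_rows(lp, 1);
    glp_set_row_bnds(lp, 1, GLP_UP, 0.0, 10.0);   /* row 1:  x + y <= 10 */

    glp_add_cols(lp, 2);
    glp_set_col_bnds(lp, 1, GLP_DB, 0.0, 8.0);    /* 0 <= x <= 8 */
    glp_set_obj_coef(lp, 1, 1.0);
    glp_set_col_bnds(lp, 2, GLP_DB, 0.0, 8.0);    /* 0 <= y <= 8 */
    glp_set_obj_coef(lp, 2, 2.0);

    glp_load_matrix(lp, 2, ia, ja, ar);           /* the two nonzero constraint coefficients */
    glp_simplex(lp, NULL);
    printf("objective = %g\n", glp_get_obj_val(lp));

    glp_delete_prob(lp);
    return 0;
}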

glpsol

Additionally, the GLPK module contains a stand-alone LP/MIP solver, which can be used to process files written in the GNU MathProg modeling language.  The solver can be invoked using the following command syntax:

glpsol [options] [filename]

For a complete list of options, use the following command:

glpsol --help

Further Reading

GLPK Homepage

Supercomputer: 
Service: 

Gnuplot

Introduction

Gnuplot is a portable command-line driven data and function plotting utility.  It was originally intended to allow scientists and students to visualize mathematical functions and data.  

Gnuplot supports many types of plots in two or three dimensions.  It can draw using points, lines, boxes, contours, vector fields, surfaces, and various associated text.  It also supports various specialized plot types.  

Gnuplot supports many different types of output: interactive screen display (with mouse and hotkey functionality), pen plotters (like hpgl), printers (including postscript and many color devices), and file output in vector formats (LaTeX, metafont, pdf, svg) or bitmap formats (png).  

Version

 

Version Glenn Oakley
4.2.4 X  
4.6.4   X

Usage

To start a Gnuplot session, load the module and launch using the following commands:

module load gnuplot
gnuplot

To access the Gnuplot help menu, type ? into the Gnuplot command line.  
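
As a quick illustration, the session below plots a built-in function interactively and then writes a PNG file from a hypothetical two-column data file named data.dat:

plot sin(x)
set terminal png
set output 'plot.png'
plot 'data.dat' using 1:2 with lines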

Further Reading

For more information, visit the Gnuplot homepage.  

Supercomputer: 
Service: 

HDF5

HDF5 is a general purpose library and file format for storing scientific data. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic objects, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids.

Availability & Restrictions

HDF5 is available without restriction to all OSC users.

The following versions of HDF5 are available on OSC systems:

Version Glenn Oakley
1.6.4 X  
1.6.5 X  
1.8.7 X  
1.8.8   X

Usage

Set-up

To use the HDF5 library, first run the following command:

module load hdf5

Building With HDF5

The HDF5 library provides the following variables for use at build time:

Variable Use
$HDF5_C_INCLUDE Use during your compilation step for C programs
$HDF5_F90_INCLUDE Use during your compilation step for FORTRAN programs
$HDF5_C_LIBS Use during your linking step for C programs
$HDF5_F90_LIBS Use during your linking step for FORTRAN programs

For example, to build the code myprog.c or myprog.f90 with the hdf5 library you would use:

icc -c $HDF5_C_INCLUDE myprog.c
icc -o myprog myprog.o $HDF5_C_LIBS
ifort -c $HDF5_F90_INCLUDE myprog.f90
ifort -o myprog myprog.o $HDF5_F90_LIBS
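
As an illustration only, here is a minimal sketch of a hypothetical myprog.c that creates a file containing a single two-dimensional integer dataset. It assumes the HDF5 1.8 C API; the file and dataset names and sizes are purely illustrative:

#include "hdf5.h"

int main(void)
{
    hsize_t dims[2] = {4, 6};
    int data[4][6];
    hid_t file, space, dset;
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            data[i][j] = i * 6 + j;

    /* Create the file, a 4x6 dataspace, and an integer dataset */
    file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(2, dims, NULL);
    dset  = H5Dcreate(file, "/dset", H5T_NATIVE_INT, space,
                      H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Write the whole array, then close everything */
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}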

Batch Usage

You must load the hdf5 module in your batch script before executing a program that is built with the HDF5 library.

#PBS -N AppNameJob
#PBS -l nodes=1:ppn=12

module load hdf5
cd $PBS_O_WORKDIR
cp foo.dat $TMPDIR
cd $TMPDIR

appname

cp foo_out.h5 $PBS_O_WORKDIR

Further Reading

See Also

  • netcdf software page
Supercomputer: 
Service: 

HMMER

Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER uses profile HMMs, and can be useful in situations like:

  • if you are working with an evolutionarily diverse protein family, a BLAST search with any individual sequence may not find the rest of the sequences in the family.
  • the top hits in a BLAST search are hypothetical sequences from genome projects.
  • your protein consists of several domains which are of different types.

HMMER (pronounced 'hammer', as in a more precise mining tool than BLAST) was developed by Sean Eddy at Washington University in St. Louis.

HMMER is a very CPU-intensive program and is parallelized using threads, so that each instance of hmmsearch or the other search programs can use all the CPUs available on a node. HMMER on OSC clusters is intended for those who need to run HMMER searches on large numbers of query sequences.

Availability & Restrictions

HMMER is available to all OSC users without restriction.

The following versions of HMMER are available on OSC systems:

Version Glenn Oakley
2.3.2 X  
3.0   X

Usage

Set-up

To use HMMER on Glenn, first run the following commands:

module load biosoftw
module load hmmer

To use HMMER on Oakley, first run the following command:

module load hmmer

Using HMMER

Once the hmmer module is loaded, the following commands will be available for your use:

Single sequence queries: new to HMMER3
phmmer Search a sequence against a sequence database. (BLASTP-like)
jackhmmer Iteratively search a sequence against a sequence database. (PSIBLAST-like)

Replacements for HMMER2’s functionality
hmmbuild Build a profile HMM from an input multiple alignment.
hmmsearch Search a profile HMM against a sequence database.
hmmscan Search a sequence against a profile HMM database.
hmmalign Make a multiple alignment of many sequences to a common profile HMM.

Other utilities
hmmconvert Convert profile formats to/from HMMER3 format.
hmmemit Generate (sample) sequences from a profile HMM.
hmmfetch Get a profile HMM by name or accession from an HMM database.
hmmpress Format an HMM database into a binary format for hmmscan.
hmmstat Show summary statistics for each profile in an HMM database.

If you need to know options for a command, type the command name followed by "-h", for example:

hmmalign -h 

Batch Usage

A sample batch job is below:

#PBS -N hmmer
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -S /bin/bash

cd $PBS_O_WORKDIR
hmmalign globins4.align globins45
hmmbuild globins4.hmm globins4.align
hmmsearch globins4.hmm /fdb/fastadb/nr.aa.fas > globins.out

Further Reading

See Also

Supercomputer: 
Service: 
Fields of Science: 

Intel Compilers

The Intel compilers are available for both C/C++ and Fortran.

Availability & Restrictions

The Intel Compilers are available to all OSC users without restriction.

The following versions are available:

Version Glenn Oakley
9.1 X  
10.0 X  
10.0.023 X  
11.1 X  
11.1.056 X  
12.1.0   X
12.1.4.319   X
13.0.1.117   X
13.1.2.183   X

Usage

Set-up

To load the intel compilers on the Glenn system, use the following command:

module load intel-compilers-11.1

To load the intel compilers on the Oakley system, use the following command:

module load intel

NOTE: You will need to unload any other compilers before loading the intel compiler packages.

Using the Intel Compilers

Once the intel compiler module has been loaded, the compilers are available for your use. The following table lists common compiler options available in all languages.

Compiler option Purpose
-c Compile only; do not link
-DMACRO[=value] Defines preprocessor macro MACRO with optional value (default value is 1)
-g Enables debugging; disables optimization
-I/directory/name Add /directory/name to the list of directories to be searched for #include files
-L/directory/name Adds /directory/name to the list of directories to be searched for library files
-lname Adds the library libname.a or libname.so to the list of libraries to be linked
-o outfile Names the resulting executable outfile instead of a.out
-UMACRO Removes definition of MACRO from preprocessor
-O0 Disable optimization
-O1 Light optimization
-O2 Heavy optimization (default)
-O3 Aggressive optimization; may change numerical results
-ipo Inline function expansion for calls to procedures defined in separate files
-funroll-loops Loop unrolling
-parallel Automatic parallelization
-openmp Enables translation of OpenMP directives

 

The following table lists some options specific to C/C++

-strict-ansi Enforces strict ANSI C/C++ compliance
-ansi Enforces loose ANSI C/C++ compliance

The following table lists some options specific to Fortran

-convert big_endian Use unformatted I/O compatible with Sun and SGI systems
-convert cray Use unformatted I/O compatible with Cray systems
-i8 Makes 8-byte INTEGERs the default
-module /dir/name Adds /dir/name to the list of directories searched for Fortran 90 modules
-r8 Makes 8-byte REALs the default
-fp-model strict Disables optimizations that can change the results of floating point calculations
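
For example, typical compile lines that combine several of the options above might look like the following (the source file names are illustrative only):

icc -O3 -openmp -o my_prog my_prog.c
ifort -O2 -r8 -convert big_endian -o my_prog my_prog.f90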

Further Reading

See Also

Supercomputer: 
Service: 
Technologies: 
Fields of Science: 

Intel MPI

Intel's implementation of the Message Passing Interface (MPI) library.

Availability & Restrictions

This library may be used as an alternative to the MVAPICH2 MPI libraries, but not in conjunction with them.

Version Glenn Oakley
3.2.2p-006 X  
4.0.0.028 X  
4.0.0pu-027 X  
4.0.3.008   X
4.1.0.024   X

Usage

Set-up

Simply load the module:

module load intelmpi

On Glenn, the modules are named slightly differently:

module load intel-mpi-3.2.2p-006

Since this module conflicts with MVAPICH installations, you should unload those modules first.

Using Intel MPI

Software compiled against this module will use the libraries at runtime.

Building With Intel MPI

On Glenn, we do not recommend building against these libraries. The modules are not configured to set up useful environment variables.

On Oakley, we have defined several environment variables to make it easier to build and link with the Intel MPI libraries.

Variable Use
$MPI_CFLAGS Use during your compilation step for C programs.
$MPI_CXXFLAGS Use during your compilation step for C++ programs.
$MPI_FFLAGS Use during your compilation step for Fortran programs.
$MPI_F90FLAGS Use during your compilation step for Fortran 90 programs.
$MPI_LIBS Use when linking your program to Intel MPI.

In general, for any application already set up to use mpicc (or similar), compilation should be fairly straightforward.
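
As an illustration only, building a hypothetical my-impi-application.c with the Intel compiler and the variables above might look like this:

icc $MPI_CFLAGS -c my-impi-application.c
icc -o my-impi-application my-impi-application.o $MPI_LIBS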

Batch Usage

Running a program compiled against Intel MPI (called my-impi-application) for five-hours on Oakley:

#PBS -N MyIntelMPIJob
#PBS -l nodes=4:ppn=12
#PBS -l walltime=5:00:00

module swap mvapich2 intelmpi
cd $PBS_O_WORKDIR

mpiexec my-impi-application

Further Reading

See Also

Supercomputer: 
Service: 
Technologies: 
Fields of Science: 

JasPer

Introduction

JasPer is an extensible open source utility designed for the manipulation, conversion, compression and decompression of digital images.  Currently supported image formats include bmp, jp2, jpc, jpg, pgx, mif and ras.    

Availability

Version 1.900.1 is available on the Oakley cluster.  

Usage

In order to use the JasPer library, you must load the JasPer module into your environment.  To load the JasPer module, type the following command:

module load jasper

Useful commands

In addition to the JasPer libraries, a few default applications are also provided when the JasPer module is loaded.     

To convert an image from one format to another, use the jasper command.   

jasper [options]

For a comprehensive list of options available for this command, type

jasper --help

To compare particular qualities of two images, use the imgcmp command.

imgcmp -f [file 1] -F [file 2] -m [metric]

For a full list of acceptable metric arguments, type

imgcmp --help

To view a sequence of one or more images, use the jiv command.  

jiv image

Alternatively, if you would like to view multiple images in sequence you can use the --wait option to specify the amount of time between the display of each image, creating a slideshow.  The command below will display each image for 5 seconds before displaying the next image.   

jiv --wait 5 [image1 image2 image3 ...]

For a full list of options, type

jiv --help

Further Reading

Additional information about JasPer can be found online at the JasPer Project Homepage.

Supercomputer: 
Service: 

Jmol

Jmol is a simple molecular visualization program. It can read many file formats and can write various formats. Animations of normal modes and simulations are implemented. Jmol is an OpenScience Java application.

Availability & Restrictions

Jmol is available to all OSC users without restriction.

The following versions of Jmol are available on OSC systems:

Version Glenn Oakley
12.0.8 X  

Jmol is best executed locally. Thus, we recommend downloading the latest version from http://jmol.sourceforge.net/download/. The installation process is trivial.

Usage

Set-up

To setup Glenn for use of Jmol, run the following command:

module load jmol

Using Jmol

To execute Jmol, simply run the jmol binary with an optional input file on the command line:

jmol

Batch Usage

Jmol should be executed interactively.

Further Reading

Supercomputer: 
Service: 

LAMMPS

The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a classical molecular dynamics code designed for high-performance simulation of large molecular systems.  LAMMPS generally scales well on OSC platforms, offers a variety of modelling techniques, and offers GPU enhanced computation.

Availability & Restrictions

LAMMPS is available to all OSC users without restriction.

The following versions of LAMMPS are available on OSC systems:

Version Glenn Oakley
Oct06 X  
Jul07 X  
Jan08 X  
Apr08 X  
Sep09 X  
Mar10 X  
Jun10 X  
Aug10* X  
Oct10 X  
Mar11* X  
Jun11 X  
Jan12 X  
Feb12   X
May12   X

Usage

Set-up

To use LAMMPS on either Glenn or Oakley, run the following command to first set up your environment:

module load lammps

To see other available versions, run the following command:

module avail

Using LAMMPS

Once a module is loaded, LAMMPS can be run with the following command:

lammps < input.file

Batch Usage

Sample batch scripts and LAMMPS input files are available here:

/nfs/10/srb/workshops/compchem/lammps/

Below is a sample batch script for the Glenn Cluster. It asks for 16 processors and 10 hours of walltime. If the job goes beyond 10 hours, the job would be terminated.

#PBS -N chain  
#PBS -l nodes=2:ppn=8  
#PBS -l walltime=10:00:00  
#PBS -S /bin/bash  
#PBS -j oe  
module load lammps  
cd $PBS_O_WORKDIR  
pbsdcp chain.in $TMPDIR  
cd $TMPDIR  
lammps < chain.in  
pbsdcp -g '*' $PBS_O_WORKDIR

GPU Usage

On both OSC clusters, LAMMPS can run on GPUs.  See the sample scripts for details.  The following text is specific to the Glenn cluster and demonstrates using GPUs to speed up certain pair_styles. This example shows how to load and run such a GPU-enabled version.  Here is a sample PBS script to run. It uses one node and two GPUs for the computation.

#PBS -N lammpsTest
#PBS -l nodes=1:ppn=8,feature=gpu
#PBS -l walltime=00:10:00
#PBS -S /bin/bash
#PBS -j oe

module switch mvapich2-1.4-gnu
module load cuda-3.0
module load fftw2-2.1.5-double-mvapich2-1.4-gnu
module load lammps-25Mar11cuda

cd $PBS_O_WORKDIR

cp lj-gpu.in $TMPDIR
cd $TMPDIR

mpiexec -np 2 lmp_osc < lj-gpu.in > out

cp $TMPDIR/* $PBS_O_WORKDIR

Here is a sample input with the necessary modifications to use a GPU pair_style; it uses both GPUs.

newton off

units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 20 0 20 0 20
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut/gpu 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 0 all gpu force/neigh 0 1 1.0
fix 1 all nve

timestep 0.003
thermo 100
run 100

Please note that you cannot run more than two threads per node. More than two threads will cause the application to hang, since there are only two GPUs per node. The LAMMPS input and the PBS script must match the number of GPUs being used.

Please refer to the documentation for other pair_styles that can be used in such simulations.

Further Reading

 

Supercomputer: 
Service: 

LS-DYNA

LS-DYNA is a general purpose finite element code for simulating complex structural problems, specializing in nonlinear, transient dynamic problems using explicit integration. LS-DYNA is one of the codes developed at Livermore Software Technology Corporation (LSTC).

Availability & Restrictions

LS-DYNA is available to OSC users that have filled out the appropriate academic license agreement form found here.

Serial (smp, for single-node jobs) and parallel (mpp, for multiple-node jobs) versions of LS-DYNA solvers are installed at OSC. You may check the available versions on Oakley using module avail dyna. (On Glenn, use module avail lsdyna for smp solvers, and use module avail mpp for mpp solvers.) In the module name, '_s' indicates single precision and '_d' indicates double precision. For example, mpp971_d_R5 is the mpp solver with double precision on Glenn. The following versions are currently available on OSC cluster systems (including smp/mpp, single/double precision):

Version Glenn Oakley
971-R4.2.1 X  
971-R5 X X
971-R5.1.1   X
971-R7.0.0 X X

Using LS-DYNA

LS-DYNA can be used via the batch system, by requesting the computing resources and launching solvers (smp/mpp) in a script file. See the following example.

Batch Usage

To use LS-DYNA via the batch system, here's what you need to do:

1) copy your input files (explorer.k in the example below) to your work directory at OSC
2) create a batch script, similar to the following file, saved as 'test.job':

#PBS -N plate_test
#PBS -l walltime=5:00:00
#PBS -l nodes=2:ppn=8
#PBS -j oe
#PBS -S /bin/csh
# The following lines set up the LSDYNA environment
module unload mpi
module load mpp971_d_R5
#
# Move to the directory where the input files are located
#
cd $PBS_O_WORKDIR
#
# Run LSDYNA (number of cpus > 1)
#
mpiexec mpp971 I=explorer.k NCPU=16

     This example script uses the mpp solver for a parallel job (nodes>1) on Glenn. 

 Notes:

  • If you're running mpp on Oakley, you'll need to change the mpi loading due to the different system configuration. Essentially, you need the following command in your script:
    module swap mvapich2/1.7 intelmpi

     in place of

    module unload mpi

      in the above.

  • If a smp solver is launched, a different executable should be used:
#PBS -l nodes=1:ppn=8
...
module load lsdyna971_d_R5
...
lsdyna I=explorer.k NCPU=8

 3) submit the script to the batch queue with the command:

qsub test.job

    when the job is finished, all the result files will be found in the directory where you submitted your job ($PBS_O_WORKDIR).

  Alternatively, you can submit your job from the temporary directory ($TMPDIR), which is faster to access for the system and might be beneficial for bigger jobs. Note that $TMPDIR is uniquely associated with the job submitted and will be cleared when the job ends. So you need to copy your results back to your work directory at the end of your script. An example script should include the following lines:

...
cd $TMPDIR
cp $PBS_O_WORKDIR/explorer.k .
... #launch the solver and execute
pbsdcp -g '*' $PBS_O_WORKDIR

 

Further Reading

See Also

 

Supercomputer: 
Service: 

LS-PrePost

Introduction

LS-PrePost is an advanced pre- and post-processor that is delivered free with LS-DYNA. The user interface is designed to be both efficient and intuitive. LS-PrePost runs on Windows, Linux, and Unix, utilizing OpenGL graphics to achieve fast rendering and XY plotting. The latest builds can be downloaded from LSTC's FTP Site.

 

The preferred way of accessing LS-PrePost is through OnDemand's Glenn desktop application.  This gives you a preconfigured environment with GPU acceleration enabled.

Availability

LS-PrePost is available on both the Glenn and Oakley clusters. Currently there are no shared modules created for LS-Prepost.

Version Glenn Oakley
v2.3 X  
v3.0   X
v3.2   X
v4.0   X

 

 

 

Usage

Running LS-PrePost on Oakley through OnDemand's Glenn Desktop

Below are instructions for running LS-PrePost on Oakley through Glenn's OnDemand desktop interface with GPU acceleration enabled.  To run LS-PrePost with a slower X-tunneling procedure, see the instructions further down this page.

 

1) Log in to OnDemand with your HPC username and password.

2) Launch the Glenn Desktop from "Apps" menu. 

3) Open a Terminal window (Applications > Accessories > Terminal)

4) Type the following command to connect to Oakley:

    ssh -X username@oakley.osc.edu       

          * Where "username" is your username.

5)  Once logged in to Oakley, submit an interactive job with the following command:

    qsub -X -I -l nodes=1:ppn=12:gpus=2:vis -l walltime=hh:mm:ss

          * Pick a walltime that is close to the amount of time you will need for working in the GUI application.

6) Once your job starts, make a note of the hostname for the compute node your job is running on.  You can find this information by typing the following command:

    hostname

7) Open another Terminal window, and type the following commands:

     module load VirtualGL
     vglconnect username@job-hostname

           * job-hostname is the information you found in step 6; your command might look something like this, for example:  

     vglconnect ahahn@n0656.ten.osc.edu

           You'll be asked for a password to connect, which should be your HPC password.

8) Now, you should be connected to your running job's GPU node.  Run the following commands to launch LS-PrePost version 4.0:

     export LD_LIBRARY_PATH=/usr/local/MATLAB/R2013a/sys/opengl/lib/glnxa64
     /usr/local/lstc/ls-prepost/lsprepost4.0_centos6/lspp4

At startup LS-PrePost displays a graphical interface for model generation and results post-processing.

 

Running LS-PrePost on Oakley or Glenn with X-11 forwarding

The following procedure will result in a much slower GUI interface, but may be useful if the above instructions are not working.  It can be done completely from the command line, with no need to log in to OnDemand.  You may need to edit your terminal settings to enable X11 forwarding.

1) Login to Oakley or Glenn with X11 forwarding enabled

ssh -X username@oakley.osc.edu

or

ssh -X username@glenn.osc.edu

 

2) Submit an interactive job

qsub -I -X -l nodes=1:ppn=12 -l walltime=hh:mm:ss

and wait for it to start.

 

3) Load the LS-DYNA module

module load ls-dyna

 

4) Start the LS-PrePost application

lspp3

 

An X11 window should pop up.  If you get an error along the lines of:

Error: Unable to initialize gtk, is DISPLAY set properly?

 

Double check that you:

1) logged in with X11 forwarding enabled

2) have configured your X11 settings for your terminal

3) included the -X with the qsub command

4) have an X11 client running on your computer (Xming, Xorg, XQuartz, etc.).

Documentation

Documentation for LS-PrePost may be obtained at: http://www.lstc.com/lspp/

Supercomputer: 

User-Defined Material for LS-DYNA

This page describes how to specify user defined material to use within LS-DYNA.  The user-defined subroutines in LS-DYNA allow the program to be customized for particular applications.  In order to define user material, LS-DYNA must be recompiled.

Availability

LS-DYNA with user defined material models are only available on the Glenn Cluster.

Version

The following versions on Glenn are available for use with user defined material models:

smp version:
lsdyna971_d_R4.2.1_umat

 

 

 

mpp versions:
mpp971_s_7600.2.398
mpp971_d_7600.2.398
mpp971_s_R3.1
mpp971_d_R3.1
mpp971_d_R4.2

Usage

The first step to running a simulation with user defined material is to build a new executable. The following is an example done with solver version mpp971_s_7600.2.398.

When you log into the Glenn system, load mpp971_s_7600.2.398 with the command:

module load mpp971_s_7600

Next, copy the mpp971_s_7600 object files and Makefile to your current directory:

cp /usr/local/lstc/mpp971_s_7600.2.398/usermat/* $PWD

Next, update the dyn21.f file with your user defined material model subroutine. Please see the LS-DYNA User's Manual (Keyword version) for details regarding the format and structure of this file.

Once your user defined model is setup correctly in dyn21.f, build the new mpp971 executable with the command:

make

To execute a multi processor (ppn > 1) run with your new executable, execute the following steps:

1) move your input file to a directory on Glenn (pipe.k in the example below)

2) copy your newly created mpp971 executable to this directory as well

3) create a batch script (lstc_umat.job) like the following:

#PBS -N LSDYNA_umat
#PBS -l walltime=1:00:00
#PBS -l nodes=2:ppn=8
#PBS -j oe
#PBS -S /bin/csh

# This is the template batch script for running a pre-compiled
# MPP 971 v7600 LS-DYNA.  
# Total number of processors is ( nodes x ppn )
#
# The following lines set up the LSDYNA environment
# on the P4 cluster (glenn.osc.edu)
module load mpp971_s_7600
#
# Move to the directory where the job was submitted from
# (i.e. PBS_O_WORKDIR = directory where you typed qsub)
#
cd $PBS_O_WORKDIR
#
# Run LSDYNA 
# NOTE: you have to put in your input file name
#
mpiexec mpp971 I=pipe.k NCPU=16

          4) Next, submit this job to the batch queue with the command:

       qsub lstc_umat.job

The output result files will be saved to the directory you ran the qsub command from (known as $PBS_O_WORKDIR).

Documentation

On-line documentation is available on LSTC website.

See Also

 

 

Supercomputer: 
Service: 

MATLAB

Introduction

MATLAB is a technical computing environment for high-performance numeric computation and visualization. MATLAB integrates numerical analysis, matrix computation, signal processing, and graphics in an easy-to-use environment where problems and solutions are expressed just as they are written mathematically, without traditional programming.

Versions

Different versions of MATLAB are available on the Glenn and Oakley clusters. 

Version Glenn Oakley
2011a X  
2011b X X
2012a X* X
2012b   X*

* - Default version

Availability

Ohio State University students/faculty

MATLAB is available on the Glenn Cluster and the Oakley Cluster.
  All Ohio State users must be added to the license server before using MATLAB.  Please contact OSC Help for license server requests.    

Non Ohio State University students/faculty

OSC cannot provide MATLAB licenses for academic use to students and faculty outside of Ohio State at this time.  However, if you have your own MATLAB license, including the Parallel Computing Toolbox, you will be able to connect to the MATLAB Distributed Computing Server on Oakley via your MATLAB interface. See HOW-TO: Configure the MATLAB Parallel Computing Toolbox.

Use of MATLAB requires validation. To obtain validation please complete and return the "Agreement for non-OSU/OSC use of Matlab DCS." This can be obtained from your site consultant or from the file matlab_dcs.txt located in the Academic Agreement Forms. 

Usage

Initialization

To load the default version of MATLAB into your environment, you can use the following command.

module load matlab

To load a specific version of MATLAB on your current system, you can use variations of the previous command.  The command format expected by each system differs slightly, thus the invocation of a particular version of MATLAB depends on the system you wish to run on.    

For example, if you’d like to load MATLAB 2012a on Glenn, type:

module load matlabR2012a

If you’d like to load MATLAB 2012a on Oakley, type:

module load matlab/R2012a

For a list of available MATLAB versions and the format expected on Glenn, type:

module avail matlab

For a list of all available MATLAB versions and the format expected on Oakley, type:

module spider matlab

Running MATLAB

The following command will start an interactive, command line version of MATLAB on either cluster system:

matlab -nodisplay

If you are able to use X-11 forwarding and have enabled it in your SSH client software preferences, you can run MATLAB using the GUI by typing the command “matlab”.

For more information about the matlab command usage, type "matlab -h" for a complete list of command line options.

The commands listed above will run MATLAB on the login node you are connected to. As the login node is a shared resource, running scripts that require significant computational resources will impact the usability of the cluster for others. As such, you should not use interactive MATLAB sessions on the login node for any significant computation. If your MATLAB script requires significant time, CPU power, or memory, you should run your code via the batch system.

Running in Batch

MATLAB should be run via the batch system for any scripts that are computationally intensive, long running, or memory intensive.

Interactive Batch Jobs

To run an interactive batch job using the command line version of MATLAB, you can use the following command:

qsub -I -X -l nodes=n:ppn=p -l walltime=hh:mm:ss

After submitting your job, you will be logged on to one of the compute nodes.  Here you can run MATLAB interactively by loading the MATLAB module and running MATLAB with the options of your choice as described above.  The -X flag enables X-11 forwarding on the compute node, so you can use the MATLAB GUI if you choose.

Non-interactive Batch Jobs

In order to run MATLAB non-interactively via the batch system, you will require a batch submission script and a MATLAB M-file containing the script that will be run via the batch system. You can create both of these files using any text editor you like in a working directory on the system of your choice.

Below is an example batch submission script and a simple M-file. The batch script runs the M-file via the batch system.

Example batch submission script, script.job:

#PBS -N matlab_example
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=8
#PBS -j oe


module load matlab

matlab -nodisplay -r hello
# end of example file

Example M-file, hello.m:

%Example M-file for Hello World

disp 'Hello World' 

exit 
  
% end of example file

In order to run hello.m via the batch system, submit the script.job file with the following command:

qsub script.job

This will run hello.m via the batch system, and all output from the running of the script will be saved in the job's output file.

For more information about using the batch system, see Batch Processing at OSC. 

Toolboxes

To view a complete list of the currently available toolboxes, in the MATLAB command line type the command "ver".

Parallel Processing in MATLAB

MATLAB supports both implicit multiprocessing (multithreading) and explicit multiprocessing across multiple nodes.  

Multithreading

Multithreading allows some functions in MATLAB to distribute the work load between cores of the node that your job is running on.  By default, all of the current versions of MATLAB available on the OSC clusters have multithreading enabled. 

The use of multithreading differs slightly depending on which cluster you'd like to run your jobs on.  On Oakley, the system will use a number of threads equal to the number of cores you request.  Therefore, if you request nodes=1:ppn=4, your job will only spawn four threads.  However, on the Glenn cluster the system will use a number of threads equal to the number of cores on one node.  You must, therefore, request nodes=1:ppn=8 otherwise your jobs may be killed.  

Multithreading increases the speed of some linear algebra routines, but if you would like to disable multithreading you may request nodes=1:ppn=1 and include the option "-singleCompThread" when running MATLAB.  

An example:

#PBS -N disable_multithreading
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=1
#PBS -j oe


module load matlab

matlab -singleCompThread -nodisplay -r hello

# end of example file

Parallel computing across multiple nodes

You can accomplish parallel processing using multiple nodes by using the Parallel Computing Toolbox in conjunction with the MATLAB Distributed Computing Server.  For more information about configuration and usage, see HOW-TO: Configure the MATLAB Parallel Computing Toolbox.  
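
As a minimal sketch only (it assumes the Parallel Computing Toolbox is licensed and a worker pool is available; the pool-management command in these MATLAB releases is matlabpool, while newer releases use parpool), a parfor loop distributes independent iterations across workers:

% hypothetical parallel_example.m
matlabpool open 4            % start a pool of 4 workers (use parpool in newer releases)
a = zeros(1, 100);
parfor i = 1:100
    a(i) = max(abs(eig(rand(300))));   % independent iterations run on different workers
end
matlabpool close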

Further Reading

Official PDF documentation can be obtained from the MathWorks website

See Also

Supercomputer: 
Service: 
Fields of Science: 

Maven (Apache)

Apache Maven is a plugin-based build automation tool, similar in purpose to GNU Make or Apache Ant. It is most commonly used with Java projects, but also supports other languages such as .NET and C/C++.

Availability & Compatibility

Version Glenn Oakley
3.0.4   X

Restrictions

There are no restrictions for this software; any OSC user may make use of Apache Maven.

Usage

Set-up

To use Maven, load the "maven" module with the following command.

module load maven

To test that the install worked correctly, run "mvn --version". You should see output similar to that shown below:

 

$ mvn --version
Apache Maven 3.0.4 (r1232337; 2012-01-17 03:44:56-0500)
Maven home: /usr/local/maven/3.0.4
Java version: 1.6.0_29, vendor: Sun Microsystems Inc.
Java home: /usr/lib/jvm/java-1.6.0-sun-1.6.0.29.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-131.17.1.el6.x86_64", arch: "amd64", family: "unix"

 

Basic Usage

Once the module is loaded, you can use Maven just as you would on your local machine. For example, the session below illustrates initializing a new project call "my-app":

 

$ mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=my-app -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom >>>
[INFO] 
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom <<<
[INFO] 
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Batch mode
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.mycompany.app
[INFO] Parameter: packageName, Value: com.mycompany.app
[INFO] Parameter: package, Value: com.mycompany.app
[INFO] Parameter: artifactId, Value: my-app
[INFO] Parameter: basedir, Value: /nfs/12/jmccance
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: /nfs/12/jmccance/my-app
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.429s
[INFO] Finished at: Fri Jun 08 14:30:57 EDT 2012
[INFO] Final Memory: 9M/361M
[INFO] ------------------------------------------------------------------------
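
After the project is generated, the standard Maven lifecycle commands work as usual. For example, to compile, test, and package the newly created project:

cd my-app
mvn package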

Further Reading

See Also

Supercomputer: 
Service: 

MINPACK

MINPACK is a library of Fortran routines for the solution of non-linear, multivariate minimization problems.

Availability & Restrictions

MINPACK is available to all OSC users without restrictions.

The following versions of MINPACK are available on OSC systems:

Version Glenn Oakley
n/a X  

Usage

Set-up

To use MINPACK on Glenn, first run the following command:

module load minpack

Building With MINPACK

The following environment variables are set when the minpack module is loaded:

Variable Use
$MINPACK Compiler and linker flags for building with MINPACK

To build a Fortran program named my_prog:

module load minpack
ifort -o my_prog my_prog.f $MINPACK

Further Reading

Supercomputer: 
Service: 

MKL - Intel Math Kernel Library

High-performance, multithreaded mathematics libraries for linear algebra, fast Fourier transforms, vector math, and more.

Availability & Compatibility

OSC supports single-process use of MKL for LAPACK and BLAS levels one through three. For multi-process applications, we also support the ScaLAPACK, FFTW2, and FFTW3 MKL wrappers. MKL modules are available for the Intel, gnu, and PGI compilers.

Version Glenn Oakley Statewide Software
10.0.3 X    
10.3.0 X X X

Restrictions

All OSC users may use the Intel MKL for academic purposes.

Usage

Set-up

To configure your environment for the Intel MKL, load the “mkl” module:

module load mkl

This step is required for both building and running MKL applications. Exception: The "mkl" module is usually not needed when using the Intel compilers; just use the "-mkl" flag on the compile and link steps.

Building With MKL (Oakley)

The Oakley MKL module will automatically configure your environment to locate the appropriate include files. When linking, however, you have a variety of options to choose from depending on your needs. These options come in the form of environment variables defined by the module. Which variable you include when linking will determine what particular configuration of MKL you get.

Environment Variable Function
$MKL_CFLAGS Include flags for C
$MKL_FFLAGS Include flags for Fortran
$MKL_LIBS Use multithreaded MKL with 32-bit integers. This is the standard configuration.
$MKL_LIBS_INT64 Use multithreaded MKL with 64-bit (“ILP64”) integers.
$MKL_LIBS_SEQ Use single-threaded (“SEQuential”) MKL with with 32-bit integers.
$MKL_LIBS_SEQ_INT64 Use single-threaded MKL with 64-bit integers.

Notes:

  • 64-bit integers are not supported by the FFTW2 wrappers. (See below.)
  • The Intel compilers are specially configured for Intel MKL. If you are using this toolchain, you can use the “-mkl” option when linking in place of $MKL_LIBS.
  • The default, multithreaded MKL libraries will automatically run in a single thread if they are called from an OpenMP parallel region. If you want to force single-threaded behavior throughout your program, choose one of the “_SEQ” variables from the list above.

Fortran 95 BLAS/LAPACK Wrappers

To compile Fortran 95 programs with modules, add the $MKL_F95FLAGS variable to your compilation step. If you need 64-bit integers, use $MKL_F95FLAGS_INT64 instead. When linking, you will also need to use $MKL_F95LIBS (or $MKL_F95LIBS_INT64 if using 64-bit integers). These variables will allow your programs to use the BLAS and LAPACK wrappers for MKL.

FFTW Wrappers

A number of “wrappers” are provided in the form of environment variables that allow you to use FFTW APIs with MKL. Variables ending in FLAGS should be included with your compilation step, while variables ending in LIBS should be included in your linking step.

Environment Variable Function
$MKL_FFTW_CFLAGS Compile variable for C programs.
$MKL_FFTW_FFLAGS Compile variable for Fortran programs.
$MKL_FFTW2_D_CLIBS Linking variable for double-precision FFTW2 C programs.
$MKL_FFTW2_D_FLIBS Linking variable for double-precision FFTW2 Fortran programs.
$MKL_FFTW2_S_CLIBS Linking variable for single-precision FFTW2 C programs.
$MKL_FFTW2_S_FLIBS Linking variable for single-precision FFTW2 Fortran programs.
$MKL_FFTW3_CLIBS Linking variable for FFTW3 C programs.
$MKL_FFTW3_FLIBS Linking variable for FFTW3 Fortran programs.

Notes:

  • FFTW3 is the default wrapper and should be picked up by your linker automatically. If you would prefer to use FFTW2, use the appropriate linking variable.
  • The FFTW2 wrappers do not support 64-bit (“ILP64”) integers.

Examples

The following examples use the Intel compilers, though they should work using the GNU and Portland Group compilers as well.

Compiling a Fortran77 program with MKL:

ifort -c my_prog.f
ifort -o my_prog my_prog.o $MKL_LIBS

Compiling a C program with MKL:

icc -c my_prog.c
icc -o my_prog my_prog.o $MKL_LIBS

Compiling a module-based Fortran90/95 program with MKL:

ifort -c my_prog.f95 $MKL_F95FLAGS
ifort -o my_prog my_prog.o $MKL_F95LIBS $MKL_LIBS

Compiling a C program with the double-precision FFTW2 wrapper:

icc -c $MKL_FFTW_CFLAGS -DMKL_DOUBLE my_prog.c
icc -o my_prog my_prog.o $MKL_FFTW2_D_CLIBS $MKL_LIBS

Compiling a Fortran77 program with the FFTW2 wrapper:

ifort -c $MKL_FFTW_FFLAGS my_prog.f
ifort -o my_prog my_prog.o $MKL_FFTW2_S_FLIBS $MKL_LIBS

Compiling a C program with the FFTW3 wrapper:

icc -c $MKL_FFTW_CFLAGS my_prog.c
icc -o my_prog my_prog.o $MKL_FFTW3_CLIBS $MKL_LIBS

Compiling a Fortran program with the FFTW3 wrapper:

ifort -c $MKL_FFTW_FFLAGS my_prog.f 
ifort -o my_prog my_prog.o $MKL_FFTW3_CLIBS $MKL_LIBS

Advanced Usage

If you are already familiar with building MKL applications with your chosen build tool (GCC, Intel, or PGI) and you do not wish to use the convenience variables discussed above, you may wish to use the $MKLROOT variable instead.

This variable points to the installation directory for Intel MKL. All include files can be found in $MKLROOT/include, for example, and the libraries are in $MKLROOT/lib/intel64.

Running MKL Programs

When running an MKL program, you need to be sure to take the following steps.

  1. Load the mkl module:

    module load mkl
    
  2. If running with parallel MKL, set OMP_NUM_THREADS to match the number of cores per node in your process. In the bash shell, you can accomplish this with:

    PPNPERNODE=$(expr $(cat $PBS_NODEFILE | wc -l) / $(uniq $PBS_NODEFILE | wc -l))
    export OMP_NUM_THREADS=$PPNPERNODE
    

Further Reading

See Also

Supercomputer: 
Service: 

mpiBLAST

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. mpiBLAST takes advantage of distributed computational resources, i.e., a cluster, through explicit MPI communication and thereby utilizes all available resources, unlike standard NCBI BLAST, which can only take advantage of shared-memory multiprocessors (SMPs).

Availability & Restrictions

mpiBLAST is available without restriction to all OSC users.

The following versions of mpiBLAST are available on OSC systems:

Version Glenn Oakley
1.6.0 X X

Usage

Set-up

To load the mpiBLAST software on the Glenn system, use the following commands:

module load biosoftw
module load mpiblast

On the Oakley system, use the following command:

module load mpiblast

Using mpiBLAST

Once the mpiblast module is loaded, the following commands are available for your use:

mpiblast 
mpiblast_cleanup 
mpiformatdb

Formatting a database

Before processing BLAST queries, the sequence database must be formatted with mpiformatdb. The command-line syntax looks like this:
mpiformatdb -N 16 -i nt -o T

The above command would format the nt database into 16 fragments. Note that currently mpiformatdb does not support multiple input files.

mpiformatdb places the formatted database fragments in the same directory as the FASTA database. To specify a different target location, use the "-n" option, just as in NCBI's formatdb.

Querying the database

The mpiblast command-line syntax is nearly identical to that of NCBI's blastall program. Running a query with 18 MPI processes would look like:
mpiexec -n 18 mpiblast -p blastn -d nt -i blast_query.fas -o blast_results.txt

The above command would query the sequences in blast_query.fas against the nt database and write the results to the blast_results.txt file in the current working directory. By default, mpiBLAST reads configuration information from ~/.ncbirc. Furthermore, mpiBLAST needs at least 3 processes to perform a search: two processes are dedicated to scheduling tasks and coordinating file output, while any additional processes perform the actual search tasks.

Extra options to mpiblast

  • --partition-size=[integer]
    Enable hierarchical scheduling with multiple masters. The partition size equals the number of workers in a partition plus 1 (the master process). For example, a partition size of 17 creates partitions consisting of 16 workers and 1 master. An individual output file will be generated for each partition. By default, mpiBLAST uses one partition. This option is only available for version 1.6 or above.
  • --replica-group-size=[integer]
    Specify how database fragments are replicated within a partition. Suppose the total number of database fragments is F, the number of MPI processes in a partition is N, and the replica-group-size is G; then in total (N-1)/G database replicas will be distributed in the partition (the master process does not host any database fragments), and each worker process will host F/G fragments. In other words, a database replica is distributed to every G MPI processes. For example, with F = 16 fragments, N = 17 processes (1 master plus 16 workers), and G = 4, the partition holds (17-1)/4 = 4 database replicas and each worker hosts 16/4 = 4 fragments.
  • --query-segment-size=[integer]
    The default value is 5. Specify the number of query sequences that will be fetched from the supermaster to the master at a time. This parameter controls the granularity of load balancing between different partitions. This option is only available for version 1.6 or above.
  • --use-parallel-write
    Enable the high-performance parallel output solution. Note the current implementation of parallel-write does not require a parallel file system.
  • --use-virtual-frags
    Enable workers to cache database fragments in memory instead of local storage. This is recommended on diskless platforms where there is no local storage attached to each processor. It is enabled by default on Blue Gene systems.
  • --predistribute-db
    Distribute database fragments to workers before the search begins. Especially useful in reducing data input time when multiple database replicas need to be distributed to workers.
  • --output-search-stats
    Enable output of the search statistics in the pairwise and XML output format. This could cause performance degradation on some diskless systems such as Blue Gene.
  • --removedb
    Removes the local copy of the database from each node before terminating execution.
  • --copy-via=[cp|rcp|scp|mpi|none]
    Sets the method of copying files that each worker will use. Default = "cp"

    • cp : use standard file system "cp" command. Additional option is --concurrent.
    • rcp : use rsh "rcp" command. Additional option is --concurrent.
    • scp : use ssh "scp" command. Additional option is --concurrent.
    • mpi : use MPI_Send/MPI_Recv to copy files. Additional option is --mpi-size.
    • none : do not copy files, instead use shared storage as local storage.
  • --debug[=filename]
    Produces verbose debugging output for each node, optionally logs the output to a file.
  • --time-profile=[filename]
    Reports execution time profile.
  • --version
    Print the mpiBLAST version.

Please refer to the README file in the mpiBLAST package for a performance tuning guide.

Removing a database

The --removedb command line option will cause mpiBLAST to do all work in a temporary directory that will get removed from each node's local storage directory upon successful termination. For example:
mpiexec -n 18 mpiblast -p blastx -d yeast.aa -i ech_10k.fas -o results.txt --removedb

The above command would perform an 18-process (16-worker) search of the yeast.aa database, writing the output to results.txt. Upon completion, worker nodes would delete the yeast.aa database fragments from their local storage.

Databases can also be removed without performing a search in the following manner:
mpiexec -n 18 mpiblast_cleanup

Batch Usage

Below is a sample batch script for running an mpiBLAST job. It asks for 12 processors and 30 minutes of walltime.

#PBS -l walltime=30:00
#PBS -l nodes=1:ppn=12
#PBS -N mpiBLAST
#PBS -j oe

cp /usr/local/mpiblast/1.6.0/.ncbirc ./
module load mpiblast

# copy data over to $TMPDIR on compute node
cd $PBS_O_WORKDIR
cp query.fasta $TMPDIR
cp db/benchmark.fasta* $TMPDIR

# Break the database into 10 pieces
cd $TMPDIR
/usr/bin/time mpiformatdb -N 10 -i benchmark.fasta -o T -p T
cp benchmark.fasta* /nfs/proj01/PZS0002/biosoftw/db/

# run mpiblast
/usr/bin/time mpiexec -n 12 mpiblast -p blastp -d benchmark.fasta -i query.fasta -o blast_results.txt


# Copy output back to working directory
mkdir $PBS_O_WORKDIR/$PBS_JOBID
cp blast_results.txt $PBS_O_WORKDIR/$PBS_JOBID
cd $PBS_O_WORKDIR

Further Reading

See Also

Supercomputer: 
Service: 
Fields of Science: 

MPI Library

MPI is a standard library for performing parallel processing using a distributed-memory model. The Glenn and Oakley clusters at OSC use the MVAPICH implementation of the Message Passing Interface (MPI), which is based on MPICH and optimized for the high-speed Infiniband interconnects.

MVAPICH2, based on the MPI-2 standard, is installed on both Oakley and Glenn. The older MVAPICH, based on the MPI-1 standard, is installed on Glenn only.

Availability & Restrictions

MPI is available without restriction to all OSC users.

Installations are available for the Intel, PGI, and gnu compilers.

The following versions of MVAPICH2 are available on OSC systems:

Version Glenn Oakley
1.5 X*  
1.6 X  
1.7   X*
1.8   X
1.9   X

The following versions of MVAPICH (MPI-1) are available on OSC systems:

Version Glenn Oakley
1.1 X*  

Some older versions are also available.

*Default version. The default on Oakley is the build corresponding to the currently loaded compiler. The default on Glenn is the MPI-1 build for the PGI compiler.

Usage

Set-up

To set up your environment for using the MPI libraries, you must load the appropriate module. On Oakley this is pretty straightforward:

module load mvapich2

You will get the default version for the compiler you have loaded. (Be sure to swap the intel compiler module for the gnu module if you're using gnu.)

On Glenn you should load the module shown in the table below for the compiler you're using. Be sure to unload the default module before loading a new one.

  PGI Intel GNU
MPI-1 mpi mvapich-1.1-fixes-intel mvapich-1.1-fixes-gnu
MPI-2 mpi2 mvapich2-1.5-intel mvapich2-1.5-gnu

To see all available mvapich modules on Glenn, use this command: module avail mvapich

For example, to use the MPI-2 library with the gnu compilers:

module unload mpi
module load mvapich2-1.5-gnu

Building With MPI

To build a program that uses MPI, you should use the compiler wrappers provided on the system. They accept the same options as the underlying compiler. The commands are shown in the following table.

C mpicc
C++ mpicxx
FORTRAN 77 mpif77
Fortran 90 mpif90

For example, to build the code my_prog.c using the -O2 option, you would use:

mpicc -o my_prog -O2 my_prog.c
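
As an illustration, a minimal my_prog.c that could be built with the command above might look like the following sketch; it simply has each MPI rank report itself.

/* my_prog.c: each MPI rank prints its rank and the total number of ranks */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}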

In rare cases you may be unable to use the wrappers. In that case you should use the environment variables set by the module.

Variable Use
$MPI_CFLAGS Use during your compilation step for C programs.
$MPI_CXXFLAGS Use during your compilation step for C++ programs.
$MPI_FFLAGS Use during your compilation step for Fortran 77 programs.
$MPI_F90FLAGS Use during your compilation step for Fortran 90 programs.
$MPI_LIBS Use when linking your program to the MPI libraries.

For example, to build the code my_prog.c without using the wrappers you would use:

mpicc -c $MPI_CFLAGS my_prog.c
mpicc -o my_prog my_prog.o $MPI_LIBS

Batch Usage

Programs built with MPI can only be run in the batch environment at OSC. For information on starting MPI programs using the mpiexec command, see Batch Processing at OSC.

Be sure to load the same compiler and mvapich modules at execution time as at build time.
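
As a sketch only, a job script for a 24-process run on Oakley might look like the following; the executable name my_prog and the resource requests are placeholders to adapt to your own work:

#PBS -N mpi_example
#PBS -l nodes=2:ppn=12
#PBS -l walltime=0:30:00
#PBS -j oe

# Load the same compiler and MPI modules used at build time
module load mvapich2

cd $PBS_O_WORKDIR
mpiexec ./my_prog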

Further Reading

See Also

Supercomputer: 
Service: 
Fields of Science: 

MPJ-Express

MPJ-Express is a Java library that provides message passing capabilities for parallel computing in Java applications.

MPJ-Express is available on Oakley.

Availability & Restrictions

MPJ-Express is available without restriction to all OSC users.

The following versions of MPJ-Express are available on OSC systems:

Version Glenn Oakley
0.38   X


Note: We have modified MPJ-Express slightly from the default distribution to work nicely with our infrastructure. Installing your own copy of a newer version will not work. Please contact OSC Help if you need a newer version of this library.

Usage

Set-up

To set up your environment for using the MPJ-Express libraries, you must load the appropriate module. On Oakley this is pretty straightforward:

module load mpj-express

Building With MPJ-Express

When you load the MPJ-Express module, your CLASSPATH is modified so that the necessary JAR files can be found.

Batch Usage

Programs built with MPJ-Express can only be run in the batch environment at OSC. For information on writing batch scripts, see Batch Processing at OSC.

If you have loaded a Java module when compiling, be sure to load that same module in your batch script to avoid version mismatches.

Here is a basic "Hello World" program, which must be saved in a file called "HelloWorld.java"; we use it in our example batch script below:

 

import mpi.*;

public class HelloWorld
{
     public static void main(String[] args) 
                     throws Exception
     {
          MPI.Init(args);
          int me = MPI.COMM_WORLD.Rank();
          int size = MPI.COMM_WORLD.Size();
          System.out.println
            ("Hi from " + me + " of " + size + "\n");
          MPI.Finalize();
     }
}

 

Below we have included a sample batch script:

 

#PBS -N mpj-world
#PBS -l walltime=0:05:00
#PBS -l nodes=2:ppn=12
#PBS -S /bin/bash

# Set environment
module load mpj-express/0.38

# Change to the directory you submitted the job from
cd $PBS_O_WORKDIR

# Set up the directory for the daemon logs to go in
export MPJ_LOG="$PBS_O_WORKDIR/log"
mkdir $MPJ_LOG

# Re-compile our application - not strictly necessary,
# if the source is unchanged.
javac HelloWorld.java

# Start MPJ
mpjboot

# Sleep a bit to let the daemons start
sleep 10

# Launch your parallel program
cat $PBS_NODEFILE | uniq > machines
NPROCS=`cat $PBS_NODEFILE | wc -l`
mpjrun.sh -np $NPROCS -dev niodev HelloWorld

# Shutdown MPJ
rm -f machines
mpjhalt

 

Further Reading

See Also

Supercomputer: 
Service: 
Technologies: 

MrBayes

MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.

Availability & Restrictions

MrBayes is available without restriction to all OSC users.

The following versions of MrBayes are available:

Version Glenn Oakley
3.1.2 X  
3.2.1   X

Usage

Set-up

To load the MrBayes software on the Glenn system, use the following commands:

module load biosoftw
module load mrbayes

On the Oakley system, use the following command:

module load mrbayes

Using MrBayes

Once the mrbayes module is loaded, the commands are available for your use. The command for the serial version is

mb

Parallel MrBayes can only be used via the batch system. The command is

mb-parallel

Batch Usage

Below is a sample batch script for running a serial MrBayes job. It asks for 1 processor and 30 minutes of walltime.

#PBS -l walltime=30:00
#PBS -l nodes=1:ppn=1
#PBS -N mb
#PBS -j oe
set -x
date
cd $PBS_O_WORKDIR
cp ./primates.nex $TMPDIR
cp ./primates.nxs $TMPDIR
cd $TMPDIR
module load mrbayes-3.1.2
/usr/bin/time mb ./primates.nxs > mb.log
cp * $PBS_O_WORKDIR
date

Below is a sample batch script for running a parallel job on Oakley. It asks for 2 nodes, 24 processors and 30 minutes of walltime.

#PBS -l walltime=30:00
#PBS -l nodes=2:ppn=12
#PBS -N mb-parallel
#PBS -j oe
set -x
date
cd $PBS_O_WORKDIR
pbsdcp ./primates.nex $TMPDIR
pbsdcp ./primates.nxs $TMPDIR
cd $TMPDIR
module load mrbayes
/usr/bin/time mpiexec mb-parallel ./primates.nxs > mb-parallel.log
pbsdcp -g '*' $PBS_O_WORKDIR
date

Further Reading

See Also

  • Paup
Supercomputer: 
Service: 
Fields of Science: 

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD generally scales well on OSC platforms and offers a variety of modelling techniques. NAMD is file-compatible with AMBER, CHARMM, and X-PLOR.

Availability & Restrictions

NAMD is available without restriction to all OSC users.

The following versions of NAMD are available:

Version Glenn Oakley
2.6 X  
2.7 X  
2.9   X

Usage

Set-up

To load the NAMD software on the Glenn system, use the following command:

module load namd-2.6-mpi

Using NAMD

NAMD is rarely executed interactively because preparation for simulations is typically performed with external tools, such as VMD.

Batch Usage

Sample batch scripts and input files are available here:

/nfs/10/srb/workshops/compchem/namd/

The simple batch script below demonstrates the important points. It requests 16 processors and 2 hours of walltime. If the job runs longer than 2 hours, it will be terminated.

#PBS -N apoa1
#PBS -l nodes=2:ppn=8
#PBS -l walltime=2:00:00
#PBS -S /bin/bash
#PBS -j oe

module load namd-2.6-mpi
cd $PBS_O_WORKDIR
pbsdcp -p apoa1.namd apoa1.pdb apoa1.psf *.xplor $TMPDIR
cd $TMPDIR
run_namd apoa1.namd
pbsdcp -pg '*' $PBS_O_WORKDIR

Further Reading

Supercomputer: 
Service: 

NetCDF

NetCDF (Network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data.

Availability & Restrictions

NetCDF is available without restriction to all OSC users.

The following versions of NetCDF are available at OSC:

Version Glenn Oakley
3.6.2 X  
4.1.3   X

Usage

Set-up

To initialize either system prior to using NetCDF, run the following command:

module load netcdf

To use the parallel implementation of netcdf, run the following command instead:

module load pnetcdf

Building With NetCDF

With the netcdf library loaded, the following environment variables will be available for use:

Variable Use
$NETCDF_CFLAGS Use during your compilation step for C programs.
$NETCDF_FFLAGS Use during your compilation step for Fortran programs.
$NETCDF_LIBS Use when linking your program to NetCDF.

Similarly, when the pnetcdf module is loaded, the following environment variables will be available:

Variable Use
$PNETCDF_CFLAGS Use during your compilation step for C programs.
$PNETCDF_FFLAGS Use during your compilation step for Fortran programs.
$PNETCDF_LIBS Use when linking your program to parallel NetCDF.

 

For example, to build the code myprog.c with the netcdf library you would use:

icc -c $NETCDF_CFLAGS myprog.c
icc -o myprog myprog.o $NETCDF_LIBS
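
As an illustration only, a minimal myprog.c using the serial NetCDF C interface might look like the following sketch; it simply creates an empty file and closes it.

/* myprog.c: create an empty NetCDF file and close it */
#include <stdio.h>
#include <netcdf.h>

int main(void)
{
    int ncid;
    int status = nc_create("example.nc", NC_CLOBBER, &ncid);

    if (status != NC_NOERR) {
        fprintf(stderr, "nc_create failed: %s\n", nc_strerror(status));
        return 1;
    }

    /* ... define dimensions and variables here ... */

    nc_close(ncid);
    return 0;
}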

Batch Usage

You must load the netcdf or pnetcdf module in your batch script before executing a program that is built with the NetCDF library.

#PBS -N AppNameJob
#PBS -l nodes=1:ppn=12

module load netcdf
cd $PBS_O_WORKDIR
cp foo.dat $TMPDIR
cd $TMPDIR

appname < foo.dat > foo.out

cp foo.out $PBS_O_WORKDIR

Further Reading

See Also

  • hdf5 software page
Service: 

Octave

Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.

Octave has extensive tools for solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations. It is easily extensible and customizable via user-defined functions written in Octave's own language, or using dynamically loaded modules written in C++, C, Fortran, or other languages.

Availability & Restrictions

Octave is available to all OSC users without restriction.

The following versions of Octave are available on OSC clusters:

Version Glenn Oakley
2.1.73 X  
2.9.12 X  
3.6.1   X
3.6.4   X

Usage

Set-up

To initialize either cluster for the use of Octave, run the following command:

module load octave

Using Octave

To run Octave, simply run the following command:

octave

Batch Usage

The following example batch script will run an Octave code file, mycode.o, via the batch processing system. The script requests one full node of cores on Oakley and 1 hour of walltime.

#PBS -N AppNameJob
#PBS -l nodes=1:ppn=12
#PBS -l walltime=01:00:00

module load octave
cd $PBS_O_WORKDIR
cp mycode.o $TMPDIR
cd $TMPDIR

octave < mycode.o > data.out

cp data.out $PBS_O_WORKDIR

Further Reading

See Also

Supercomputer: 
Service: 

OpenACC

OpenACC is a standard for parallel programming on accelerators, such as Nvidia GPUs and Intel Phi. It consists primarily of a set of compiler directives for executing code on the accelerator, in C and Fortran. OpenACC is currently only supported by the PGI compilers installed on OSC systems.

Availability & Restrictions

OpenACC is available without restriction to all OSC users. It is supported by the PGI compilers.

Note: Some of the older compiler versions available on Glenn may not support OpenACC.

Usage

Set-up

OpenACC support is built into the compilers. There is no separate module to load.

Building With OpenACC

To build a program with OpenACC, use the compiler flag appropriate to your compiler. The correct libraries are included implicitly.

Compiler Family Flag
PGI -acc -ta=nvidia -Minfo=accel
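
For example, a simple SAXPY loop can be offloaded to the accelerator with a single directive. The sketch below is illustrative only; the file name and data clauses are assumptions rather than an OSC-provided example.

/* saxpy.c: offload y = a*x + y to the accelerator with OpenACC */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];
    float a = 2.0f;
    int i;

    for (i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    /* The loop below runs on the accelerator; x is copied in, y is copied both ways */
    #pragma acc parallel loop copyin(x) copy(y)
    for (i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[N-1] = %f\n", y[N - 1]);
    return 0;
}

It would be built with the flags shown above, for example: pgcc -acc -ta=nvidia -Minfo=accel -o saxpy saxpy.c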

Batch Usage

An OpenACC program will not run without an accelerator present. You need to ensure that your PBS resource request includes GPUs. For example, to run an OpenACC program on Oakley, your resource request should look something like this: #PBS -l nodes=1:ppn=12:gpus=2.

Further Reading

See Also

Supercomputer: 
Service: 
Fields of Science: 

OpenFOAM

OpenFOAM is a suite of computational fluid dynamics applications. It contains myriad solvers, both compressible and incompressible, as well as many utilities and libraries.

Availability & Restrictions

OpenFOAM is available to all OSC users without restriction.

The following versions of OpenFOAM are available on OSC clusters:

Version Glenn Oakley
1.7.x X X
2.1.0   X

Usage

Set-up

To configure the Glenn cluster for the use of OpenFOAM 1.7.x for serial execution, use the following commands:

    module load gcc-4.4.5
    module switch mpi mvapich2-1.6-gnu
    module load openfoam-1.7.x

To configure the Oakley cluster for the use of OpenFOAM 1.7.x for serial execution, use the following command:

    . /usr/local/OpenFOAM/OpenFOAM-1.7.x/etc/bashrc

To configure the Oakley cluster for the use of OpenFOAM 2.1.0 for serial execution, use the following command:

    . /usr/local/OpenFOAM/OpenFOAM-2.1.0/etc/bashrc

Basic Structure for an OpenFOAM Case

The basic directory structure for an OpenFOAM case is as follows:

/home/yourusername/OpenFOAM_case
|-- 0
    |-- U
    |-- epsilon
    |-- k
    |-- p
    `-- nut 
|-- constant
    |-- RASProperties
    |-- polyMesh
    |   |-- blockMeshDict
    |   `-- boundary
    |-- transportProperties
    `-- turbulenceProperties
|-- system
    |-- controlDict
    |-- fvSchemes
    |-- fvSolution
    `-- snappyHexMeshDict

IMPORTANT: To run in parallel, you need to also create the "decomposeParDict" file in the system directory. If you do not create this file, the decomposePar command will fail.

Using OpenFOAM

Once your environment has been configured with the above commands, you can start any of the OpenFOAM utilities with the associated command:

blockMesh

or

icoFoam

or

sonicFoam

 

Batch Usage

OpenFOAM can be run in both serial and parallel.

Sample Batch Script (serial execution)

To run OpenFOAM in serial, the following example batch script will provide you with a template.

#PBS -N serial_OpenFOAM 
#PBS -l nodes=1:ppn=1 
#PBS -l walltime=24:00:00 
#PBS -j oe 
#PBS -S /bin/bash 

#Initialize OpenFOAM on Glenn Cluster
module load gcc-4.4.5
module switch mpi mvapich2-1.6-gnu
module load openfoam-1.7.x

#Move to the case directory, where the 0, constant and system directories reside
cd $PBS_O_WORKDIR

# Copy files to $TMPDIR and move there to execute the program
cp * $TMPDIR
cd $TMPDIR

#Mesh the geometry 
blockMesh

#Run the solver 
icoFoam

#Finally, copy files back to your home directory
cp * $PBS_O_WORKDIR

IMPORTANT: This template is for running OpenFOAM on the Glenn cluster. If you would like to run OpenFOAM on the Oakley cluster, you should modify the following part according to the "Set-up" section above:

    #Initialize OpenFOAM on Glenn Cluster
    module load gcc-4.4.5
    module switch mpi mvapich2-1.6-gnu
    module load openfoam-1.7.x

Sample Batch Script (parallel execution)

An example of a parallel batch script follows, which will run the model on 16 processors (two nodes, where each node has 8 processors):

#PBS -N parallel_OpenFOAM
#PBS -l nodes=2:ppn=8
#PBS -l walltime=6:00:00
#PBS -j oe
#PBS -S /bin/bash 

#Initialize OpenFOAM on Glenn Cluster
module load gcc-4.4.5
module switch mpi mvapich2-1.6-gnu
module load openfoam-1.7.x

#Move to the case directory, where the 0, constant and system directories reside
cd $PBS_O_WORKDIR

#Mesh the geometry
blockMesh

#Decompose the mesh for parallel run
decomposePar

#Run the solver
mpirun -np 16 simpleFoam -parallel 

#Reconstruct the parallel results
reconstructPar

IMPORTANT: This template is for running OpenFOAM on the Glenn cluster. If you would like to run OpenFOAM on the Oakley cluster, you should make the following modifications:

  • Modify the following part according to the "Set-up" section above:
    #Initialize OpenFOAM on Glenn Cluster
    module load gcc-4.4.5
    module switch mpi mvapich2-1.6-gnu
    module load openfoam-1.7.x
  • Use "mpiexec" instead of "mpirun" as:
    
    mpiexec simpleFoam -parallel
 

Further Reading

See Also

  • ParaView
Supercomputer: 
Service: 
Fields of Science: 

OpenMP

OpenMP is a standard for parallel programming on shared-memory systems, including multicore systems. It consists primarily of a set of compiler directives for sharing work among multiple threads. OpenMP is supported by all the Fortran, C, and C++ compilers installed on OSC systems.

Availability & Restrictions

OpenMP is available without restriction to all OSC users. It is supported by the Intel, PGI, and gnu compilers.

Note: Some of the older compiler versions available on Glenn may not support OpenMP.

Usage

Set-up

OpenMP support is built into the compilers. There is no separate module to load.

Building With OpenMP

To build a program with OpenMP, use the compiler flag appropriate to your compiler. The correct libraries are included implicitly.

Compiler Family Flag
Intel -openmp
gnu -fopenmp
PGI -mp
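
For example, the following small program (a sketch, not an OSC-provided example) parallelizes a loop with a single directive and a reduction:

/* omp_sum.c: sum an array in parallel with OpenMP */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;
    int i;

    for (i = 0; i < N; i++)
        a[i] = 1.0;

    /* Each thread sums part of the array; the partial sums are combined */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}

With the Intel compiler this would be built as icc -openmp -o omp_sum omp_sum.c; substitute -fopenmp or -mp for the gnu or PGI compilers.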

Batch Usage

An OpenMP program by default will use a number of threads equal to the number of processor cores available. To use a different number of threads, set the environment variable OMP_NUM_THREADS.
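
For example, the relevant lines of a job script that requested nodes=1:ppn=12 might look like the following sketch, where my_omp_prog is a placeholder for your own executable:

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=12
./my_omp_prog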

Further Reading

See Also

Service: 
Fields of Science: 

PAML

"PAML (for Phylogentic Analysis by Maximum Likelihood) contains a few programs for model fitting and phylogenetic tree reconstruction using nucleotide or amino-acid sequence data." (doc/pamlDOC.pdf)

Availability & Restrictions

PAML is available to all OSC users without restriction.

The following versions of PAML are available on OSC systems:

Version Glenn Oakley
4.4d X  

Usage

Set-up

On the Glenn Cluster paml is accessed by executing the following commands:

module load biosoftw
module load paml

Using PAML

PAML is a collection of several programs that will be added to the user's PATH: baseml, basemlg, chi2, codeml, ds, evolver, mcmctree, pamp, and yn00.  Each program has separate, but typically similar, usage and options.

Options

  • baseml / basemlg   Maximum likelihood analysis of nucleotide sequences using a faster discrete model / Implements the (continuous) gamma model of Yang (Intensive Computation)

Both baseml and basemlg require a baseml.ctl in the current directory with the following variables set: seqfile, outfile, treefile

The following are optional variables to set in baseml.ctl: noisy, verbose, runmode, model, Mgene, ndata, clock, fix_kappa, kappa, fix_alpha, alpha, Malpha, ncatG, fix_rho,nparK, nhomo, getSE, RateAncestor, Small_Diff, cleandata, icode, fix_blength, method

  • chi2   Calculates the x2 critical value and p value for conducting the likelihood ratio test

chi2 [p | INTEGER DOUBLE]

chi2                   prints x2 critical values at set significance levels until ‘q+ENTER’ is reached

chi2 p                 interactively set the degrees of freedom and x2 value

chi2 INTEGER DOUBLE    Computes the probability for INTEGER df and DOUBLE x2

 

  • codeml   Implements the codon substitution model of Goldman & Yang for DNA and amino acid sequences

codeml requires codeml.ctl to be located in the current directory with the following variables set: seqfile, outfile, treefile, aaRatefile

The following are optional variables to set in codeml.ctl: noisy, verbose, runmode, seqtype, CodonFreq, ndata, aaDist, model, NSsites, icode, Mgene, fix_kappa, kappa, fix_omega, omega, fix_alpha, alpha, Malpha, ncatG, getSE, RateAncestor, Small_Diff, cleandata, fix_blength, method

  • ds   Computes descriptive statistics from a baseml/basemlg analysis

ds filename.type

  • evolver   Simulates sequences under nucleotide, codon, and amino acid substitution models; generates random trees; and calculates the partition distances between trees

EVOLVER in paml version 4.4d, March 2011

Results for options 1-4 & 8 go into evolver.out

Options

      (1) Get random UNROOTED trees?

      (2) Get random ROOTED trees?

      (3) List all UNROOTED trees?

      (4) List all ROOTED trees?

      (5) Simulate nucleotide data sets (use MCbase.dat)?

      (6) Simulate codon data sets      (use MCcodon.dat)?

      (7) Simulate amino acid data sets (use MCaa.dat)?

      (8) Calculate identical bi-partitions between trees?

      (9) Calculate clade support values (read 2 treefiles)?

      (11) Label clades?

      (0) Quit?


evolver’s option 5 requires MCbase.dat.  evolver’s option 6 requires MCcodon.dat.  evolver’s option 7 requires MCaa.dat and dat/mtmam.dat.  evolver’s option 9 requires truetree rst1 (formed from stewart.trees & codeml's output rst1).  evolver’s option 11 requires name.trees with user input.

  • mcmctree   Implements the Bayesian MCMC algorithm of Yang and Rannala for estimating species divergence times

mcmctree requires mcmctree.ctl to be located in the current directory with the following variables set: seqfile, treefile, outfile, RootAge, usedata

The following are optional variables to set in mcmctree.ctl: seed, ndata, clock, model, alpha, ncatG, cleandata, BDparas, kappa_gamma, alpha_gamma, rgene_gamma, sigma2_gamma, finetune, print, burnin, sampfreq, nsample

  • pamp   Implements the parsimony-based analysis of Yang and Kumar

pamp requires pamp.ctl to be located in the current directory with the following variables set: seqfile, treefile, outfile

The following are optional variables to set in pamp.ctl: seqtype, ncatG, nhomo

  • yn00   Implements the method of Yang and Nielson for estimating synonymous and nonsynonymous substitution rates in pairwise comparisons of protein-coding DNA sequences

yn00 requires yn00.ctl to be located in the current directory with the following variables set: seqfile, outfile

The following are optional variables to set in yn00.ctl: verbose, icode, weighting, commonf3x4, ndata

Control Files

All .ctl files (baseml.ctl, codeml.ctl, mcmctree.ctl, pamp.ctl, and yn00.ctl) have comment lines starting with '*'.

Batch Usage

#PBS -N paml_test
#PBS -l walltime=0:05:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR
module load biosoftw
module load paml
export PAML_DIR=/usr/local/biosoftw/paml44
cp $PAML_DIR/*.* .
cp -r $PAML_DIR/dat .
cp -r $PAML_DIR/examples .
baseml
chi2 1 3.84
codeml
ds in.baseml
echo -e "1\n5\n5 5\n0\n2\n5\n5 5\n0\n3\n5\n4\n5\n5\n6\n7\n8\n" | evolver
mcmctree
pamp
yn00

Further Reading

  • Four pdf documents are located in the following folder on Glenn:  /usr/local/biosoftw/paml44/doc/
  • An online discussion group for PAML users is located at the following website: http://www.rannala.org/phpBB2/
Supercomputer: 
Service: 

ParaView

ParaView is an open-source, multi-platform application designed to visualize data sets of size varying from small to very large. ParaView was developed to support distributed computational models for processing large data sets and to create an open, flexible user interface.

Availability & Restrictions

ParaView is available on the Glenn cluster only.  The following versions are available:

VERSION GLENN OAKLEY
3.3.0 X  
3.8.0 X  
3.14.1 X  

 

Usage 

Use module avail paraview to view available ParaView modules. ParaView is normally started with a module load command along with the specific version. For example, to load ParaView version 3.8.0, type:

module load paraview-3.8.0

Following a successful loading of the ParaView module, you can access the ParaView program:

paraview

Documentation

ParaView documentation is available on-line at http://www.paraview.org/New/help.html.

Supercomputer: 
Service: 
Fields of Science: 

PAUP

PAUP is a leading program for performing phylogenetic analysis for bioinformatics sequences. PAUP currently runs as a single processor program. No further enhancements are suggested.

Availability & Restrictions

PAUP is available to all OSC users without restriction.

The following versions of PAUP are available on OSC systems:

Version Glenn Oakley
4b10 X X

Usage

Set-up

To configure your environment for using PAUP, run the following command:

module load paup

Using PAUP

After loading the PAUP module, PAUP can be run with commands similar to the following:

paup nexus_file > nexus_file.out

Batch Usage

PAUP is best run via the batch processing system. The following example batch script uses the input file nexus_file and writes the output file nexus_file.out.

#PBS -l walltime=10:00:00
#PBS -l nodes=1:ppn=1
#PBS -N paup
#PBS -j oe

cd $PBS_O_WORKDIR
cp ./nexus_file $TMPDIR
cd $TMPDIR

module load paup
paup nexus_file > nexus_file.out
cp * $PBS_O_WORKDIR

Further Reading

  • The PAUP home page
  • PDF forms of the documentation are located at /usr/local/paup/paup4b10-opt-linux-a/Docs/ on Glenn and /usr/local/paup/4b10/Docs/ on Oakley
    • Cmd_ref_v2.pdf – command reference manual
    • Quick_start_v1.pdf – quick start guide for the command line version

See Also

  • Mrbayes
Supercomputer: 
Service: 
Fields of Science: 

PGI Compilers

Fortran, C and C++ compilers from the Portland Group. PGI compilers are the default, recommended compilers on Glenn.

Availability & Restrictions

The PGI compilers are available to all OSC users without restriction.

The following versions of the PGI compilers are available on OSC systems:

Version Glenn Oakley
7.0 X  
7.1 X  
8.0 X  
9.0 X*  
10.0 X  
10.5 X  
11.6 X  
11.8   X
12.5 X X
12.6 X X
12.9   X
12.10 X X*

* - default version

Usage

Set-up

To configure your environment for use of the PGI compilers, follow the system-specific instructions below:

For the Oakley system, run the following command (you may have to unload your selected compiler - if an error message appears, it will provide instructions):

module load pgi

For the Glenn system, run the following command:

module load pgi

Using the PGI Compilers

Once the module is loaded, compiling with the PGI compilers requires understanding which binary should be used for which type of code. Specifically, use the pgcc binary for C codes, the pgCC binary for C++ codes, the pgf77 for FORTRAN-77 codes and the pgf90 for FORTRAN-90 codes.
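
For example (the source file names are placeholders):

pgcc  -o my_c_prog   my_c_prog.c      # C
pgCC  -o my_cpp_prog my_cpp_prog.cpp  # C++
pgf77 -o my_f77_prog my_f77_prog.f    # FORTRAN 77
pgf90 -o my_f90_prog my_f90_prog.f90  # Fortran 90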

Building With the PGI Compilers

The PGI compilers recognize the following command line options (this list is not exhaustive, for more information run man <compiler binary name>):

Compiler option Purpose
-c Compile into object code only; do not link
-DMACRO[=value] Defines preprocessor macro MACRO with optional value (default value is 1)
-g Enables debugging; disables optimization
-I/directory/name Add /directory/name to the list of directories to be searched for #include files
-L/directory/name Adds /directory/name to the list of directories to be searched for library files
-lname Adds the library libname.a or libname.so to the list of libraries to be linked
-o outfile Names the resulting executable outfile instead of a.out
-UMACRO Removes definition of MACRO from preprocessor
-O0 Disable optimization; default if -g is specified
-O1 Light optimization; default if -g is not specified
-O or -O2 Heavy optimization
-O3 Aggressive optimization; may change numerical results
-Mipa Inline function expansion for calls to procedures defined in separate files; implies -O2
-Munroll Loop unrolling; implies -O2
-Mconcur Automatic parallelization; implies -O2
-mp Enables translation of OpenMP directives
 

Further Reading

See Also

Supercomputer: 
Service: 
Technologies: 
Fields of Science: 

Python

Python is a high-level, multi-paradigm programming language that is both easy to learn and useful in a wide variety of applications.  Python has a large standard library as well as a large number of third-party extensions, most of which are completely free and open source. We highly recommend using the module to access Python, as we have added a lot of Python modules and tuned them to perform well on our systems.

Availability

Python is available on the Oakley and Glenn clusters.

Version Glenn Oakley
2.4.3 X*  
2.6.6   X*
2.7.1 X X

 

* - Default version

Usage

To run the Python interpreter using the default system-level Python installation, simply type the command “python”.  On Glenn, the default version used is 2.4.3.  On Oakley, the  default version used is 2.6.6.
 
Each system also has a Python 2.7.1 module.  To load this module, type one of the following commands depending on which system you are using.
 
On Glenn, type:
 
module load python-2.7.1
 
On Oakley, type:
 
module load python/2.7.1
 
After the module is loaded, you can run the interpreter by using the command
 
python
 

Installed modules

We have installed a number of Python modules and tuned them for optimal performance on our systems. Executing module help python will show you the current list.
 
setuptools, readline, bitarray, cloud, configobj, coverage, docutils, enstaller, epydoc, grin, html5lib, jinja2, nose, paramiko, ply, Crypto, pygarrayimage, pyglet, pygments, OpenSSL, pyparsing, pyproj, serial, dateutil, pytz, sphinx, sqlalchemy, xlrd, xlwt, scons, ldap, _mysql, numpy, scipy, pexpect, zope.interface, twisted, foolscap, wx, sip, PyQt4, matplotlib, zmq, IPython, PIL, reportlab, cython, numexpr, tables, argparse
 
Of special note is the fact that numpy and scipy have been compiled to use the optimized math libraries for each system (ACML on Glenn, MKL on Oakley) for best performance.
 
Due to architecture differences between our supercomputers, we recommend not installing your own packages in ~/.local. Instead, you should install them in some other directory and set $PYTHONPATH in your default environment. For more information about installing your own Python modules, please see our HOWTO.
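
For example, after loading the module you can confirm which math libraries numpy was built against, and, if you install your own packages, point $PYTHONPATH at your chosen install directory. The directory below is only a hypothetical example:

module load python/2.7.1
python -c "import numpy; numpy.show_config()"
export PYTHONPATH=$HOME/mypython/lib/python2.7/site-packages:$PYTHONPATH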

Running Python Scripts in Batch

Python can technically be used in both interactive and non-interactive batch jobs, but you will most likely only want to run a Python script non-interactively.   To do this, you will need to write a batch submission script and a Python script.   
 
Below is a sample batch submission script and a simple “Hello World” program written in Python.
 
Example batch submission script, sample.job
 
#PBS -N hello
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -j oe

module load python/2.7.1

python hello.py

#end of sample.job
 
Example Python script, hello.py:
 
import sys

hello = ["Hello", "World"]

for i in hello:
    print i

sys.exit(0)

#end of hello.py
 
For more information about using the batch system, see Batch Processing at OSC.

See Also

HOWTO: Install your own python modules

Further reading

Extensive documentation of the Python programming language and software downloads can be found at the official Python website.  

 

Supercomputer: 
Service: 
Fields of Science: 

R

R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). R provides a wide variety of statistical and graphical techniques, and is highly extensible.

Availability & Restrictions

R is available to all OSC users without restriction.

The following versions of R are available on OSC systems: 

Version Glenn Oakley
2.5.1 X  
2.8.0 X  
2.11.1 X  
2.14.1   X
2.15.0 X X
2.15.2 X X
3.0.1 X X

Usage

Set-up

In order to configure your environment for the usage of R, run the following command:

module load R

Using R

Once your environment is configured, R can be started simply by entering the following (ridiculously short) command:

R

For a listing of command line options, run:

R --help

Batch Usage

Running R interactively is not recommended and may violate OSC usage policy. In order to run R in batch, reference the example batch script below. This script requests one full node on the Oakley cluster for 1 hour of walltime.

#PBS -N R_ExampleJob
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

module load R
cd $PBS_O_WORKDIR
cp in.dat $TMPDIR
cd $TMPDIR

R CMD BATCH in.dat out.dat

cp out.dat $PBS_O_WORKDIR

Further Reading

See Also

Supercomputer: 
Service: 
Fields of Science: 

RAxML

"RAxML is a fast implementation of maximum-likelihood (ML) phylogeny estimation that operates on both nucleotide and protein sequence alignments."
(http://www.embl.de/~seqanal/courses/commonCourseContent/usingRaxml.html)

Availability & Restrictions

RAxML is available to all OSC users without restriction.

The following versions of RAxML are available on OSC systems:

Version Glenn Oakley
7.0.4 X  
7.4.2 X X

Usage

Set-up

On the Glenn Cluster RAxML is accessed by executing the following commands:

module load biosoftw
module load RAxML

Using RAxML

raxmlHPC[-MPI|-PTHREADS] -s sequenceFileName -n outputFileName -m substitutionModel [-a weightFileName] [-b bootstrapRandomNumberSeed] [-c numberOfCategories][-d] [-e likelihoodEpsilon] [-E excludeFileName] [-f a|b|c|d|e|g|h|i|j|m|n|o|p|s|t|w|x] [-g groupingFileName] [-h] [-i initialRearrangementSetting] [-j] [-k] [-l sequenceSimilarityThreshold] [-L sequenceSimilarityThreshold] [-M] [-o outGroupName1[,outGroupName2[,...]]] [-p parsimonyRandomSeed] [-P proteinModel] [-q multipleModelFileName] [-r binaryConstraintTree] [-t userStartingTree] [-T numberOfThreads] [-u multiBootstrapSearches] [-v][-w workingDirectory] [-x rapidBootstrapRandomNumberSeed][-y][-z multipleTreesFile] [-#|-N numberOfRuns]

Options

-a    Specify a column weight file name to assign individual weights to each column of the alignment. Those weights must be integers separated by any type and number of whitespaces within a separate file, see file "example_weights" for an example.
-b    Specify an integer number (random seed) and turn on bootstrapping.  DEFAULT: OFF
-c    Specify number of distinct rate categories for RAxML when modelOfEvolution is set to GTRCAT or GTRMIX.  Individual per-site rates are categorized into numberOfCategories rate categories to accelerate computations.  DEFAULT: 25
-d    start ML optimization from random starting tree.  DEFAULT: OFF
-e    set model optimization precision in log likelihood units for final optimization of tree topology under MIX/MIXI or GAMMA/GAMMAI.  DEFAULT: 0.1 for models not using a proportion-of-invariant-sites estimate; 0.001 for models using a proportion-of-invariant-sites estimate
-E    specify an exclude file name, that contains a specification of alignment positions you wish to exclude.  Format is similar to Nexus, the file shall contain entries like "100-200 300-400", to exclude a single column write, e.g., "100-100", if you use a mixed model, an appropriately adapted model file will be written.
-f    select algorithm:
      "-f a": rapid Bootstrap analysis and search for best-scoring ML tree in one program run
      "-f b": draw bipartition information on a tree provided with "-t" based on multiple trees (e.g. form a bootstrap) in a file specifed by "-z"
      "-f c": check if the alignment can be properly read by RAxML
      "-f d": new rapid hill-climbing
      "-f e": optimize model+branch lengths for given input tree under GAMMA/GAMMAI only
      "-f g": compute per site log Likelihoods for one ore more trees passed via "-z" and write them to a file that can be read by CONSEL
      "-f h": compute log likelihood test (SH-test) between best tree passed via "-t" and a bunch of other trees passed via "-z"
      "-f i": perform a really thorough bootstrap, refinement of final BS tree under GAMMA and a more exhaustive algorithm
      "-f j": generate a bunch of bootstrapped alignment files from an original alignment file
      "-f m": Compare bipartitions between two bunches of trees passed via "-t" and "-z" respectively. This will return the Pearson correlation between all bipartitions found in the two tree files. A file called RAxML_bipartitionFrequencies.outpuFileName will be printed that contains the pair-wise bipartition frequencies of the two sets
      "-f n": Compute the log likelihood score of all trees contained in a tree file provided by "-z" under GAMMA or GAMMA+P-Invar
      "-f o": old and slower rapid hill-climbing
      "-f p": perform pure stepwise MP addition of new sequences to an incomplete starting tree
      "-f s": split up a multi-gene partitioned alignment into the respective subalignments
      "-f t": do randomized tree searches on one fixed starting tree
      "-f w": compute ELW test on a bunch of trees passed via "-z"
      "-f x": compute pair-wise ML distances, ML model parameters will be estimated on an MP starting tree or a user-defined tree passed via "-t", only allowed for GAMMA-based models of rate heterogeneity.  DEFAULT: new rapid hill climbing
-g    specify the file name of a multifurcating constraint tree this tree does not need to be comprehensive, i.e. must not contain all taxa
-h    Display this help message.
-i    Initial rearrangement setting for the subsequent application of topological changes phase.  DEFAULT: determined by program
-j    Specifies if checkpoints will be written by the program. If checkpoints (intermediate tree topologies) shall be written by the program specify "-j"  DEFAULT: OFF
-k    Specifies that bootstrapped trees should be printed with branch lengths.  The bootstraps will run a bit longer, because model parameters will be optimized at the end of each run. Use with CATMIX/PROTMIX or GAMMA/GAMMAI.  DEFAULT: OFF
-l    Specify a threshold for sequence similarity clustering. RAxML will then print out an alignment to a file called sequenceFileName.reducedBy.threshold that only contains sequences <= the specified threshold, which must be between 0.0 and 1.0.  RAxML uses the QT-clustering algorithm to perform this task. In addition, a file called RAxML_reducedList.outputFileName will be written that contains clustering information.  DEFAULT: OFF
-L    Same functionality as "-l" above, but uses a less exhaustive and thus faster clustering algorithm.  This is intended for very large datasets with more than 20,000-30,000 sequences.  DEFAULT: OFF
-m    Model of Nucleotide or Amino Acid Substitution:
NUCLEOTIDES:
    "-m GTRCAT"                      : GTR + Optimization of substitution rates + Optimization of site-specific evolutionary rates which are categorized into numberOfCategories distinct rate categories for greater computational efficiency if you do a multiple analysis with "-#" or "-N" but without bootstrapping the program will use GTRMIX instead
    "-m GTRGAMMA"                    : GTR + Optimization of substitution rates + GAMMA model of rate heterogeneity (alpha parameter will be estimated)
    "-m GTRMIX"                      : Inference of the tree under GTRCAT and thereafter evaluation of the final tree topology under GTRGAMMA
    "-m GTRCAT_GAMMA"                : Inference of the tree with site-specific evolutionary rates.  However, here rates are categorized using the 4 discrete GAMMA rates. Evaluation of the final tree topology under GTRGAMMA
    "-m GTRGAMMAI"                   : Same as GTRGAMMA, but with estimate of proportion of invariable sites
    "-m GTRMIXI"                     : Same as GTRMIX, but with estimate of proportion of invariable sites
    "-m GTRCAT_GAMMAI"               : Same as GTRCAT_GAMMA, but with estimate of proportion of invariable sites
AMINO ACIDS:
    "-m PROTCATmatrixName[F]"        : specified AA matrix + Optimization of substitution rates + Optimization of site-specific evolutionary rates which are categorized into numberOfCategories distinct rate categories for greater computational efficiency if you do a multiple analysis with  "-#" or "-N" but without bootstrapping the program will use PROTMIX... instead
    "-m PROTGAMMAmatrixName[F]"      : specified AA matrix + Optimization of substitution rates + GAMMA model of rate heterogeneity (alpha parameter will be estimated)
    "-m PROTMIXmatrixName[F]"        : Inference of the tree under specified AA matrix + CAT and thereafter evaluation of the final tree topology under specified AA matrix + GAMMA
    "-m PROTCAT_GAMMAmatrixName[F]"  : Inference of the tree under specified AA matrix and site-specific evolutionary rates.  However, here rates are categorized using the 4 discrete GAMMA rates. Evaluation of the final tree topology under specified AA matrix + GAMMA
    "-m PROTGAMMAImatrixName[F]"     : Same as PROTGAMMAmatrixName[F], but with estimate of proportion of invariable sites
    "-m PROTMIXImatrixName[F]"       : Same as PROTMIXmatrixName[F], but with estimate of proportion of invariable sites
    "-m PROTCAT_GAMMAImatrixName[F]" : Same as PROTCAT_GAMMAmatrixName[F], but with estimate of proportion of invariable sites
Available AA substitution models: DAYHOFF, DCMUT, JTT, MTREV, WAG, RTREV, CPREV, VT, BLOSUM62, MTMAM, GTR
With the optional "F" appendix you can specify if you want to use empirical base frequencies.  Please note that for mixed models you can in addition specify the per-gene AA model in the mixed model file (see manual for details)
-M    Switch on estimation of individual per-partition branch lengths. Only has effect when used in combination with "-q". Branch lengths for individual partitions will be printed to separate files.  A weighted average of the branch lengths is computed by using the respective partition lengths.  DEFAULT: OFF
-n    Specifies the name of the output file.
-o    Specify the name of a single outgroup or a comma-separated list of outgroups, e.g. "-o Rat" or "-o Rat,Mouse", in case that multiple outgroups are not monophyletic the first name in the list will be selected as outgroup, don't leave spaces between taxon names!
-q    Specify the file name which contains the assignment of models to alignment partitions for multiple models of substitution. For the syntax of this file please consult the manual.
-p    Specify a random number seed for the parsimony inferences. This allows you to reproduce your results and will help me debug the program. This option HAS NO EFFECT in the parallel MPI version
-P    Specify the file name of a user-defined AA (Protein) substitution model. This file must contain 420 entries, the first 400 being the AA substitution rates (this must be a symmetric matrix) and the last 20 are the empirical base frequencies
-r    Specify the file name of a binary constraint tree.  This tree does not need to be comprehensive, i.e. must not contain all taxa
-s    Specify the name of the alignment data file in PHYLIP format
-t    Specify a user starting tree file name in Newick format
-T    PTHREADS VERSION ONLY! Specify the number of threads you want to run.  Make sure to set "-T" to at most the number of CPUs you have on your machine, otherwise, there will be a huge performance decrease!
-u    Specify the number of multiple BS searches per replicate to obtain better ML trees for each replicate.  DEFAULT: One ML search per BS replicate
-v    Display version information
-w    Name of the working directory where RAxML will write its output files.  DEFAULT: current directory
-x    Specify an integer number (random seed) and turn on rapid bootstrapping
-y    If you want to only compute a parsimony starting tree with RAxML specify "-y", the program will exit after computation of the starting tree.  DEFAULT: OFF
-z    Specify the file name of a file containing multiple trees e.g. from a bootstrap that shall be used to draw bipartition values onto a tree provided with "-t", It can also be used to compute per site log likelihoods in combination with "-f g" and to read a bunch of trees for a couple of other options ("-f h", "-f m", "-f n").
-#|-N Specify the number of alternative runs on distinct starting trees.  In combination with the "-b" option, this will invoke a multiple bootstrap analysis.  Note that "-N" has been added as an alternative since "-#" sometimes caused problems with certain MPI job submission systems, since "-#" is often used to start comments.  DEFAULT: 1 single analysis

Example

The file brown.phylip is from the PAML example and has been modified to be in PHYLIP format.

#PBS -N RAxML_test
#PBS -l walltime=10:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR
module load biosoftw
module load RAxML
raxmlHPC-PTHREADS -T 4 -m GTRCAT -n test -s brown.phylip

If you want to compute more than one tree add the following line before calling the program:

module load pvfs2

And substitute raxmlHPC-MPI for raxmlHPC-PTHREADS.

Further Reading

Supercomputer: 
Service: 
Fields of Science: 

RECON

“Proper identification of repetitive sequences is an essential step in genome analysis. The RECON package performs de novo identification and classification of repeat sequence families from genomic sequences. The underlying algorithm is based on extensions to the usual approach of single linkage clustering of local pairwise alignments between genomic sequences. Specifically, our extensions use multiple alignment information to define the boundaries of individual copies of the repeats and to distinguish homologous but distinct repeat element families. RECON should be useful for first-pass automatic classification of repeats in newly sequenced genomes.” (http://selab.janelia.org/recon.html)

Availability & Restrictions

RECON is available to all OSC users without restriction.

The following versions of RECON are available on OSC systems:

Version Glenn Oakley
1.0.5 X  

Usage

Set-up

On the Glenn Cluster, RECON is accessed by executing the following commands:

module load biosoftw
module load RECON

RECON is a collection of several programs that will be added to the user's PATH: imagespread, eledef, eleredef, edgeredef, and famdef.  Also added to the user's PATH is recon.pl, which executes each of the aforementioned programs.

Using RECON

The recon.pl script runs each of the component programs listed above in turn; see the RECON package documentation for the required input files and command-line arguments.

Example

Not currently available.

Further Reading

Supercomputer: 
Service: 
Fields of Science: 

RepeatMasker

"RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program." (http://www.repeatmasker.org/)

Availability & Restrictions

RepeatMasker is available to all OSC users without restriction.

The following versions of RepeatMasker are available on OSC systems:

Version Glenn Oakley
2.1 X  

Usage

Set-up

On the Glenn Cluster RepeatMasker is accessed by executing the following commands:

module load biosoftw
module load RepeatMasker

RepeatMasker will be added to the user's PATH and can be run with the command:

RepeatMasker [-options] <seqfiles(s) in fasta format>

Options

-h(elp)
      Detailed help
      Default settings are for masking all types of repeats in a primate sequence.
-pa(rallel) [number]
      The number of processors to use in parallel (only works for batch files or sequences over 50 kb)
-s    Slow search; 0-5% more sensitive, 2-3 times slower than default
-q    Quick search; 5-10% less sensitive, 2-5 times faster than default
-qq   Rush job; about 10% less sensitive, 4->10 times faster than default (quick searches are fine under most circumstances)

Repeat options

-nolow /-low
      Does not mask low_complexity DNA or simple repeats
-noint /-int
      Only masks low complex/simple repeats (no interspersed repeats)
-norna
      Does not mask small RNA (pseudo) genes
-alu
      Only masks Alus (and 7SLRNA, SVA and LTR5)(only for primate DNA)
-div [number]
      Masks only those repeats < x percent diverged from consensus seq
-lib [filename]
      Allows use of a custom library (e.g. from another species)
-cutoff [number]
      Sets cutoff score for masking repeats when using -lib (default 225)
-species <query species>
      Specify the species or clade of the input sequence. The species name must be a valid NCBI Taxonomy Database species name and be contained in the RepeatMasker repeat database. Some examples are:
      -species human
      -species mouse
      -species rattus
      -species "ciona savignyi"
      -species arabidopsis
      Other commonly used species: mammal, carnivore, rodentia, rat, cow, pig, cat, dog, chicken, fugu, danio, "ciona intestinalis" drosophila, anopheles, elegans, diatoaea, artiodactyl, arabidopsis, rice, wheat, and maize

Contamination options

-is_only
      Only clips E coli insertion elements out of fasta and .qual files
-is_clip
      Clips IS elements before analysis (default: IS only reported)
-no_is
      Skips bacterial insertion element check
-rodspec
      Only checks for rodent specific repeats (no repeatmasker run)
-primspec
      Only checks for primate specific repeats (no repeatmasker run)

Running options

-gc [number]
      Use matrices calculated for 'number' percentage background GC level
-gccalc
      RepeatMasker calculates the GC content even for batch files/small seqs
-frag [number]
      Maximum sequence length masked without fragmenting (default 40000, 300000 for DeCypher)
-maxsize [nr]
      Maximum length for which IS- or repeat clipped sequences can be produced (default 4000000). Memory requirements go up with higher maxsize.
-nocut
      Skips the steps in which repeats are excised
-noisy
      Prints search engine progress report to screen (defaults to .stderr file)
-nopost
      Do not postprocess the results of the run ( i.e. call ProcessRepeats).
       NOTE: This options should only be used when ProcessRepeats will be run manually on the results.

Output options

-dir [directory name]
      Writes output to this directory (default is query file directory, "-dir ." will write to current directory).
-a(lignments)
      Writes alignments in .align output file; (not working with -wublast)
-inv
      Alignments are presented in the orientation of the repeat (with option -a)
-lcambig
      Outputs ambiguous DNA transposon fragments using a lower case name.  All other repeats are listed in upper case. Ambiguous fragments match multiple repeat elements and can only be called based on flanking repeat information.
-small
      Returns complete .masked sequence in lower case
-xsmall
      Returns repetitive regions in lowercase (rest capitals) rather than masked
-x    Returns repetitive regions masked with Xs rather than Ns
-poly
      Reports simple repeats that may be polymorphic (in file.poly)
-source
      Includes for each annotation the HSP "evidence". Currently this option is only available with the "-html" output format listed below.
-html
      Creates an additional output file in xhtml format.
-ace
      Creates an additional output file in ACeDB format
-gff
      Creates an additional Gene Feature Finding format output
-u    Creates an additional annotation file not processed by ProcessRepeats
-xm   Creates an additional output file in cross_match format (for parsing)
-fixed
      Creates an (old style) annotation file with fixed width columns
-no_id
      Leaves out final column with unique ID for each element (was default)
-e(xcln)
      Calculates repeat densities (in .tbl) excluding runs of >=20 N/Xs in the query
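
For instance, combining several of the options above, the following hypothetical command would mask repeats in a mouse sequence using 4 processors, skip low-complexity masking, and write the output to the current directory (the sequence file name is illustrative):

RepeatMasker -pa 4 -species mouse -nolow -dir . mysequence.fa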

Example

#PBS -N RepeatMasker_test
#PBS -l walltime=4:00:00
#PBS -l nodes=1:ppn=4

module load biosoftw
module load RepeatMasker
cp /usr/local/biosoftw/bowtie-0.12.7/genomes/NC_008253.fna .
RepeatMasker -pa 4 NC_008253.fna

Errors

The following commands result in errors:  RepeatMasker -w, RepeatMasker -de, RepeatMasker -e.

Further Reading

Supercomputer: 
Service: 
Fields of Science: 

ScaLAPACK

ScaLAPACK is a library of high-performance linear algebra routines for clusters supporting MPI. It contains routines for solving systems of linear equations, least squares problems, and eigenvalue problems.

This page documents usage of the ScaLAPACK library installed by OSC from source. An optimized implementation of ScaLAPACK is included in MKL; see the software documentation page for Intel Math Kernel Library for usage information.

Availability & Restrictions

ScaLAPACK is available to all OSC users without restriction.  If you need high performance, we recommend using MKL instead of the standalone scalapack module.

The following versions of ScaLAPACK are available on OSC systems:

Version Glenn Oakley
1.7 X  
2.0.1   X

Usage

Set-up

To use the ScaLAPACK libraries in your compilation on Oakley, first load the scalapack module:

module load scalapack

To use the ScaLAPACK libraries in your compilation on Glenn, you must load the acml module then the scalapack module. Both modules are compiler-dependent.

Compiler Commands
PGI
module load acml
module load scalapack
gnu
module load acml-gfortran
module load scalapack-gnu
Intel
module load acml-intel
module load scalapack-intel

Building With ScaLAPACK

Once loaded, the ScaLAPACK libraries can be linked in with your compilation. To do this, use the following environment variables:

On Oakley:

Variable Use
$SCALAPACK_LIBS Used to link ScaLAPACK into either Fortran or C

On Glenn:

Variable Use
$SCALAPACK_C_LIBS Used to link ScaLAPACK into a C program
$SCALAPACK_F77_LIBS Used to link ScaLAPACK into a Fortran program
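
For example, a Fortran MPI program might be compiled and linked roughly as follows. This is a minimal sketch; the source file name my_solver.f90 is illustrative.

On Oakley:

module load scalapack
mpif90 -o my_solver my_solver.f90 $SCALAPACK_LIBS

On Glenn (with the Intel compiler stack):

module load acml-intel
module load scalapack-intel
mpif90 -o my_solver my_solver.f90 $SCALAPACK_F77_LIBS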

Further Reading

See Also

 

Supercomputer: 
Service: 

SPRNG

Computational stochastic approaches (Monte Carlo methods) based on random sampling have become extremely important research tools, not only in their "traditional" fields such as physics, chemistry, and applied mathematics, but also in the social sciences and, more recently, in various branches of industry. One indication of their importance is that Monte Carlo calculations are estimated to consume about half of all supercomputer cycles. An indispensable ingredient of reliable, statistically sound calculations is a good source of pseudo-random numbers. SPRNG provides a scalable package for parallel pseudo-random number generation that is easy to use on a variety of architectures, especially in large-scale parallel Monte Carlo applications.

SPRNG 1.0 provides each of the SPRNG random number generators in its own library. For most users this is acceptable, as one rarely uses more than one type of generator in a single program. However, SPRNG 2.0 combines the generators into a single library for users who want that added flexibility. In all other respects, SPRNG 1.0 and SPRNG 2.0 are identical.

Availability & Restrictions

SPRNG is available to all OSC users without restriction.

The following versions of sprng are available on OSC systems:

Version Glenn Oakley
1.0 X  
2.0 X X

Usage

Set-up

To configure your environment for use of SPRNG 1.0 on Glenn, use the following command:

module load sprng

To configure your environment for use of SPRNG 2.0 on Glenn, use the following command:

module load sprng2

To configure your environment for use of SPRNG 2.0 on Oakley, use the following command:

module load sprng/2.0

Note: SPRNG is dependent on the compiler and mpi module and may not be supported for all combinations.

Building With SPRNG 1.0 and 2.0

Once SPRNG 1.0 is loaded, the following environment variables will be defined for use with compilation tools:

Variable Function
$SPRNG_INCLUDE Include path for SPRNG header files
$SPRNG_CMRG Library path and library for CMRG generator
$SPRNG_LCG Library path and library for LCG generator
$SPRNG_LCG64 Library path and library for 64-bit LCG generator
$SPRNG_LFG Library path and library for LFG generator
$SPRNG_MLFG Library path and library for MLFG generator

Alternatively, when SPRNG 2.0 is loaded, the following variables will be defined:

VARIABLE Function
$SPRNG_INCLUDE Include path for SPRNG 2.0 header files
$SPRNG_LIBS Library path and library for SPRNG 2.0 generators
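
For example, on Oakley a C Monte Carlo code might be compiled against SPRNG 2.0 roughly as follows. This is a minimal sketch; the source file name monte_carlo.c is illustrative.

module load sprng/2.0
mpicc -I$SPRNG_INCLUDE -o monte_carlo monte_carlo.c $SPRNG_LIBS

With SPRNG 1.0 on Glenn, link against the library for the specific generator you use, for example the LFG generator:

module load sprng
mpicc -I$SPRNG_INCLUDE -o monte_carlo monte_carlo.c $SPRNG_LFG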

Further Reading

See Also

Supercomputer: 
Service: 
Fields of Science: 

STAR-CCM+

STAR-CCM+ provides the world’s most comprehensive engineering physics simulation inside a single integrated package. Much more than a CFD code, STAR‑CCM+ provides an engineering process for solving problems involving flow (of fluids and solids), heat transfer and stress. STAR‑CCM+ is unrivalled in its ability to tackle problems involving multi‑physics and complex geometries.  Support is provided by CD-adapco.

Availability & Restrictions

CD-adapco releases a new version of STAR-CCM+ every four months. The following versions of STAR-CCM+ are available at OSC:

VERSION GLENN OAKLEY
6.04.016 X  
6.06.017 X  
7.02.011 X  
7.04.011 X X
7.06.012   X
7.06.012r8   X
8.02.011   X
8.04.009 X X

Feel free to contact OSC Help if you need other versions for your work.

Currently, there are a total of 10 Power-Session licenses at OSC for academic users. One Power-Session license allows an approved user to run one STAR-CCM+ job on an unlimited number of cores (hardware limits are still enforced in accordance with OSC policies).

Usage

Access

Use of STAR-CCM+ for academic purposes requires validation. To obtain validation, OSU users complete the form STAR-CCM+(OSU).pdf located in the Academic Agreement Forms and return it to OSC Help. Non-OSU academic users complete both forms STAR-CCM+ (non-OSU)_1st.pdf and STAR-CCM+ (non-OSU)_2nd.pdf located in the Academic Agreement Forms and return both to OSC Help.

 

Set-up

Use module avail starccm to view available STAR-CCM+ modules for a given machine. STAR-CCM+ is normally started with a module load command along with the specific version. For example, to load STAR-CCM+ version 7.06.012 on Oakley, type:

module load starccm/7.06.012

Following a successful loading of the STAR-CCM+ module, you can access the STAR-CCM+ program:

starccm+

OSC Batch Usage

STAR-CCM+ can be run on OSC clusters in either interactive mode or batch mode. Interactive mode is similar to running STAR-CCM+ on a desktop machine in that the graphical user interface (GUI) is forwarded from OSC and displayed on the local machine. Batch mode means that you submit a batch script, along with all the STAR-CCM+ input file(s) needed to run the simulation, and the job runs when resources become available.

Interactive Example

To run STAR-CCM+ interactively, it is suggested that you request the necessary compute resources from the login node, with X11 forwarding. The intention is that users can run STAR-CCM+ interactively for the purpose of building their model, preparing the input file (.sim file), and checking results. For example, the following line requests one node, one core, for a walltime of one hour:

qsub -I -X -l walltime=1:00:00 -l nodes=1:ppn=1

This job will queue until resources become available. Once the job starts, you are automatically logged in to the compute node, and you can launch the STAR-CCM+ GUI with the following commands:

module load starccm/7.06.012
starccm+

For academic users, check "Power Session" under "License" after selecting either "New Simulation" or "Load Simulation" from the "File" menu in the STAR-CCM+ GUI, as shown below:

[Screenshots: the Power Session license option in the STAR-CCM+ New Simulation and Load Simulation dialogs]

Batch Mode

Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. For a given problem, prepare the input file with STAR-CCM+ (named starccm.sim, for example) for the batch run. The following shows how to run STAR-CCM+ in batch mode when the Power-Session license is used. The batch script should be modified accordingly if a Power-On-Demand license is used. Feel free to contact OSC Help if you need more information. 
Serial execution when Power-Session license is used 

The following batch script could be used for a serial run (assuming the solution completes within 30 hours on 1 processor):

#PBS -N star-ccm_test  
#PBS -l walltime=30:00:00  
#PBS -l nodes=1:ppn=1  
#PBS -j oe
#PBS -S /bin/bash

cd $TMPDIR  
cp $PBS_O_WORKDIR/starccm.sim .  
module load starccm/7.06.012  
starccm+ -power -batch starccm.sim >&output.txt  
cp * $PBS_O_WORKDIR

To run this job on OSC batch system, the above script (named submit_starccm.job) is to be submitted with the command:

qsub submit_starccm.job
Parallel execution when Power-Session license is used 

To take advantage of the powerful compute resources at OSC, you may choose to run STAR-CCM+ in distributed mode for large problems. Multiple nodes and cores can be requested to accelerate the solution time. The following shows an example script requesting 2 nodes with 12 cores per node on Oakley, using the input file named starccm.sim:

#PBS -N starccm_test 
#PBS -l walltime=3:00:00 
#PBS -l nodes=2:ppn=12
#PBS -j oe
#PBS -S /bin/bash

cd $PBS_O_WORKDIR
cp starccm.sim $TMPDIR
cd $TMPDIR

module load starccm
starccm+ -power -np 24 -batch -machinefile $PBS_NODEFILE starccm.sim >&output.txt

cp * $PBS_O_WORKDIR

Information on how to monitor the job can be found in the computing environments section.

Supercomputer: 
Service: 
Fields of Science: 

Stata

Stata is a complete, integrated statistical package that provides everything needed for data analysis, data management, and graphics. Release 13 32-processor SMP is currently available at OSC.

Availability & Restrictions

Only academic use is allowed. Please contact oschelp@osc.edu to get validated for using Stata.

The following versions of Stata are available on OSC systems:

Version Glenn Oakley
13   X

Stata is no longer available on Glenn. We have upgraded to Stata 13, and migrated to Oakley.

Usage

Set-up

To configure your environment on Oakley for the usage of Stata, run the following command:

module load stata

Using Stata

Due to licensing restrictions, Stata may ONLY be used via the batch system on Oakley. See below for information on how this is done.

Batch Usage

OSC has a 2-user license for Stata, but there is no enforcement mechanism built into the software. To stay within the 2-user limit, we require you to run Stata in the context of PBS and to include the following option when starting your batch job (the batch system will then enforce the limit):

#PBS -l software=stata

Non-Interactive batch example

Use the script below as a template for your usage.

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=12
#PBS -l software=stata
#PBS -N stata
#PBS -j oe

module load stata
cd $PBS_O_WORKDIR
stata-mp -b do bigjob

Interactive batch example

The following illustrates how to run Stata via an interactive batch session. Please note that this requires an X11 server on the client system.

oakley01: qsub -X -I -l nodes=1:ppn=12 -l software=stata 
qsub: waiting for job 5860.oak-batch.osc.edu to start
qsub: job 5860.oak-batch.osc.edu ready

n0533: module load stata
n0533: xstata-mp
n0533: exit

Stata can take full advantage of our large 32 core, 1TB RAM node. Please see the Oakley documentation for information about how to access this node.

Further Reading

See Also

Supercomputer: 
Service: 

SuperLU

SuperLU is a library for the direct solution of large, sparse, nonsymmetric systems of linear equations on high performance machines.  It comes in two different flavors:  SuperLU_MT (multithreaded) for shared memory parallel machines and SuperLU_DIST for distributed memory parallel machines.

Availability & Restrictions

SuperLU is available to all OSC users without restriction.

The following versions of SuperLU are available on OSC systems:

Library Version Glenn Oakley
SuperLU_MT 2.0   X
SuperLU_MT 2.3 X  
SuperLU_DIST 2.3 X  
SuperLU_DIST 3.0   X
SuperLU_DIST 3.1   X

SuperLU is available for all compilers on Oakley but only for Intel compilers on Glenn.

Usage

Set-up

To use the SuperLU libraries in your compilation, first load the appropriate superlu module:

On Oakley, use one of these:

module load superlu_mt

module load superlu_dist

On Glenn, you must first load the acml module:

module load acml-intel

then use one of these:

module load superlu-MT-intel

module load superlu-DIST-intel

Building With SuperLU

Once loaded, the SuperLU libraries can be linked in with your compilation. To do this, use the following environment variables:

On Oakley:

Variable Use
$SUPERLU_MT_CFLAGS Include path for multithreaded libraries
$SUPERLU_MT_LIBS Link flags for multithreaded libraries
$SUPERLU_DIST_CFLAGS Include path for distributed-memory libraries
$SUPERLU_DIST_LIBS Link flags for distributed-memory libraries

On Glenn:

Variable Use
$SUPERLU_INC Include path
$SUPERLU_LIBS Link flags for either library, depending on loaded module
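
For example, on Oakley a C program calling the multithreaded library might be built roughly as follows. This is a minimal sketch; the source file name sparse_solve.c is illustrative, and whether $SUPERLU_INC on Glenn already contains the -I flag may vary, so adjust accordingly.

On Oakley:

module load superlu_mt
icc $SUPERLU_MT_CFLAGS -o sparse_solve sparse_solve.c $SUPERLU_MT_LIBS

On Glenn (Intel compilers only):

module load acml-intel
module load superlu-MT-intel
icc -I$SUPERLU_INC -o sparse_solve sparse_solve.c $SUPERLU_LIBS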

Further Reading

See Also

 

Supercomputer: 
Service: 

TotalView Debugger

Introduction

TotalView is a symbolic debugger which supports threads, MPI, OpenMP, C/C++ and Fortran, plus mixed-language codes.  Advanced features include on-demand memory leak detection, heap allocation debugging and the Standard Template Library Viewer (STLView).  Other features like dive, a wide variety of breakpoints, the Message Queue Graph/Visualizer, powerful data analysis and control at the thread level give you the power you need to solve tough problems.  

Versions

Version Glenn Oakley
8.3.0 X  
8.5.0-1 X  
8.8.0-2 X  
8.9.2-1   X*

* Both standard and CUDA versions are available.  

Usage

To use TotalView, load the module and launch using the following commands:

module load totalview
totalview

Example code to debug can be found on both Glenn and Oakley in $TOTALVIEW_HOME/linux-x86-64/examples.  Refer to the file README.TXT in this directory for instructions for using these examples.  

Further Reading

On Oakley, the TotalView User Guide in pdf format can be found in $TOTALVIEW_HOME/doc/pdf.  

Additional information about TotalView can be found at the Rogue Wave Software page.  

Supercomputer: 
Service: 
Fields of Science: 

TreeBeST

"TreeBeST is an original tree builder for constrained neighbour-joining and tree merge, an efficient tool capable of duplication/loss/ortholog inference, and a versatile program facili- tating many tree-building routines, such as tree rooting, alignment filtering and tree plot- ting. TreeBeST stands for ‘(gene) Tree Building guided by Species Tree’. It is previously known as NJTREE as the first piece of codes of this project aimed to build a eighbour-joining tree.

TreeBeST is the core engine of TreeFam (Tree Families Database) project initiated by Richard Durbin. The basic idea of this project is to build a full tree constrained by a manually verified seed tree. The tree builder must know how to utilize the prior knowledge provided by human experts. This demand disqualifies any existing softwares. Given this fact, we devised a new algorithm to control the joining step of traditional neighbour-joining. This is origin the constrained neighbour-joining.

When trees are built, they are only meaningful to biologists. Computers generate trees, but they do not understand them. To understand gene trees, a computer must be equipped with some biological knowledges, the species tree. It will teach a computer how to discriminate a speciation from a duplication event and how to find orthologs, provided a correct gene tree.

Unfortunately, gene trees are not always correct. Since the advent of UPGMA algorithm in 1958, we have tried to find a ideal model for nearly half a century. But we failed. Evolution is so complex a thing. A model best fits in one lineage might mean a disaster in another. A unified model is far from being discovered. TreeBeST aims at improving the accuracy of tree building, but it does not try to set up a new model in a traditional way. Instead, it integrates two existing models with the help of species tree, finding the subtree that best fits the models and merging them together to build a new tree incorporating the advantages of the both. This is the tree algorithm." (treebest.pdf)

Availability & Restrictions

TreeBeST is available to all OSC users without restriction.

The following versions of TreeBeST are available on OSC systems:

Version Glenn Oakley
1.9.2 X  

Usage

Set up

On the Glenn Cluster, TreeBeST is accessed by executing the following commands:

module load biosoftw
module load treebest

TreeBeST will be added to the user's PATH and can then be run with the command:

treebest <command> [options]

The commands for the treebest program and their summaries are listed below.

Command

nj           build neighbour-joining tree, SDI, rooting
best         build tree with the help of a species tree
phyml        build phyml tree
sdi          speciation vs. duplication inference
spec         print species tree
format       reformat a tree
filter       filter a multi-alignment
trans        translate coding nucleotide alignment
backtrans    translate aa alignment back to nt
leaf         get external nodes
mfa2aln      convert MFA to ALN format
ortho        ortholog/paralog inference
distmat      distance matrix
treedist     topological distance between two trees
pwalign      pairwise alignment
mmerge       merge a forest
export       export a tree to EPS format
subtree       extract the subtree
simulate     simulate a gene tree
sortleaf     sort leaf order
estlen       estimate branch length
trimpoor     trim out leaves that affect the quality of a tree
root         root a tree by minimizing height

Options

treebest nj [options] <input_file>
      -c FILE          constrained tree(s) in NH format [null]
      -m FILE          tree to be compared [null]
      -s FILE          species tree in NH format [default taxa tree]
      -l FILE          ingroup list file [null]
      -t TYPE          codon NT: ntmm, dn, ds, dm; AA: mm, jtt, kimura [mm]
                       ntmm          p-distance (codon alignment)
                       dn            non-synonymous distance
                       ds            synonymous distance
                       dm            dn-ds merge (tree merge)
                       mm            p-distance (amino acid alignment)
                       jtt           JTT model (maximum likelihood)
                       kimura        mm + Kimura's correction
      -T NUM           time limit in seconds [no limit]
      -b NUM           bootstrapping times [100]
      -F NUM           quality cut-off [15]
      -o STR           outgroup for tree cutting [Bilateria]
      -S               treat the first constrained tree as the original tree
      -C               use the leaves of constrained trees as ingroup
      -M               do not apply alignment mask
      -N               do not mask poorly aligned segments
      -g               collapse alternative splicing
      -R               do not apply leaf-reordering
      -p               the root node is a putative node
      -a               branch mode that is used by most tree-builder
      -A               the input alignment is stored in ALN format
      -W               wipe out root (SDI information will be lost!)
      -v               verbose output
      -h               help
treebest best [options] <CDS_alignment>        
            General Options:
            -P               skip PHYML
            -S               ignore the prob. of gene evolution (NOT recommended)
            -A               apply constraint to PHYML
            -C FILE          constraining tree [null]
            -f FILE          species tree [default]
            -r               discard species that do not appear at all
            Output Options:
            -D               output some debug information
            -q               suppress part of PHYML warnings
            -p STR           prefix of intermediate trees [null]
            -o FILE          output tree [null]
            Alignment Preprocessing Options:
            -s               only build tree for genes from sequenced species
            -g               collapse alternative splicing forms
            -N               do not mask low-scoring segments
            -F INT           quality cut-off [11]
            PHYML Related Options:
            -c INT           number of rate categories for PHYML-HKY [2]
            -k FLOAT|e       tv/ts ratio (kappa), 'e' for estimatinig [e]
            -a FLOAT|e       alpha parameter for Gamma distribution [1.0]
            -d FLOAT         duplication probability [0.15]
            -l FLOAT         probability of a loss following a speciation [0.10]
            -L FLOAT         probability of a loss following a duplication [0.20]
            -b FLOAT         prob. of the presence of an inconsistent branch [0.01]
treebest phyml <alignment> [<tree>]
            General Options:
            -t task          build | opt | loglk | dist [build]
            -n               the input is a nucleotide alignment
            -s               print out some statistics
            -N               do not mask low-scoring segments
            -g               collapse alternative splicing
            -b INT           number of bootstraps (slow) [0]
            -o FILE          write output to file [stdout]
            -F INT           quality cut-off [15]
            Model Related Options:
            -m model         nt: JC69 | K2P | F81 | HKY | F84 | TN93 | GTR [HKY]
                             aa: JTT | MtREV | Dayhoff | WAG [WAG]
            -c INT           number of relative substitution rate categories [1]
            -k FLOAT|e       transversion/transition ratio, 'e' for estimating [e]
            -a FLOAT|e       alpha parameter for Gamma distribution [1.0]
            -i FLOAT|e       proportion of invariable sites [0]
            Options for TreeFam Extensions:
            -S               use a species tree to guide tree building
            -f FILE          species tree [TreeFam species tree]
            -d FLOAT         duplication probability  [0.15]
            -l FLOAT         probability of a loss following a speciation [0.10]
            -L FLOAT         probability of a loss following a duplication [0.20]
            -C FILE          constraining tree [NULL]
            -p FLOAT         prob. of the presence of an inconsistent branch [0.01]
treebest sdi [-r|-H|-R|-m <tree0>|-l <spec_list>] <tree>
            Options:
            -r               reroot
            -c               use core species tree instead of the default one
            -H               reroot by minimizing tree height, instead of by minimizing the number of duplication events.
            -R               do not reorder the leaves.
            -s FILE          species tree [default taxa tree]
            -l FILE          cut a subtree that contains genes whose species exist in list [null]
            -m FILE          compare topology with FILE and re-order the leaves [null]
treebest spec
treebest format [-1] <tree>
treebest filter [options] <alignment> 
            Options:
            -n               nucleotide alignment
            -g               collapse alternative splicing
            -M               do not apply alignment mask
            -N               do not mask low-scoring segments
            -F NUM           quality cut-off [15]
treebest trans <nucl_alignment>
treebest backtrans [-t <thres>] <aa_aln> <nt_seq>
treebest leaf <nh_tree>
treebest mfa2aln [-n] <fasta_align>
treebest ortho <tree>
treebest distmat <dn|ds|dm|jtt|kimura|mm|dns> <alignment>
treebest treedist <tree1> <tree2>
treebest pwalign [options] <nt2nt|aa2aa|nt2aa|splice> <seq1> <seq2> 
            Options :
            -f               generate full alignment
            -a               do not apply matrix mean in local alignment
            -d               just calculate alignment boundaries
            -o NUM           gap open penalty
            -e NUM           gap extension penalty
            -n NUM           gap end penalty for nt2nt or aa2aa
            -s NUM           frame-shift penalty for aa2nt
            -g NUM           good splicing penalty
            -w NUM           band-width
            -b NUM           bad splicing penalty
            -m               output miscellaneous information
            -h               help
treebest mmerge [-r] <forest>
            Options:
            -r               reroot
treebest export [options] <tree>
            Options:
            -x NUM           width [640]
            -y NUM           height [480]
            -m NUM           margin [20]
            -f NUM           font size [11]
            -b FNUM          box size [4.0]
            -w FNUM          font width [font_size/2]
            -s FILE          species tree
            -B               suppress bootstrap value
            -M               black/white mode
            -S               show species name
            -d               speciation/duplication inference
            -p               pseudo-length
treebest subtree <tree> <list>
treebest simulate [options] 
            Options:
            -d FNUM          duplication probability [0.05]
            -l FNUM          loss probability [0.01]
            -p FNUM          loss probability after duplication [0.25]
            -m FNUM          max height [0.25]
            -n               not show internal name
            -h               help
treebest sortleaf <tree1> [<tree2>]
treebest estlen <tree> <matrix> <tag>
treebest trimpoor <tree> [<threshold>=0]
treebest root <tree> 

Examples

#PBS -N treebest_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR
module load biosoftw
module load treebest
cp /usr/local/biosoftw/treebest-1.9.2/examples/ex1.nucl.* .

treebest nj ex1.nucl.mfa > ex1.nucl.1.nhx
cp ex1.nucl.nhx ex1.nucl.1.forest
cat ex1.nucl.1.nhx >> ex1.nucl.1.forest
treebest nj -m ex1.nucl.nhx ex1.nucl.mfa > ex1.nucl.2.nhx
treebest nj -v ex1.nucl.mfa
treebest best ex1.nucl.mfa -o ex1.nucl.3.nhx
treebest best -c 1 -a 0.9 -d 0.14 -l 0.09 -L 0.19 -b 0.009 -o ex1.nucl.4.nhx ex1.nucl.mfa
treebest phyml -o ex1.nucl.1.nh ex1.nucl.mfa
treebest phyml -o ex1.nucl.2.nh ex1.nucl.mfa ex1.nucl.nhx
treebest phyml -s -C ex1.nucl.nhx -o ex1.nucl.4.nh ex1.nucl.mfa
treebest phyml -b 2 -o ex1.nucl.5.nh ex1.nucl.mfa
treebest sdi ex1.nucl.nhx > ex1.nucl.5.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.6.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.7.nhx
treebest spec > all_species.nh
treebest format ex1.nucl.nhx
treebest filter -n -M -N ex1.nucl.mfa > ex1.nucl.1.mfa
treebest trans ex1.nucl.mfa > ex1.aa.mfa
treebest backtrans ex1.aa.mfa ex1.nucl.mfa > ex1.nucl.2.mfa
treebest leaf ex1.nucl.nhx > ex1.nucl.1.leaf
head ex1.nucl.1.leaf | tail -7 > ex1.nucl.1.sublist
treebest mfa2aln -n ex1.nucl.mfa > ex1.nucl.1.aln
treebest ortho ex1.nucl.nhx > ex1.nucl.1.ortho
treebest distmat dn ex1.nucl.mfa > ex1.nucl.1.matrix.dn
treebest distmat ds ex1.nucl.mfa > ex1.nucl.1.matrix.ds
treebest distmat dm ex1.nucl.mfa > ex1.nucl.1.matrix.dm
treebest treedist ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.1.dist
treebest mmerge -r ex1.nucl.1.forest > ex1.nucl.8.nhx
treebest export ex1.nucl.nhx > ex1.nucl.1.eps
treebest subtree ex1.nucl.nhx ex1.nucl.1.sublist > ex1.nucl.9.nhx
treebest simulate > ex1.nucl.6.nh
treebest simulate -d 0.04 -l 0.02 -p 0.5 -m 0.1 > ex1.nucl.7.nh
treebest sortleaf ex1.nucl.nhx > ex1.nucl.sorted.nhx
treebest sortleaf ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.sorted.2.nhx
treebest estlen ex1.nucl.nhx ex1.nucl.1.matrix.ds ds_method > ex1.nucl.1.estlen.ds.nhx
treebest trimpoor ex1.nucl.nhx > ex1.nucl.10.nhx
treebest root ex1.nucl.nhx > ex1.nucl.11.nhx

Further Reading

Supercomputer: 
Service: 
Fields of Science: 

Turbomole

TURBOMOLE is an ab initio computational chemistry program that implements various quantum chemistry algorithms. It is developed by the group of Prof. Reinhart Ahlrichs at the University of Karlsruhe.

Availability and Compatibility

Turbomole is available on both the Oakley and Glenn clusters. The versions currently available at OSC are

VERSION GLENN OAKLEY
5.9.0 X  
5.9.1 X  
5.10.0 X  
6.0.0 X  
6.2 X  
6.3.1 X X

To use Turbomole's parallel computing capabilities on Glenn, load the parallel module with the "turbomole-parallel" prefix.

Usage

Access

Turbomole is available for use by all OSC users.

Setup

Use module avail to view available modules for a given machine. To load the appropriate Turbomole module, type: module load software-name.
For example: To select Turbomole version 6.3.1 on Oakley, type: module load turbomole/6.3.1

Using Turbomole

First-time users of Turbomole must sign the Turbomole license agreement form. It is available in the Academic Agreement Forms or via the following link: turbomole.pdf

To execute, load a turbomole module. For serial programs, e.g.:

module load turbomole
<turbomole command>

for parallel programs, e.g.:

module load turbomole-parallel
<turbomole command>
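
As an illustration, a serial Turbomole calculation might be submitted with a batch script along the following lines. This is a minimal sketch: it assumes the control and input files have already been prepared (for example with define) in the submission directory, and dscf stands in for whichever Turbomole command your calculation requires.

#PBS -N turbomole_test
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=1
#PBS -j oe

cd $PBS_O_WORKDIR
module load turbomole
# dscf is only an example; replace it with the command for your calculation
dscf > dscf.out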

Documentation

Recent Turbomole manuals are online at http://www.turbomole-gmbh.com/turbomole-manuals.html. The Turbomole 5.8.0 User's Manual is available online at https://www.osc.edu/archive/manuals/Turbomole/html/DOK.html.

Supercomputer: 
Service: 

TurboVNC

Introduction

TurboVNC is an implementation of VNC optimized for 3D graphics rendering.  Like other VNC software, TurboVNC can be used to create a virtual desktop on a remote machine, which can be useful for visualizing CPU-intensive graphics produced remotely.      

Availability

 

Version Glenn Oakley
1.0 X  
1.1 X* X

Usage

Please do not SSH directly to compute nodes and start VNC sessions! This will negatively impact other users (even if you have been assigned a node via the batch scheduler), and we will consider repeated occurrences an abuse of the resources. If you need to use VNC on a compute node, please see our HOWTO for instructions.

To use the applications provided in the TurboVNC module, you must first load the module using the following command:

module load turbovnc

To start a VNC server on your current host, use the following command:

vncserver  

After starting the VNC server you should see output similar to the following:  

New 'X' desktop is hostname:display
Starting applications specified in /nfs/nn/yourusername/.vnc/xstartup.turbovnc
Log file is /nfs/nn/yourusername/.vnc/hostname:display.log

Make a note of the hostname and display number ("hostname:display"), because you will need this information later in order to connect to the running VNC server.  

To establish a standard unencrypted connection to an already running VNC server, X11 forwarding must first be enabled in your SSH connection.  This can usually either be done by changing the preferences or settings in your SSH client software application, or by using the -X or -Y option on your ssh command.     

Once you are certain that X11 forwarding is enabled, create your VNC desktop using the vncviewer command in a new shell.

vncviewer

You will be prompted by a dialogue box asking for the VNC server you wish to connect to.  Enter "hostname:display".  

You may then be prompted for your HPC password.  Once the password has been entered your VNC desktop should appear, where you should see all of your home directory contents.   

When you are finished with your work on the VNC desktop, you should make sure to close the desktop and kill the VNC server that was originally started.  The VNC server can be killed using the following command in the shell where the VNC server was originally started:

vncserver -kill :[display]

For a full explanation of each of the previous commands, type man vncserver or man vncviewer at the command line to view the online manual.
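
Putting these steps together, a typical session might look like the following sketch; the hostname and display number shown are illustrative, so use whatever your vncserver command actually reports.

# On the OSC system (see the HOWTO above before doing this on a compute node)
module load turbovnc
vncserver                  # note the reported hostname:display, e.g. n0123:1

# In a new shell with X11 forwarding enabled (e.g. ssh -X)
module load turbovnc
vncviewer                  # enter hostname:display when prompted

# When finished, in the shell where the server was started
vncserver -kill :1         # use the display number reported above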

Further Reading

Additional information about TurboVNC can be found at the VirtualGL Project's documentation page.  

Supercomputer: 
Service: 

Quantum Espresso

Quantum ESPRESSO is a program package for ab-initio molecular dynamics (MD) simulations and electronic structure calculations.  It is based on density-functional theory, plane waves, and pseudopotentials.

Availability & Restrictions.

Quantum ESPRESSO is open source and available without restriction.  We recommend that Oakley be used.

The following versions are available on OSC systems:

Version Glenn Oakley
5.0.3   X

Usage

Set-up

You can configure your environment for the usage of Quantum ESPRESSO by running the following command:

module load espresso

Batch Usage

Sample batch scripts and input files are available here:

/nfs/10/srb/workshops/compchem/espresso/

See the Quantum ESPRESSO documentation page for tutorial materials.
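
As a minimal sketch of a batch job, the following script runs a plane-wave SCF calculation with pw.x, the main Quantum ESPRESSO executable. The input file name scf.in is illustrative, and the mpiexec launcher is assumed here; the sample scripts above show the exact conventions used at OSC.

#PBS -N espresso_test
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=12
#PBS -j oe

cd $PBS_O_WORKDIR
module load espresso
# scf.in is an illustrative input file name
mpiexec pw.x < scf.in > scf.out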

Further Reading

See Also

Supercomputer: 
Service: 

VASP

The Vienna Ab initio Simulation Package, VASP, is a program package for ab-initio quantum-mechanical molecular dynamics (MD) simulations and electronic structure calculations, from first principles.

Availability & Restrictions

Due to licensing considerations, OSC does not provide general access to this software. However, we are available to assist with the configuration of individual research-group installations on both the Oakley and Glenn clusters. The existing VASP module on Glenn was set up for a specific research group and is not available for general use. See the VASP FAQ page for information regarding licensing.

The following versions of VASP are available with restrictions as listed above on OSC systems:

Version Glenn Oakley
4.6 X  

Usage

Set-up

For those with access you can configure your environment for the usage of VASP on Glenn by running the following command:

module load vasp

Using VASP

See the VASP documentation page for tutorial materials.

Further Reading

See Also

Supercomputer: 
Service: 

VTK

Introduction

The Visualization ToolKit (VTK) is an open source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java and Python.

Version

 

Version Glenn Oakley
5.0.4 X  

Usage

You can use the following command to load the VTK module:

module load vtk

VTK example code can be found on the Glenn cluster in $VTK_DIR/Examples. Refer to the file README.txt in this directory for information about the content of the examples.

Further Reading

Links to additional documentation can be found online at http://www.vtk.org/doc/release/5.0/html/.

Supercomputer: 
Service: 
Fields of Science: