BLAST

The BLAST programs are widely used tools for searching DNA and protein databases for sequence similarity, typically to identify homologs of a query sequence. Although often referred to simply as "BLAST", it is really a set of programs: blastp, blastn, blastx, tblastn, and tblastx.

Availability & Restrictions

BLAST is available without restriction to all OSC users.

The following versions of BLAST are available on OSC systems:

Version    Glenn   Oakley
2.2.17     X
2.2.23+    X
2.2.24+    X       X
2.2.25+    X       X
2.2.26             X

 

If you need to use blastx, you must load one of the modules for the C++ implementation of BLAST (any version with a "+").

Usage

Set-up

Setup depends on the system you are using. On Glenn, load the biosoftw module followed by the BLAST-specific module:

module load biosoftw
module load blast

Then create a resource file named .ncbirc in your home directory.
If you are using the legacy BLAST programs, the file must contain at least two variables, DATA and BLASTDB:

[NCBI]
DATA="/usr/local/biosoftw/blast-2.2.17/data/"
[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"

If you are using the C++ implementation of BLAST, the file must contain at least one variable, BLASTDB:

[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"

On Oakley, just load the BLAST-specific module:

module load blast

The resource file .ncbirc in your home directory should contain the following two lines:

[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"

On startup, BLAST reads this file to get the path information it needs during searches. Without this file, BLAST looks for databases only in the current working directory, that is, the directory from which the command is issued.
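As a minimal sketch, the .ncbirc file for the C++ implementation could be created from the shell as shown below. The helper name write_ncbirc is hypothetical and not part of BLAST; the BLASTDB path is the one given above. The helper refuses to overwrite an existing file, which is a design choice of this sketch rather than a BLAST requirement.

```shell
# Hypothetical helper (not part of BLAST): write a minimal .ncbirc
# for the C++ (BLAST+) programs. The optional argument is the target
# path; it defaults to .ncbirc in the home directory.
write_ncbirc() {
    ncbirc_target="${1:-$HOME/.ncbirc}"
    # Leave an existing resource file untouched.
    if [ -e "$ncbirc_target" ]; then
        echo "$ncbirc_target already exists; leaving it untouched" >&2
        return 1
    fi
    # Quoted heredoc delimiter: the contents are written literally.
    cat > "$ncbirc_target" <<'EOF'
[BLAST]
BLASTDB="/nfs/proj01/PZS0002/biosoftw/db/"
EOF
}
```

Calling write_ncbirc with no argument writes to $HOME/.ncbirc; passing a path writes the same contents elsewhere, which can be useful for checking the result before putting it in place.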

Using BLAST

The five flavors of BLAST mentioned above perform the following tasks:

blastp: compares an amino acid query sequence against a protein sequence database
blastn: compares a nucleotide query sequence against a nucleotide sequence database
blastx: compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database
tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands)
tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database (due to the nature of tblastx, gapped alignments are not available with this option)
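The pairing of query type and database type in the list above can be sketched as a small lookup. The function name choose_blast_program and its arguments ("nucl", "prot", and the optional "translated") are hypothetical conventions chosen for this illustration; only the five program names come from BLAST itself.

```shell
# Hypothetical helper: print the BLAST program that matches a
# query/database pairing.
#   $1  query type:    "nucl" or "prot"
#   $2  database type: "nucl" or "prot"
#   $3  optional "translated": for nucleotide vs. nucleotide, compare
#       six-frame translations (tblastx) instead of raw nucleotides.
choose_blast_program() {
    case "$1:$2:${3:-}" in
        prot:prot:*)          echo blastp ;;
        nucl:nucl:translated) echo tblastx ;;
        nucl:nucl:*)          echo blastn ;;
        nucl:prot:*)          echo blastx ;;
        prot:nucl:*)          echo tblastn ;;
        *)
            echo "unknown combination: $1 vs $2" >&2
            return 1
            ;;
    esac
}
```

For example, choose_blast_program nucl prot prints blastx, and choose_blast_program nucl nucl translated prints tblastx.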

We provide local access to the nr and swissprot databases. Other databases are available upon request.

Batch Usage

A sample batch script is below:

#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00
#PBS -N Blast
#PBS -S /bin/bash
#PBS -j oe

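module load blast
set -x

# Create a per-job results directory inside the submission directory.
cd "$PBS_O_WORKDIR"
mkdir "$PBS_JOBID"

# Copy the query to the job's temporary directory and run the search there.
cp 100.fasta "$TMPDIR"
cd "$TMPDIR"
/usr/bin/time blastn -db nt -query 100.fasta -out test.out

# Copy everything in the temporary directory back to the results directory.
cp * "$PBS_O_WORKDIR/$PBS_JOBID"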
