BLAT

BLAT is a sequence analysis tool which performs rapid mRNA/DNA and cross-species protein alignments. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.

BLAT is not BLAST. DNA BLAT works by keeping an index of the entire genome (but not the genome itself) in memory. Since the index takes up a bit less than a gigabyte of RAM, BLAT can deliver high performance on a reasonably priced Linux box. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment. Protein BLAT works in a similar manner, except with 4-mers rather than 11-mers. The protein index takes a little more than 2 gigabytes.
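
The index-then-align idea described above can be illustrated with a toy k-mer lookup. This is only a sketch of the general technique, not BLAT's actual implementation: the function name find_seed_hits, the example sequences, and the choice of K=4 (instead of the 11-mers mentioned above) are all assumptions made to keep the example small.

```shell
#!/bin/bash
# Toy k-mer index lookup (illustrative only; not how BLAT is implemented).
# K=4 keeps the output short; the text above cites 11-mers for DNA.
K=4
TARGET="ACGTACGGTCAATGCC"   # stands in for the genome
QUERY="GGTCAATG"            # stands in for the query sequence

# find_seed_hits TARGET QUERY K
# Prints one line per query k-mer that also occurs in the target.
find_seed_hits() {
    awk -v t="$1" -v q="$2" -v k="$3" 'BEGIN {
        # Build the index: k-mer -> comma-separated 1-based start positions.
        for (i = 1; i + k - 1 <= length(t); i++) {
            kmer = substr(t, i, k)
            idx[kmer] = (kmer in idx) ? idx[kmer] "," i : i
        }
        # Look up every k-mer of the query in the index.
        for (j = 1; j + k - 1 <= length(q); j++) {
            kmer = substr(q, j, k)
            if (kmer in idx) print kmer, "query:" j, "target:" idx[kmer]
        }
    }'
}

find_seed_hits "$TARGET" "$QUERY" "$K"
```

Runs of hits whose query and target positions advance together mark a region of probable homology; a real aligner would then load only that region for detailed alignment.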

Availability & Restrictions

BLAT is available without restriction to all OSC users.

The following versions of BLAT are available at OSC:

Version    Glenn    Oakley
34         X

Usage

Set-up

To initialize the Glenn system prior to using BLAT, run the following commands:

module load biosoftw
module load blat

Using BLAT

The main programs in the blat suite are:

gfServer – a server that maintains an index of the genome in memory and uses the index to quickly find regions with high levels of sequence similarity to a query sequence.
gfClient – a program that queries gfServer over the network, and then does a detailed alignment of the query sequence with regions found by gfServer.
blat – combines client and server into a single program, first building the index, then using the index, and then exiting.
webBlat – a web-based version of gfClient that presents the alignments in an interactive fashion. (Not installed at OSC.)

Building an index of the genome typically takes 10 to 15 minutes. For interactive applications, one typically uses gfServer to build a whole-genome index; gfClient or webBlat can then align a single query within a few seconds. When aligning many sequences in batch mode, blat can be more efficient, particularly if run on a cluster of computers. Each blat run is typically done against a single chromosome, but with a large number of query sequences.
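
The interactive workflow described above might look like the following sketch. The host, port number, and file names (database.2bit, query.fa, output.psl) are placeholders rather than OSC defaults, and the polling loop assumes that "gfServer status" exits with a non-zero status until the server is ready.

```shell
#!/bin/bash
# Placeholder values (assumptions): adjust host, port, and file names.
HOST=localhost
PORT=17779
DB=database.2bit
QUERY=query.fa

if command -v gfServer >/dev/null 2>&1 && [ -f "$DB" ] && [ -f "$QUERY" ]; then
    # Start the server in the background; it builds the index in memory.
    gfServer start "$HOST" "$PORT" "$DB" &

    # Poll until the server answers (index building can take several minutes).
    # Assumes "gfServer status" fails while the server is not yet ready.
    for attempt in $(seq 1 90); do
        gfServer status "$HOST" "$PORT" >/dev/null 2>&1 && break
        sleep 10
    done

    # Align the queries; "." is the directory that holds the .2bit file.
    gfClient "$HOST" "$PORT" . "$QUERY" output.psl

    # Shut the server down when finished.
    gfServer stop "$HOST" "$PORT"
else
    echo "gfServer, $DB, or $QUERY not found; load the blat module and check paths" >&2
fi
```

Once the server is running, further gfClient calls against the same host and port reuse the in-memory index, which is where the few-second response time comes from.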

Other programs in the blat suite are:

pslSort – combines and sorts the output of multiple blat runs.  (The blat default output format is .psl).
pslReps – selects the best alignments for a particular query sequence, using a ‘near best in genome’ approach.
pslPretty – converts alignments from the psl format, a tab-delimited format that does not include the bases themselves, to a more readable alignment format.
faToTwoBit – converts FASTA format sequence files to a dense, randomly accessible .2bit format that gfClient can use.
twoBitToFa – converts from the .2bit format back to FASTA.
faToNib – converts from FASTA to a somewhat less dense, randomly accessible format that predates .2bit. Note that each .nib file can contain only a single sequence.
nibFrag – converts portions of a .nib file back to FASTA.
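
As a rough sketch of how these utilities can be chained, the script below converts a FASTA genome to .2bit, runs blat, and then post-processes the result. All file and directory names (genome.fa, query.fa, psl_parts, sort_tmp, and the output names) are placeholders, and the argument order shown should be checked against the usage summary each program prints when run with no arguments.

```shell
#!/bin/bash
# Placeholder names (assumptions): adjust to your own files.
GENOME_FA=genome.fa
DB=database.2bit
QUERY=query.fa
PSL_DIR=psl_parts      # directory that collects .psl output from blat runs

if command -v faToTwoBit >/dev/null 2>&1 && [ -f "$GENOME_FA" ] && [ -f "$QUERY" ]; then
    # 1. Convert the FASTA genome to the .2bit format.
    faToTwoBit "$GENOME_FA" "$DB"

    # 2. Align the queries; further runs could add more files to $PSL_DIR.
    mkdir -p "$PSL_DIR" sort_tmp
    blat "$DB" "$QUERY" "$PSL_DIR/run1.psl"

    # 3. Combine and sort every .psl file found in $PSL_DIR.
    pslSort dirs all.sorted.psl sort_tmp "$PSL_DIR"

    # 4. Keep the best alignments for each query sequence.
    pslReps all.sorted.psl best.psl best.psr

    # 5. Render the selected alignments in a human-readable form.
    pslPretty best.psl "$DB" "$QUERY" best.txt
else
    echo "faToTwoBit, $GENOME_FA, or $QUERY not found; load the blat module and check paths" >&2
fi
```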

The command-line options of each program are described below. A similar usage summary is printed when a command is run with no arguments.

Batch Usage

A sample batch script is shown below:

#PBS -N blat
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -S /bin/bash

cd $PBS_O_WORKDIR
blat -stepSize=5 -repMatch=2253 -minScore=0 -minIdentity=0 database.2bit query.fa output.psl 
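
Since each blat run is typically done against a single chromosome, the body of a batch job could loop over per-chromosome databases, as in the sketch below. The directory layout (one .2bit file per chromosome under chroms/), the output directory psl_parts, and the helper name psl_name_for are hypothetical and would need adapting.

```shell
#!/bin/bash
# Hypothetical layout (assumption): chroms/chr1.2bit, chroms/chr2.2bit, ...
CHROM_DIR=chroms
OUT_DIR=psl_parts
QUERY=query.fa

# Map a database path such as chroms/chr1.2bit to psl_parts/chr1.psl.
psl_name_for() {
    echo "$OUT_DIR/$(basename "$1" .2bit).psl"
}

mkdir -p "$OUT_DIR"

for db in "$CHROM_DIR"/*.2bit; do
    [ -e "$db" ] || continue    # skip cleanly if no .2bit files were found
    blat "$db" "$QUERY" "$(psl_name_for "$db")"
done
```

The per-chromosome .psl files collected in the output directory can then be merged with pslSort.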

 
