PAML

"PAML (for Phylogenetic Analysis by Maximum Likelihood) contains a few programs for model fitting and phylogenetic tree reconstruction using nucleotide or amino-acid sequence data." (doc/pamlDOC.pdf)

Availability & Restrictions

PAML is available to all OSC users without restriction.

The following versions of PAML are available on OSC systems:

Version   Glenn   Oakley
4.4d      X

Usage

Set-up

On the Glenn cluster, PAML is accessed by executing the following commands:

module load biosoftw
module load paml

Using PAML

PAML is a collection of programs that are added to the user's PATH: baseml, basemlg, chi2, codeml, ds, evolver, mcmctree, pamp, and yn00.  Each program has its own usage and options, although these are typically similar across programs.

Options

  • baseml / basemlg   Maximum likelihood analysis of nucleotide sequences.  baseml uses a faster discrete model; basemlg implements the (continuous) gamma model of Yang and is computationally intensive.

Both baseml and basemlg require a baseml.ctl in the current directory with the following variables set: seqfile, outfile, treefile

The following are optional variables to set in baseml.ctl: noisy, verbose, runmode, model, Mgene, ndata, clock, fix_kappa, kappa, fix_alpha, alpha, Malpha, ncatG, fix_rho, nparK, nhomo, getSE, RateAncestor, Small_Diff, cleandata, icode, fix_blength, method
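As a rough sketch, a baseml.ctl containing only the three required variables could be written from the shell as shown here. The file names example.nuc, example.trees, and mlb.out are hypothetical placeholders, and the optional variables are simply left out; check doc/pamlDOC.pdf before relying on what the program does when they are omitted.

```shell
# Write a minimal baseml.ctl into the current directory.
# The file names are hypothetical placeholders, not files shipped with PAML.
cat > baseml.ctl <<'EOF'
* minimal control file: required variables only
* input sequence data
seqfile = example.nuc
* input tree file
treefile = example.trees
* main result file
outfile = mlb.out
EOF

# Show the control file that baseml would read
cat baseml.ctl
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the here-document, so the '*' comment lines are written verbatim.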

  • chi2   Calculates the χ² critical value and p-value for conducting the likelihood ratio test

chi2 [p | INTEGER DOUBLE]

chi2                   Prints χ² critical values at set significance levels until ‘q+ENTER’ is entered

chi2 p                 Interactively prompts for the degrees of freedom and the χ² value

chi2 INTEGER DOUBLE    Computes the probability for INTEGER degrees of freedom and a χ² value of DOUBLE
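The likelihood ratio test compares twice the log-likelihood difference between two nested models against a chi-square distribution. The following is a minimal sketch, assuming hypothetical log-likelihood values (lnL0, lnL1) and a difference of one free parameter (df); chi2 is only invoked if it is found on the PATH.

```shell
# Hypothetical log-likelihoods of the null (lnL0) and alternative (lnL1) models
lnL0=-1234.56
lnL1=-1230.12
df=1   # assumed difference in the number of free parameters

# Likelihood ratio test statistic: 2 * (lnL1 - lnL0)
lrt=$(awk -v a="$lnL0" -v b="$lnL1" 'BEGIN { printf "%.2f", 2 * (b - a) }')
echo "LRT statistic: $lrt"

# Pass the degrees of freedom and the statistic to chi2 when PAML is loaded
if command -v chi2 >/dev/null 2>&1; then
    chi2 "$df" "$lrt"
fi
```

In practice the two log-likelihood values would be taken from the output files of the two model fits rather than typed in by hand.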

 

  • codeml   Implements the codon substitution model of Goldman & Yang for DNA and amino acid sequences

codeml requires codeml.ctl to be located in the current directory with the following variables set: seqfile, outfile, treefile, aaRatefile

The following are optional variables to set in codeml.ctl: noisy, verbose, runmode, seqtype, CodonFreq, ndata, aaDist, model, NSsites, icode, Mgene, fix_kappa, kappa, fix_omega, omega, fix_alpha, alpha, Malpha, ncatG, getSE, RateAncestor, Small_Diff, cleandata, fix_blength, method

  • ds   Computes descriptive statistics from a baseml/basemlg analysis

ds filename.type

  • evolver   Simulates sequences under nucleotide, codon, and amino acid substitution models; generates random trees; and calculates the partition distances between trees

EVOLVER in paml version 4.4d, March 2011

Results for options 1-4 & 8 go into evolver.out

Options

      (1) Get random UNROOTED trees?

      (2) Get random ROOTED trees?

      (3) List all UNROOTED trees?

      (4) List all ROOTED trees?

      (5) Simulate nucleotide data sets (use MCbase.dat)?

      (6) Simulate codon data sets      (use MCcodon.dat)?

      (7) Simulate amino acid data sets (use MCaa.dat)?

      (8) Calculate identical bi-partitions between trees?

      (9) Calculate clade support values (read 2 treefiles)?

      (11) Label clades?

      (0) Quit?


evolver’s option 5 requires MCbase.dat.  evolver’s option 6 requires MCcodon.dat.  evolver’s option 7 requires MCaa.dat and dat/mtmam.dat.  evolver’s option 9 requires truetree rst1 (formed from stewart.trees & codeml's output rst1).  evolver’s option 11 requires name.tress with user input.
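Because the simulation options stop being useful when their input files are absent, it can help to check for them before starting evolver. The sketch below covers options 5 to 7 only and uses two hypothetical helper functions (evolver_inputs, check_evolver_inputs) that are not part of PAML; the file names are the ones listed above.

```shell
# Print the input files listed above for evolver simulation options 5-7.
# Hypothetical helper, not part of PAML.
evolver_inputs() {
    case "$1" in
        5) echo "MCbase.dat" ;;
        6) echo "MCcodon.dat" ;;
        7) echo "MCaa.dat dat/mtmam.dat" ;;
        *) echo "" ;;
    esac
}

# Report every listed input that is missing from the current directory.
# Returns 0 when all inputs are present, 1 otherwise.
check_evolver_inputs() {
    missing=0
    for f in $(evolver_inputs "$1"); do
        if [ ! -e "$f" ]; then
            echo "option $1: missing $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_evolver_inputs 5 || echo "copy the missing files before running evolver"
```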

  • mcmctree   Implements the Bayesian MCMC algorithm of Yang and Rannala for estimating species divergence times

mcmctree requires mcmctree.ctl to be located in the current directory with the following variables set: seqfile, treefile, outfile, RootAge, usedata

The following are optional variables to set in mcmctree.ctl: seed, ndata, clock, model, alpha, ncatG, cleandata, BDparas, kappa_gamma, alpha_gamma, rgene_gamma, sigma2_gamma, finetune, print, burnin, sampfreq, nsample

  • pamp   Implements the parsimony-based analysis of Yang and Kumar

pamp requires pamp.ctl to be located in the current directory with the following variables set: seqfile, treefile, outfile

The following are optional variables to set in pamp.ctl: seqtype, ncatG, nhomo

  • yn00   Implements the method of Yang and Nielsen for estimating synonymous and nonsynonymous substitution rates in pairwise comparisons of protein-coding DNA sequences

yn00 requires yn00.ctl to be located in the current directory with the following variables set: seqfile, outfile

The following are optional variables to set in yn00.ctl: verbose, icode, weighting, commonf3x4, ndata

Control Files

In all .ctl files (baseml.ctl, codeml.ctl, mcmctree.ctl, pamp.ctl, and yn00.ctl), comment lines start with '*'.
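Since each program expects certain variables in its control file, a quick pre-flight check can catch a missing setting before a batch job is submitted. The following is a minimal sketch, assuming a hypothetical helper named check_ctl (not part of PAML) that ignores lines starting with '*' and looks for a "name =" assignment for each required variable.

```shell
# Succeed only if every named variable is assigned on a non-comment line.
# check_ctl is a hypothetical helper, not part of PAML.
check_ctl() {
    ctl=$1
    shift
    status=0
    for var in "$@"; do
        # Drop lines starting with '*', then look for "var =" at the line start
        if ! grep -v '^[[:space:]]*\*' "$ctl" |
             grep -Eq "^[[:space:]]*${var}[[:space:]]*="; then
            echo "$ctl: $var is not set" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: yn00.ctl must set seqfile and outfile
if [ -f yn00.ctl ]; then
    check_ctl yn00.ctl seqfile outfile || echo "fix yn00.ctl before running yn00"
fi
```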

Batch Usage

#PBS -N paml_test
#PBS -l walltime=0:05:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR
module load biosoftw
module load paml
export PAML_DIR=/usr/local/biosoftw/paml44
cp $PAML_DIR/*.* .
cp -r $PAML_DIR/dat .
cp -r $PAML_DIR/examples .
baseml
chi2 1 3.84
codeml
ds in.baseml
echo -e "1\n5\n5 5\n0\n2\n5\n5 5\n0\n3\n5\n4\n5\n5\n6\n7\n8\n" | evolver
mcmctree
pamp
yn00

Further Reading

  • Four PDF documents are located in the following folder on Glenn:  /usr/local/biosoftw/paml44/doc/
  • An online discussion group for PAML users is located at the following website: http://www.rannala.org/phpBB2/