"RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program."

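To check how much of a particular query was masked, you can count the N (or X, with -x) characters in the .masked output. The sketch below is a hypothetical helper; `masked_fraction` is not part of RepeatMasker. It assumes hard-masked output and also counts any Ns already present in the input, such as assembly gaps, so for gapped sequences the result overstates repeat content.

```shell
#!/bin/bash
# masked_fraction FILE
# Hypothetical helper (not part of RepeatMasker): print the percentage of
# sequence characters in a FASTA file that are N or X (either case),
# e.g. in the .masked file written by RepeatMasker.
# Ns already present in the input (assembly gaps) are counted too.
masked_fraction() {
  awk '
    /^>/ { next }                     # skip FASTA header lines
    {
      s = $0
      gsub(/[ \t\r]/, "", s)          # ignore stray whitespace and CRLF endings
      total  += length(s)
      masked += gsub(/[NnXx]/, "", s) # gsub returns the number of matches
    }
    END {
      if (total == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * masked / total
    }' "$1"
}
```

For example, after the batch job further down this page has finished, `masked_fraction NC_008253.fna.masked` prints the masked percentage for that sequence.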
Availability & Restrictions

RepeatMasker is available to all OSC users without restriction.

The following versions of RepeatMasker are available on OSC systems:

Version   Glenn   Oakley
2.1       X



On the Glenn cluster, RepeatMasker is accessed by executing the following commands:

module load biosoftw
module load RepeatMasker

RepeatMasker will be added to the user's PATH and can be run with the command:

RepeatMasker [-options] <seqfile(s) in FASTA format>


      Detailed help
      Default settings are for masking all types of repeats in a primate sequence.
-pa(rallel) [number]
      The number of processors to use in parallel (only works for batch files or sequences over 50 kb)
-s    Slow search; 0-5% more sensitive, 2-3 times slower than default
-q    Quick search; 5-10% less sensitive, 2-5 times faster than default
-qq   Rush job; about 10% less sensitive, 4 to >10 times faster than default (quick searches are fine under most circumstances)

Repeat options

-nolow /-low
      Does not mask low_complexity DNA or simple repeats
-noint /-int
      Only masks low complex/simple repeats (no interspersed repeats)
      Does not mask small RNA (pseudo) genes
      Only masks Alus (and 7SLRNA, SVA and LTR5)(only for primate DNA)
-div [number]
      Masks only those repeats less than [number] percent diverged from the consensus sequence
-lib [filename]
      Allows use of a custom library (e.g. from another species)
-cutoff [number]
      Sets cutoff score for masking repeats when using -lib (default 225)
-species <query species>
      Specify the species or clade of the input sequence. The species name must be a valid NCBI Taxonomy Database species name and be contained in the RepeatMasker repeat database. Some examples are:
      -species human
      -species mouse
      -species rattus
      -species "ciona savignyi"
      -species arabidopsis
      Other commonly used species: mammal, carnivore, rodentia, rat, cow, pig, cat, dog, chicken, fugu, danio, "ciona intestinalis", drosophila, anopheles, elegans, diatoaea, artiodactyl, arabidopsis, rice, wheat, and maize
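Multi-word species names such as "ciona savignyi" must be quoted so that the shell passes them to RepeatMasker as a single argument. When a script builds the command line, a bash array preserves that quoting. This is a minimal sketch; `query.fa` is a placeholder file name.

```shell
#!/bin/bash
# Keep RepeatMasker options in a bash array so that a multi-word
# -species value is passed as a single argument.
species="ciona savignyi"
query="query.fa"    # placeholder input file name
rm_args=(-species "$species" "$query")

# Print each argument on its own line to confirm the splitting:
#   <-species>
#   <ciona savignyi>
#   <query.fa>
printf '<%s>\n' "${rm_args[@]}"

# With the RepeatMasker module loaded, the real call would be:
#   RepeatMasker "${rm_args[@]}"
```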

Contamination options

      Only clips E. coli insertion elements out of FASTA and .qual files
      Clips IS elements before analysis (default: IS only reported)
      Skips bacterial insertion element check
      Only checks for rodent specific repeats (no repeatmasker run)
      Only checks for primate specific repeats (no repeatmasker run)

Running options

-gc [number]
      Use matrices calculated for 'number' percentage background GC level
      RepeatMasker calculates the GC content even for batch files/small seqs
-frag [number]
      Maximum sequence length masked without fragmenting (default 40000, 300000 for DeCypher)
-maxsize [nr]
      Maximum length for which IS- or repeat clipped sequences can be produced (default 4000000). Memory requirements go up with higher maxsize.
      Skips the steps in which repeats are excised
      Prints search engine progress report to screen (defaults to .stderr file)
      Do not postprocess the results of the run (i.e., do not call ProcessRepeats).
       NOTE: This option should only be used when ProcessRepeats will be run manually on the results.
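Some of these settings depend on properties of the input. -pa only helps for batch files or sequences over 50 kb, -frag defaults to 40000 bp, and -gc takes a background GC percentage. The hypothetical helper below (`seq_stats`, not part of RepeatMasker) uses awk to print each FASTA record's length and GC percentage so you can check those values before a run. GC is computed over A/C/G/T only, so Ns and other ambiguity codes are ignored.

```shell
#!/bin/bash
# seq_stats FILE
# Hypothetical helper (not part of RepeatMasker): for each FASTA record
# print name, length in bp, and GC percentage over A/C/G/T bases only.
seq_stats() {
  awk '
    function report() {
      if (name == "") return
      pct = (gc + at > 0) ? 100 * gc / (gc + at) : 0
      printf "%s\t%d\t%.1f\n", name, len, pct
    }
    /^>/ {
      report()                        # print the previous record, if any
      name = substr($1, 2)            # first word of the header, without ">"
      len = 0; gc = 0; at = 0
      next
    }
    {
      s = toupper($0)
      gsub(/[ \t\r]/, "", s)          # ignore stray whitespace and CRLF endings
      len += length(s)
      gc  += gsub(/[GC]/, "", s)
      at  += gsub(/[AT]/, "", s)
    }
    END { report() }' "$1"
}
```

Records longer than 40000 bp will be fragmented under the default -frag setting. Note that RepeatMasker can also compute the GC content itself, as described above.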

Output options

-dir [directory name]
      Writes output to this directory (default is query file directory, "-dir ." will write to current directory).
      Writes alignments in the .align output file (does not work with -wublast)
      Alignments are presented in the orientation of the repeat (with option -a)
      Outputs ambiguous DNA transposon fragments using a lower case name.  All other repeats are listed in upper case. Ambiguous fragments match multiple repeat elements and can only be called based on flanking repeat information.
      Returns complete .masked sequence in lower case
      Returns repetitive regions in lowercase (rest capitals) rather than masked
-x    Returns repetitive regions masked with Xs rather than Ns
      Reports simple repeats that may be polymorphic (in file.poly)
      Includes for each annotation the HSP "evidence". Currently this option is only available with the "-html" output format listed below.
      Creates an additional output file in xhtml format.
      Creates an additional output file in ACeDB format
      Creates an additional Gene Feature Finding format output
-u    Creates an additional annotation file not processed by ProcessRepeats
-xm   Creates an additional output file in cross_match format (for parsing)
      Creates an (old style) annotation file with fixed width columns
      Leaves out final column with unique ID for each element (was default)
      Calculates repeat densities (in .tbl) excluding runs of >=20 N/Xs in the query
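RepeatMasker writes its main annotation table to `<query>.out`. The sketch below totals the annotated query bases per repeat class/family in that table. It assumes the standard .out column layout: field 6 and field 7 are the begin and end positions in the query, and field 11 is the repeat class/family. `repeat_bp_by_class` is a hypothetical helper name. Overlapping annotations are counted twice, so treat the result as a quick look. The .tbl summary that RepeatMasker itself produces is the authoritative one.

```shell
#!/bin/bash
# repeat_bp_by_class FILE.out
# Hypothetical helper (not part of RepeatMasker): total annotated query
# bases per repeat class/family, largest first.
# Assumes the standard .out layout: field 6/7 = begin/end in the query,
# field 11 = repeat class/family. Header lines are skipped because their
# first field is not an integer SW score.
repeat_bp_by_class() {
  awk '
    $1 ~ /^[0-9]+$/ { bp[$11] += $7 - $6 + 1 }
    END { for (c in bp) printf "%s\t%d\n", c, bp[c] }
  ' "$1" | sort -t $'\t' -k2,2nr
}
```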


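Below is an example batch script that runs RepeatMasker on one node with 4 processors:

#PBS -N RepeatMasker_test
#PBS -l walltime=4:00:00
#PBS -l nodes=1:ppn=4

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Make RepeatMasker available
module load biosoftw
module load RepeatMasker
# Copy the example sequence file (FASTA) into the working directory
cp /usr/local/biosoftw/bowtie-0.12.7/genomes/NC_008253.fna .
# Use 4 processors in parallel, matching ppn=4 above
RepeatMasker -pa 4 NC_008253.fna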
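
The example job hard-codes -pa 4 to match ppn=4. To keep the two values in sync when the resource request changes, you can read the processor count from the job's node file. This sketch assumes a Torque-style PBS, where $PBS_NODEFILE lists one line per allocated processor. `pa_from_pbs` is a hypothetical helper name.

```shell
#!/bin/bash
# pa_from_pbs
# Hypothetical helper: number of processors allocated to this PBS job.
# Assumes Torque-style PBS, where $PBS_NODEFILE lists one line per
# allocated processor; prints 1 when run outside a batch job.
pa_from_pbs() {
  if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    wc -l < "$PBS_NODEFILE" | tr -d ' '
  else
    echo 1
  fi
}

NP=$(pa_from_pbs)
echo "RepeatMasker would use -pa $NP"
# In the batch script, the hard-coded value could then become:
#   RepeatMasker -pa "$NP" NC_008253.fna
```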


The following commands result in errors: RepeatMasker -w, RepeatMasker -de, and RepeatMasker -e.
