The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit to operate directly on SRA runs.
Availability and Restrictions
The following versions of SRA Toolkit are available on OSC clusters:
Version | Owens | Pitzer | Cardinal | Note |
---|---|---|---|---|
2.6.3 | X | | | These versions no longer support downloading SRA data** but can still be used to process local data. |
2.9.0 | X | | | |
2.9.1 | X | | | |
2.9.6 | X* | X* | | |
2.10.7 | X | X | | |
2.11.2 | X | X | | |
3.0.2 | X | X | X* | |
You can use the following command to view the available modules for a given machine:
module spider sratoolkit
Feel free to contact OSC Help if you need other versions for your work.
Access
SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.
Publisher/Vendor/Repository and License Type
National Center for Biotechnology Information, Freeware
Usage
Usage on Pitzer and Owens
Set-up
To set up your environment for SRA Toolkit, run the following command:
module load sratoolkit
This loads the default version. To select a particular SRA Toolkit version, use module load sratoolkit/version. For example, to load SRA Toolkit 2.11.2, run:
module load sratoolkit/2.11.2
Download SRA Data
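Before downloading, a batch script can stop early if the module load step did not put the SRA Toolkit commands on your PATH. This is a minimal sketch that assumes bash. check_sra_tools is a hypothetical helper and is not part of SRA Toolkit. It only looks for the four commands used on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of SRA Toolkit): return non-zero and list
# any of the SRA Toolkit commands used on this page that are not on PATH.
check_sra_tools() {
    local tool missing=0
    for tool in prefetch srapath fastq-dump vdb-config; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use in a job script:
#   module load sratoolkit/2.11.2
#   check_sra_tools || exit 1
```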
NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials to vdb-config (recommended). For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.
Set up the credentials (recommended)
Once you have obtained an AWS or GCP credential file, you can set the credentials by following these steps:
module load sratoolkit/2.11.2
vdb-config --report-cloud-identity yes
# For GCP credentials
vdb-config --set-gcp-credentials /path/to/gcp/credential/file
# For AWS credentials
vdb-config --set-aws-credentials /path/to/aws/credential/file
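If you configure credentials from a script, a small wrapper can check its inputs before calling vdb-config. This is a minimal sketch that assumes bash. set_sra_cloud_credentials is a hypothetical name, and it runs only the vdb-config options shown above.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of SRA Toolkit) around the vdb-config
# commands shown above. Usage: set_sra_cloud_credentials aws|gcp FILE
set_sra_cloud_credentials() {
    local provider=$1 file=$2 option
    # Pick the vdb-config option for the requested cloud provider.
    case "$provider" in
        aws) option=--set-aws-credentials ;;
        gcp) option=--set-gcp-credentials ;;
        *)   echo "provider must be 'aws' or 'gcp'" >&2; return 2 ;;
    esac
    # Refuse to continue if the credential file cannot be read.
    if [ ! -r "$file" ]; then
        echo "credential file not readable: $file" >&2
        return 1
    fi
    # Same two steps as above: report cloud identity, then store the file.
    vdb-config --report-cloud-identity yes &&
        vdb-config "$option" "$file"
}
```

For example, set_sra_cloud_credentials gcp /path/to/gcp/credential/file performs the GCP steps above. Because the wrapper checks the path first, a mistyped path fails with a clear message and is never passed to vdb-config.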
You can also run the following command to access the interactive configuration:
vdb-config -i
For additional information, see https://github.com/ncbi/sra-tools/wiki/03.-Quick-Toolkit-Configuration.
You can now download SRA data using prefetch:
prefetch SRR390728
The default download path is ~/ncbi in your home directory. For example, the SRA file SRR390728.sra is saved under ~/ncbi/sra, and the resource files are saved under ~/ncbi/refseq. You can use srapath to verify that the SRA accession is accessible in the download path:
$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/sra/sra/SRR390728.sra
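In a script, you can skip the download when a run is already local. This sketch assumes bash. It also assumes that srapath prints a local file path for a downloaded run, and prints something else (such as a remote URL) or fails for a run that is not local. fetch_if_missing is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: run prefetch only when srapath does not already
# resolve the accession to a non-empty local file.
# Usage: fetch_if_missing ACCESSION
fetch_if_missing() {
    local accession=$1 path
    # Assumption: srapath prints a local path for downloaded runs and a
    # remote URL (or nothing, on error) for runs that are not local.
    path=$(srapath "$accession" 2>/dev/null)
    if [ -n "$path" ] && [ -s "$path" ]; then
        echo "already present: $path"
    else
        prefetch "$accession"
    fi
}
```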
You can now run other SRA tools, such as fastq-dump, on compute nodes. Here is an example job script:
#!/bin/bash
#SBATCH --job-name=use_fastq_dump
#SBATCH --time=0:10:0
#SBATCH --ntasks-per-node=1
module load sratoolkit/2.11.2
module list
fastq-dump -X 5 -Z SRR390728
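To process several runs in one job, you can loop the fastq-dump command from the script above over a list of accessions. This sketch makes several assumptions:
- The runs were already downloaded with prefetch.
- accessions.txt is a hypothetical file with one run accession per line.
- The function names are hypothetical.
- The validation pattern assumes run accessions of the form SRR, ERR or DRR followed by digits.

```shell
#!/bin/bash
# Hypothetical helpers for a batch job; add them to a job script like the
# one above, after the #SBATCH lines.

# Succeeds for run accessions such as SRR390728 (assumed form: SRR, ERR or
# DRR followed by digits).
is_run_accession() {
    [[ $1 =~ ^[SED]RR[0-9]+$ ]]
}

# For each accession listed in FILE (one per line), print the first 5 spots
# to stdout, as in the single-run example above. Runs are assumed to have
# been downloaded with prefetch beforehand.
dump_accessions() {
    local list=$1 acc
    # The extra test also handles a final line with no trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue            # skip blank lines
        if ! is_run_accession "$acc"; then
            echo "skipping invalid accession: $acc" >&2
            continue
        fi
        # Read stdin from /dev/null so the command cannot consume the list.
        fastq-dump -X 5 -Z "$acc" </dev/null
    done < "$list"
}

# Example job-script body:
#   module load sratoolkit/2.11.2
#   dump_accessions accessions.txt
```

Redirecting stdin from /dev/null inside the loop is a common bash precaution. It keeps a command in the loop body from reading the remaining lines of the accession list.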
Unfortunately, the home directory file system is not optimized for I/O-intensive workloads. If the SRA file is particularly large, you can change the default download path for SRA data to our scratch file system using either of the following approaches. Both approaches use the /fs/scratch/PAS1234/johndoe/ncbi directory as an example.
Change the prefetch directory using vdb-config
module load sratoolkit/2.11.2
vdb-config -s /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728
You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra
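Because the scratch path contains your project code and user name, you can build it from variables instead of typing it by hand. This is a minimal sketch that assumes bash and assumes $USER matches the user-name part of the scratch path (johndoe above). use_scratch_repository is a hypothetical helper. Its optional second argument exists only so the base directory can be changed.

```shell
#!/bin/bash
# Hypothetical helper: create BASE/PROJECT/$USER/ncbi and make it the SRA
# repository root with the vdb-config -s command shown above.
# Usage: use_scratch_repository PROJECT [BASE]   (BASE defaults to /fs/scratch)
use_scratch_repository() {
    local project=$1 base=${2:-/fs/scratch} root
    if [ -z "$project" ]; then
        echo "usage: use_scratch_repository PROJECT [BASE]" >&2
        return 2
    fi
    root="${base}/${project}/${USER}/ncbi"
    mkdir -p "$root" || return 1
    vdb-config -s "/repository/user/main/public/root=${root}"
}
```

For example, use_scratch_repository PAS1234 would set the repository root to /fs/scratch/PAS1234/$USER/ncbi.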
Download to the current directory (available for version 2.10 or later)
module load sratoolkit/2.11.2
vdb-config --prefetch-to-cwd
mkdir -p /fs/scratch/PAS1234/johndoe/ncbi
cd /fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728
You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/SRR390728/SRR390728.sra
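You can wrap these steps so that your script's working directory stays unchanged and the expected file is checked afterwards. This sketch assumes bash and assumes vdb-config --prefetch-to-cwd has already been run. prefetch_into_dir is a hypothetical helper. The DIR/ACCESSION/ACCESSION.sra layout it checks is the one described above.

```shell
#!/bin/bash
# Hypothetical helper: assumes "vdb-config --prefetch-to-cwd" was already run.
# Downloads ACCESSION into DIR and prints DIR/ACCESSION/ACCESSION.sra,
# the layout described above. The work runs in a subshell so the caller's
# working directory does not change.
# Usage: prefetch_into_dir DIR ACCESSION
prefetch_into_dir() {
    local dir=$1 accession=$2
    mkdir -p "$dir" || return 1
    (
        cd "$dir" || exit 1
        # Send prefetch output to stderr so stdout carries only the path.
        prefetch "$accession" >&2 || exit 1
        if [ ! -s "$accession/$accession.sra" ]; then
            echo "expected file not found: $dir/$accession/$accession.sra" >&2
            exit 1
        fi
        echo "$dir/$accession/$accession.sra"
    )
}
```

For example, prefetch_into_dir /fs/scratch/PAS1234/johndoe/ncbi SRR390728 prints the path of the downloaded file, which later steps in a script can use.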
Known Issues
Error when downloading SRA data