Introducing ASAP: Advanced Sequencing Automated Pipeline
ASAP is an open-source pipeline that helps users manage the jobs involved in processing HiSeq data, either on a cluster or in serial.
Capabilities
The software is highly modular, so users can selectively use or skip any of the steps, including:
- Alignment with BWA
- Local Realignment around known Variants with GATK
- Quality score recalibration with GATK
- SNP calls with Samtools, GATK, GlfMultiples, or a consensus of two or more callers
- INDEL calls with Samtools, GATK or a consensus
- Annotation using ANNOVAR
- Import of BAM files into a new or pre-existing pipeline, where they can be integrated with BAMs created by ASAP for further (optional) processing and variant calling
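To illustrate what a "consensus of two or more" callers can mean, here is a minimal sketch in Python. It is not ASAP's own code (ASAP is written in Ruby), and the function name `consensus_calls`, the `min_callers` parameter, and the example variants are all hypothetical. It assumes each caller's output has already been reduced to (chromosome, position, ref, alt) tuples.

```python
from collections import Counter


def consensus_calls(call_sets, min_callers=2):
    """Return the variants reported by at least `min_callers` call sets.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples.
    """
    counts = Counter()
    for calls in call_sets:
        # set() ensures a caller is counted once per variant,
        # even if it lists the same variant twice.
        counts.update(set(calls))
    return sorted(v for v, n in counts.items() if n >= min_callers)


# Hypothetical calls from three callers (illustrative values only).
samtools_calls = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")]
gatk_calls = [("chr1", 100, "A", "G"), ("chr2", 75, "G", "A")]
glf_calls = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr3", 9, "T", "C")]

consensus = consensus_calls([samtools_calls, gatk_calls, glf_calls], min_callers=2)
print(consensus)
```

Raising `min_callers` to the number of callers turns this into a strict intersection, which trades sensitivity for fewer false positives.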
In addition to preparing, submitting, and monitoring the processing scripts, ASAP provides tools for slicing FASTQ data so that alignment can run with more parallelism. It also performs some basic QC, including Ti/Tv ratios and heterozygous/homozygous alternate ratios, to help identify issues quickly before analysis begins.
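As a rough illustration of the Ti/Tv check, the sketch below counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) over a list of SNPs. It is an assumption about how such a metric can be computed, not ASAP's implementation; the names `is_transition` and `titv_ratio` and the example SNPs are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True when both alleles are purines or both are pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Compute Ti/Tv from (ref, alt) pairs; returns None if there are no transversions."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None


# Illustrative input: four transitions and two transversions.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G"), ("T", "C")]
ratio = titv_ratio(example_snps)
print(ratio)
```

For human whole-genome data, a Ti/Tv ratio of roughly 2 is a commonly cited rule of thumb, so a value far from that can be an early hint of a calling or filtering problem.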
Data Integrity
To ensure that only complete, high-quality data is used, a job fails if any of its child jobs fails or if quota issues prevent a successful copy. This means that no partially completed data slips through.
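The fail-fast behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ASAP's code: the helper names `run_children`, `sha256_of`, and `verified_copy` are hypothetical, and a checksum comparison stands in for whatever verification ASAP actually performs.

```python
import hashlib
import os
import shutil
import subprocess


def run_children(commands):
    """Run each child command in turn; raise as soon as one exits non-zero."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"child job failed with code {result.returncode}: {cmd}")


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src, dst):
    """Copy src to dst; on any failure remove dst so no partial file remains."""
    if os.path.abspath(src) == os.path.abspath(dst):
        raise ValueError("source and destination are the same file")
    try:
        # A full disk or exceeded quota surfaces here as an OSError.
        shutil.copyfile(src, dst)
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"checksum mismatch copying {src} to {dst}")
    except OSError:
        if os.path.exists(dst):
            os.remove(dst)
        raise
```

Because both helpers raise rather than return a status, a parent job that calls them stops at the first problem instead of passing incomplete output to the next step.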
Software Download
Please feel free to have a look and ask me any questions. The software can be freely downloaded at: http://biostat.mc.vanderbilt.edu/wiki/Main/ASAP. There is a small tutorial with step-by-step instructions on how to use ASAP to install any missing components and to run some data through the process.
Caveats
Currently, ASAP is designed for paired-end sequence data from Illumina’s HiSeq platform, but we are interested in working with other groups to make it work with other popular platforms as well. It should run on most (hopefully all) PBS and SGE clusters as well as in serial mode, and it is possible to add support for other cluster systems.
Extensibility
Adding some kinds of new functionality, such as support for an alternate aligner or SNP caller, should be relatively easy, but introducing entirely new functionality might require some knowledge of Ruby programming.