Hi, I'm trying to get a sense of what the current consensus is regarding the best practice for mapping SOLiD colour-space paired-end reads. As far as I can tell the options appear to be rather limited:
1. LifeScope.
Advantages: Designed specifically for paired-end colour-space reads and maps the most reads (which may also be a disadvantage…). Disadvantages: Very slow and very resource hungry. Overly complicated command line interface. Can only run on a limited range of hardware/software. Difficult to leverage into GATK. And to rub salt into the wound, AB are planning to charge for it in the near future.
2. Bowtie.
Advantages: Fast. Resource efficient. Multithreaded. Easy-to-use command line interface. Disadvantages: Tricky to leverage into GATK. Most importantly, Bowtie cannot accommodate indels. And Bowtie2 will not accept colour-space.
3. BWA
Advantages: Fast. Efficient. Multithreaded. Accommodates indels. Easy-to-use interface. Easily leveraged into a GATK pipeline (to a point - see below). BUT! No current support for paired-end SOLiD data (mate-pair yes, but not, it would seem, paired-end). The current workaround appears to be to reverse the F5 reads (and their associated QVs), preferably trimming heavily at the 3' end to minimise potential problems from the reversed error profile (though how much of an issue is this?). This creates all sorts of issues when leveraging into GATK (particularly as BWA does not include colour-space data in the BAM, a prerequisite for GATK recalibration).
4. BFAST
I am still exploring this option. However, it would appear to accept paired-end colour-space data natively and can be leveraged into GATK pipelines, but the interface is a bit challenging, the documentation a bit opaque, and parts of the process can be very slow.
Is there anything I've missed? What are people's preferred strategies, particularly if they want to leverage into GATK for recalibration purposes? Are we all going to be using ECC-generated base-space data, so that development of colour-space-compatible tools will dry up? Any thoughts on this subject would be most welcome!