Hi all,
We have been trying out various aligners over the past several months, and I would like some input from the community on which aligners to use. Our genomes are as follows:
Genome:
Viral (RNA-based)
10kb genome
High divergence (5-10% normal, sometimes 15%)
Considerations:
We don't care about memory (small genome)
Aligner must be able to deal with divergence and (a few) gaps
We don't care about (super) speed
Preferably scalable and 'pipeable' (samtools, picard, GATK, etc.).
Goals:
Consensus sequence for 'standard' phylo studies
High-coverage (>150x) for intra-host SNP detection
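As a rough sanity check on the >150x goal, here is a minimal sketch of how many aligned reads that implies for a 10kb genome, using mean coverage = reads x read length / genome length. It assumes uniform coverage and that every read aligns, neither of which holds in practice, so treat the numbers as lower bounds. The function name and the read-length labels are only illustrative.

```python
# Rough estimate of reads needed for a target mean coverage.
# Assumption: coverage is uniform and every read aligns (optimistic).

GENOME_LENGTH = 10_000   # ~10kb viral genome
TARGET_COVERAGE = 150    # >150x wanted for intra-host SNP detection


def reads_for_coverage(genome_length, coverage, read_length):
    """Smallest read count N such that N * read_length / genome_length >= coverage."""
    total_bases = genome_length * coverage
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_bases // read_length)


if __name__ == "__main__":
    for label, read_length in [("Illumina 50bp", 50),
                               ("Illumina 100bp", 100),
                               ("454 ~500bp", 500)]:
        n = reads_for_coverage(GENOME_LENGTH, TARGET_COVERAGE, read_length)
        print(f"{label}: at least {n} aligned reads")
```

For paired-end data the count is in individual reads, so the number of pairs is half of that.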
We mostly use Illumina 50bp SE sequencing, but also have 100bp PE and ~500bp 454 sequencing. We have tested several aligners and have focused on the following:
BFAST
BWA
SSAHA2
NovoAlign
Mosaik (mostly v1, but v2 just came out)
Stampy
So far we have had the most success with Mosaik and NovoAlign in terms of specificity and sensitivity on the Illumina platform. For 454 we have only used Mosaik so far (which works well, but homopolymer and CAFIE errors have to be cleaned up manually). For Mosaik and NovoAlign we generally use a hash-size of 6 and a divergence of 0.1. For Mosaik we also specify an 'act' threshold of 10. We have tweaked several other parameters, but have found that they have very little effect on the alignments.
Does anybody have further insights or suggestions? We are currently scaling up production, so any suggestions and comments would be most welcome.
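To make the divergence setting concrete, here is a small sketch of what a threshold of 0.1 means per read. It assumes the setting is interpreted as the maximum fraction of mismatched bases per read; each aligner defines this slightly differently, so check the tool's documentation. The function names are only illustrative.

```python
import math

# Assumption: the aligner's divergence setting is a maximum fraction of
# mismatched bases per read. Real aligners may count gaps differently.

MAX_DIVERGENCE = 0.1                        # setting we pass to the aligner
SAMPLE_DIVERGENCE = (0.05, 0.10, 0.15)      # typical and worst-case divergence
READ_LENGTHS = (50, 100, 500)               # Illumina SE, Illumina PE, 454


def mismatch_budget(read_length, max_fraction):
    """Whole number of mismatches allowed in one read."""
    # Small epsilon guards against floating-point results such as 4.999999.
    return math.floor(read_length * max_fraction + 1e-9)


def expected_mismatches(read_length, divergence):
    """Mean mismatches per read from divergence alone (no sequencing error)."""
    return read_length * divergence


if __name__ == "__main__":
    for length in READ_LENGTHS:
        budget = mismatch_budget(length, MAX_DIVERGENCE)
        for div in SAMPLE_DIVERGENCE:
            expected = expected_mismatches(length, div)
            status = "within budget" if expected <= budget else "over budget"
            print(f"{length}bp read, {div:.0%} divergence: "
                  f"~{expected:.1f} expected vs {budget} allowed ({status})")
```

Under this reading, a 50bp read from a region at 15% divergence carries about 7.5 mismatches on average against a budget of 5, so such reads would tend to be dropped, and sequencing errors only add to that.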