**nilshomer** · 06-02-2010, 09:49 PM

Originally posted by Fabrice ODEFREY

Hi everyone,

We recently got a SOLiD platform and are in the process of finding the best possible pipeline for our analysis. We are considering aligning with Bioscope 1.2, SHRiMP2 and BFAST, and will probably run our analysis on all three and compare them. Is there another aligner you would recommend over those three? And if you have been using Bioscope/SHRiMP2/BFAST, which one would you recommend?

Which other "after alignment" software would you recommend? We will use SAMtools and Picard to work with the SAM/BAM files and then use IGV to visualize the results.

Thanks in advance for your time and input.
Fabrice.

PS: We will start with whole-exome sequencing of some human samples, using the NimbleGen in-solution enrichment kit.

As the author of BFAST, I would also recommend throwing BWA into your comparisons. The aligners you mention above and BWA are all "gapped" (consider indels), which is really important for alignment accuracy and obviously indel identification.

I would also look at comparing a few different variant callers, like the MAQ and SOAP models (both implemented in SAMtools), VarScan, DiBayes, and the GATK caller.

I have found that since each read is aligned independently, you can get a reference allele bias on SOLiD at SNP positions due to sequencing error. You can look at local re-aligners like the one in GATK or my own (http://srma.sf.net), which can utilize the original color calls and qualities to remove this artifact, along with cleaning up ambiguities around observed indels.

You can use, but are not limited to, dbSNP concordance (SNPs and indels), comparison with a SNP microarray, or simulated data to test your variant discovery. My opinion is that you will get good results no matter which of the above tools you choose. Other than that, I look forward to your assessment.
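As a rough illustration of the dbSNP concordance check mentioned above, here is a minimal Python sketch. It assumes variants have already been parsed into `(chromosome, position, ref, alt)` tuples; the function name `concordance` and the toy data are hypothetical, and a real comparison would also need to normalize indel representation.

```python
def concordance(calls, known):
    """Return the fraction of called variants that also appear in the known set."""
    calls = set(calls)
    if not calls:
        return 0.0
    return len(calls & set(known)) / len(calls)


# Hypothetical toy data: (chromosome, 1-based position, ref allele, alt allele)
known_sites = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "GA"),
}
called = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr1", 3000, "T", "C"),   # not in the known set
    ("chr2", 500, "G", "GA"),
]

rate = concordance(called, known_sites)
print(f"{rate:.2f}")  # 3 of the 4 calls are known sites
```

Running the same function on the call sets from each aligner and caller combination gives one simple number to compare them by, though a novel call is not necessarily a wrong call.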

**Fabrice ODEFREY** · 06-03-2010, 02:52 AM

Thanks Nils for your answer. I had the (wrong?) impression that BWA couldn't deal with the csfasta format of the SOLiD and that you had to convert to FASTQ, hence losing the color space specificity of the SOLiD format. Thanks also for the variant callers and local re-aligners; I will look at them.

**nilshomer** · 06-03-2010, 08:48 AM

Originally posted by Fabrice ODEFREY

Thanks Nils for your answer. I had the (wrong?) impression that BWA couldn't deal with the csfasta format of the SOLiD and that you had to convert to FASTQ, hence losing the color space specificity of the SOLiD format. Thanks also for the variant callers and local re-aligners; I will look at them.

BWA converts the CSFASTA and QUAL files to the FASTQ format (as do BFAST and other aligners). BWA trims the first adapter base and color though, so the output loses two bases per read (a 50bp read becomes 48bp in the SAM output, etc.). The CS/CQ tags are also not present, so they cannot be leveraged in downstream analysis (e.g. local re-alignment). The called bases originate from the original color calls, so sequencing errors are detected/corrected, and SNPs (and small indels) powerfully detected. Still, BWA gives good variant calls as it tends not to mismap very often (a major source of false variation). I would still use it in your comparisons, as it is an open question whether the above deficits matter. You could also convince Heng Li (the BWA author) to add better SOLiD support.
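To make the color-space point concrete, here is a small Python sketch of SOLiD's two-base encoding. It is not the conversion that BWA or BFAST performs; it only illustrates why the original color calls carry information that plain base calls do not. With A, C, G, T coded as 0 to 3, each color is the XOR of two adjacent base codes. The function names are hypothetical, and the sketch assumes reads contain only the digits 0 to 3 (no `.` missing calls).

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def decode_colors(cs_read):
    """Decode a CSFASTA read (primer base + color digits) into read bases.

    The leading primer base is used to start decoding but is not returned.
    """
    prev = CODE[cs_read[0]]
    bases = []
    for color in cs_read[1:]:
        prev ^= int(color)          # next base = previous base XOR color
        bases.append(BASES[prev])
    return "".join(bases)


def encode_colors(primer, bases):
    """Encode bases into a CSFASTA-style string starting with the primer base."""
    prev = CODE[primer]
    colors = []
    for base in bases:
        colors.append(str(prev ^ CODE[base]))
        prev = CODE[base]
    return primer + "".join(colors)


print(decode_colors("T0123"))  # TGAT
print(decode_colors("T0023"))  # TTCG: one wrong color changes every later base
```

Because a single miscalled color alters all downstream bases when decoded naively, while a true SNP changes two adjacent colors, tools that keep the color calls and qualities can tell the two cases apart.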

**drio** · 06-03-2010, 11:59 AM

Originally posted by nilshomer

You can use, but are not limited to, dbSNP concordance (SNPs and indels), comparison with a SNP microarray, or simulated data to test your variant discovery. My opinion is that you will get good results no matter which of the above tools you choose. Other than that, I look forward to your assessment.

For your simulated data, take a look at dnaa.sourceforge.net (guess who is the author?). I find it very useful. For example, I use it in a recent project I am working on: http://github.com/drio/synthetic.pipe

I am also very interested in seeing your performance with Bioscope and how they have changed it to make it more user friendly.

Please, share your results once you are done.

**nilshomer** · 06-03-2010, 12:02 PM

Originally posted by drio

For your simulated data, take a look at dnaa.sourceforge.net (guess who is the author?).

To be fair, the original fast code was written by Heng Li (found in SAMtools), and I just modified it for my own purposes.

**drio** · 06-03-2010, 12:02 PM

Originally posted by nilshomer

I have found that since each read is aligned independently, you can get a reference allele bias on SOLiD at SNP positions due to sequencing error. You can look at local re-aligners like the one in GATK or my own (http://srma.sf.net), which can utilize the original color calls and qualities to remove this artifact, along with cleaning up ambiguities around observed indels.

Can you show an example of this before and after re-alignment (samtools tview)?

**nilshomer** · 06-03-2010, 01:55 PM

Originally posted by drio

Can you show an example of this before and after re-alignment (samtools tview)?

I guess we are hijacking this thread (a bit).

Check out the attached PDF from IGV (tview crashes). There is a 15bp deletion and a SNP eight bases to the right of the deletion. This is from our U87 genome sequencing (cancer) and was validated with Sanger sequencing (I'd be happy to send the traces). There are two tracks, one with BFAST (above) and one with SRMA applied after BFAST.

Here are my observations. One is that it is amazing that any of the 50bp reads are aligned correctly (with a SNP and 15bp deletion). Since the reads are randomly sampled from the underlying chromosome (haploid region), some of the reads will have the deletion or SNP towards the end of the read. The local alignment for each read is optimal, but incorrect given the total information from all reads (subtle point). You can see there are many reads that have spurious indels as well as SNPs. After local re-alignment, all but one of the reads now agree on the indel and SNP, with no spurious SNPs or indels.
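The "optimal per read, but incorrect overall" point can be seen in a toy scoring example. The Python sketch below is not how BFAST or SRMA score alignments; the sequences and the scoring values (match +1, mismatch -3, gap open 4, gap extend 1) are hypothetical and chosen only to show the effect. It compares placing a read without gaps against aligning it across a 15bp deletion.

```python
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, -3, 4, 1  # hypothetical scoring values


def ungapped_score(read, ref, start):
    """Score the read placed at ref[start:] with no gaps."""
    window = ref[start:start + len(read)]
    return sum(MATCH if a == b else MISMATCH for a, b in zip(read, window))


def deletion_score(read, del_len):
    """Score the read matching perfectly across one deletion of del_len bases."""
    return MATCH * len(read) - (GAP_OPEN + GAP_EXTEND * del_len)


left = "ACGTACGTACGTACGTACGT"
deleted = "G" * 15                    # 15bp present in the reference, absent in the sample
right = "TTCATTCATTCATTCATTCA"
ref = left + deleted + right

end_read = left[-12:] + right[:3]     # deletion sits 3 bases from the read's end
mid_read = left[-8:] + right[:8]      # deletion sits in the middle of the read

end_ungapped = ungapped_score(end_read, ref, len(left) - 12)
end_gapped = deletion_score(end_read, len(deleted))
mid_ungapped = ungapped_score(mid_read, ref, len(left) - 8)
mid_gapped = deletion_score(mid_read, len(deleted))

print("end read:", end_ungapped, "ungapped vs", end_gapped, "gapped")
print("mid read:", mid_ungapped, "ungapped vs", mid_gapped, "gapped")
```

Under these assumed penalties, the read with the deletion near its end scores better with three spurious mismatches than with the true deletion, while the read spanning the deletion in its middle scores better with the gap. A re-aligner that pools evidence across reads can move the first kind of read onto the deletion supported by the second.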

I can find examples for heterozygous SNPs/indels where the allele frequency between the alleles is moved towards 50/50 (normal diploid regions). All this is in my manuscript justifying this type of tool. This type of cleanup can make life a lot easier for SNP/indel callers.

Attached Files

img.pdf (128.8 KB, 170 views)

**Fabrice ODEFREY** · 06-03-2010, 02:33 PM

Thanks for your feedback; it is all very interesting. I will definitely share our experience with Bioscope as well as the other aligners once we have run a few analyses with them.

which Aligner for SOLiD data?
