  • which Aligner for SOLiD data?

    Hi everyone,

    We recently got a SOLiD platform and are in the process of finding the best possible pipeline for our analysis. We are considering aligning with Bioscope 1.2, SHRiMP2 and BFAST, and will probably run our analysis on all three and do a bit of comparison. Is there another aligner you would recommend over those three? And if you have been using Bioscope/SHRiMP2/BFAST, which one would you recommend?

    Which other "after alignment" software would you recommend? We will use SAMtools and Picard to work with SAM/BAM files and then use IGV to visualize the results.

    Thanks in advance for your time and input.
    Fabrice.

    PS: we will start with whole-exome sequencing of some human samples using the NimbleGen in-solution enrichment kit.
    Last edited by Fabrice ODEFREY; 06-02-2010, 08:59 PM.

  • #2
    As the author of BFAST, I would also recommend throwing BWA into your comparisons. The aligners you mention above and BWA are all "gapped" (consider indels), which is really important for alignment accuracy and obviously indel identification.

    I would also look at comparing a few different variant callers, like the MAQ and SOAP models (both implemented in SAMtools), VarScan, DiBayes, and the GATK caller.

    I have found that since each read is aligned independently, you can get a reference allele bias on SOLiD at SNP positions due to sequencing error. You can look at local re-aligners like the one in GATK or my own (http://srma.sf.net), which can utilize the original color calls and qualities to remove this artifact, along with cleaning up ambiguities around observed indels.

    To test your variant discovery you can use, among other things, dbSNP concordance (SNPs and indels), comparison with a SNP microarray, or simulated data. My opinion is that you will get good results whichever of the above tools you choose. Other than that, I look forward to your assessment.
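
A minimal sketch of the concordance check mentioned above, assuming variants have already been parsed into (chromosome, position, ref, alt) tuples. The function name `concordance` and the toy records are hypothetical and only illustrate the idea; a real comparison would read the calls and the dbSNP catalogue from files and would need to normalize indel representation first.

```python
from typing import Iterable, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(called: Iterable[Variant], known: Iterable[Variant]) -> float:
    """Return the fraction of called variants that also appear in the known set."""
    called_set: Set[Variant] = set(called)
    known_set: Set[Variant] = set(known)
    if not called_set:
        return 0.0  # nothing was called, so there is nothing to be concordant
    return len(called_set & known_set) / len(called_set)


# Toy, made-up records purely to show the call shape (not real variants).
calls = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 42, "G", "GA")]
catalogue = [("chr1", 100, "A", "G"), ("chr2", 42, "G", "GA"), ("chr3", 7, "T", "C")]
rate = concordance(calls, catalogue)
print(f"{rate:.2f} of the calls are in the known set")
```

Running the same function on the calls from each aligner and caller combination gives one comparable number per pipeline, although a low value can also reflect genuinely novel variants rather than errors.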



    • #3
      Thanks Nils for your answer. I had the (wrong?) impression that BWA couldn't deal with the SOLiD csfasta format and that you had to convert it to FASTQ, hence losing the color-space specificity of the SOLiD format. Thanks also for the variant callers and local re-aligners; I will look at them.



      • #4
        Originally posted by Fabrice ODEFREY
        Thanks Nils for your answer. I had the (wrong?) impression that BWA couldn't deal with the SOLiD csfasta format and that you had to convert it to FASTQ, hence losing the color-space specificity of the SOLiD format. Thanks also for the variant callers and local re-aligners; I will look at them.
        BWA converts the CSFASTA and QUAL files to the FASTQ format (so do BFAST and other aligners). BWA trims the first adapter and color though, and the output loses two bases per read (a 50bp read is now 48bp in the SAM output, etc.). The CS/CQ tags are also not present, so they cannot be leveraged in downstream analysis (e.g. local re-alignment). The called bases originate from the original color calls, so sequencing errors are detected/corrected, and SNPs (and small indels) powerfully detected. Still, BWA gives good variant calls as it tends not to mismap very often (a major source of false variation). I would still use it in your comparisons, as it is an open question whether the above deficits matter. You could also convince Heng Li (the BWA author) to add better SOLiD support.
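
To illustrate why the original color calls are worth keeping, here is a small sketch of the standard SOLiD two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). The helper names `encode_colors` and `decode_colors` and the six-base example read are made up for illustration, and the sketch assumes every color is a digit 0-3 (no missing calls). It is not how any particular aligner implements the conversion.

```python
BASES = "ACGT"  # index of each base is its 2-bit code: A=0, C=1, G=2, T=3


def encode_colors(primer_base: str, bases: str) -> str:
    """Encode a base sequence as colors, starting from the known primer base."""
    previous = BASES.index(primer_base)
    colors = []
    for base in bases:
        current = BASES.index(base)
        colors.append(str(previous ^ current))  # color = XOR of adjacent codes
        previous = current
    return "".join(colors)


def decode_colors(primer_base: str, colors: str) -> str:
    """Decode a color string back into bases, given the known primer base."""
    code = BASES.index(primer_base)
    decoded = []
    for color in colors:
        code ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[code])
    return "".join(decoded)


reference_colors = encode_colors("T", "ACGTAC")   # colors for the reference read
snp_colors = encode_colors("T", "ACTTAC")         # same read with a G>T SNP at base 3
error_colors = "310131"                           # reference colors with color 3 miscalled

print(reference_colors, snp_colors)
print(decode_colors("T", reference_colors), decode_colors("T", error_colors))
```

A true SNP changes two adjacent colors and leaves the rest of the read intact, while a single miscalled color changes every decoded base downstream of it. That difference is only visible while the colors are still available, which is why dropping the CS/CQ tags limits what downstream tools can do.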



        • #5
          Originally posted by nilshomer
          To test your variant discovery you can use, among other things, dbSNP concordance (SNPs and indels), comparison with a SNP microarray, or simulated data. My opinion is that you will get good results whichever of the above tools you choose. Other than that, I look forward to your assessment.
          For your simulated data, take a look at dnaa.sourceforge.net (guess who is the author?). I find it very useful. For an example, see a recent project I am working on: http://github.com/drio/synthetic.pipe

          I am also very interested in seeing how Bioscope performs for you and how they have changed it to make it more user-friendly.

          Please share your results once you are done.
          -drd



          • #6
            Originally posted by drio
            For your simulated data, take a look at dnaa.sourceforge.net (guess who is the author?).
            To be fair, the original fast code was written by Heng Li (found in SAMtools), and I just modified it for my own purposes.



            • #7
              Originally posted by nilshomer
              I have found that since each read is aligned independently, you can get a reference allele bias on SOLiD at SNP positions due to sequencing error. You can look at local re-aligners like the one in GATK or my own (http://srma.sf.net), which can utilize the original color calls and qualities to remove this artifact, along with cleaning up ambiguities around observed indels.
              Can you show an example of this before and after re-alignment (samtools tview)?
              -drd



              • #8
                Originally posted by drio
                Can you show an example of this before and after re-alignment (samtools tview)?
                I guess we are hijacking this thread (a bit).

                Check out the attached PDF from IGV (tview crashes). There is a 15bp deletion and a SNP eight bases to the right of the deletion. This is from our U87 genome sequencing (cancer) and was validated with Sanger sequencing (I'd be happy to send the traces). There are two tracks, one with BFAST (above) and one with SRMA applied after BFAST.

                Here are my observations. One is that it is amazing that any of the 50bp reads are aligned correctly (with a SNP and 15bp deletion). Since the reads are randomly sampled from the underlying chromosome (haploid region), some of the reads will have the deletion or SNP towards the end of the read. The local alignment for each read is optimal, but incorrect given the total information from all reads (subtle point). You can see there are many reads that have spurious indels as well as SNPs. After local re-alignment, all but one of the reads now agree on the indel and SNP, with no spurious SNPs or indels.

                I can also find examples of heterozygous SNPs/indels where re-alignment moves the allele frequencies towards 50/50 (as expected in normal diploid regions). All this is in my manuscript justifying this type of tool. This type of cleanup can make life a lot easier for SNP/indel callers.
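
One simple way to quantify the reference allele bias described above is to compare the read counts for the two alleles at a heterozygous site against the 50/50 expectation. The sketch below does this with an exact two-sided binomial test using only the standard library. The function names and the read counts are hypothetical, chosen only to show the calculation, and are not taken from the U87 data.

```python
from math import comb


def alt_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the non-reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover the site")
    return alt_reads / total


def allele_balance_p(ref_reads: int, alt_reads: int, expected: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the observed allele split."""
    n = ref_reads + alt_reads

    def prob(k: int) -> float:
        return comb(n, k) * expected**k * (1 - expected) ** (n - k)

    observed = prob(alt_reads)
    # Sum the probability of every outcome at least as unlikely as the observed one.
    total = sum(prob(k) for k in range(n + 1) if prob(k) <= observed * (1 + 1e-9))
    return min(total, 1.0)


# Hypothetical counts at one heterozygous site, before and after re-alignment.
before = (18, 2)   # (reference reads, alternate reads): skewed towards reference
after = (10, 10)   # balanced, as expected for a diploid heterozygous site

for label, (ref_n, alt_n) in (("before", before), ("after", after)):
    print(label, alt_fraction(ref_n, alt_n), allele_balance_p(ref_n, alt_n))
```

A small p-value flags a site whose allele split is unlikely under a 50/50 model. Comparing the values before and after local re-alignment gives a rough measure of how much of the skew was an alignment artifact.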


                • #9
                  Thanks for your feedback; it is all very interesting. I will definitely share our experience with Bioscope, as well as the other aligners, once we have run a few analyses with them.
