  • Fabrice ODEFREY
    Member
    • May 2010
    • 21

    which Aligner for SOLiD data?

    Hi everyone,

    We recently got a SOLiD platform and are looking for the best possible pipeline for our analysis. We are considering aligning with Bioscope 1.2, SHRiMP2 and BFAST, and will probably run our analysis on all three and compare them. Is there another aligner you would recommend over those three? And if you have used Bioscope/SHRiMP2/BFAST, which one would you recommend?

    Which other post-alignment software would you recommend? We will use SAMtools and Picard to work with the SAM/BAM files and then IGV to visualize the results.

    Thanks in advance for your time and input.
    Fabrice.

    PS: We will start with whole-exome sequencing of some human samples, using the NimbleGen in-solution enrichment kit.
    Last edited by Fabrice ODEFREY; 06-02-2010, 08:59 PM.
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    As the author of BFAST, I would also recommend throwing BWA into your comparisons. The aligners you mention above and BWA are all "gapped" (consider indels), which is really important for alignment accuracy and obviously indel identification.

    I would also look at comparing a few different variant callers, like the MAQ and SOAP models (both implemented in SAMtools), VarScan, DiBayes, and the GATK caller.

    I have found that since each read is aligned independently, you can get a reference allele bias on SOLiD at SNP positions due to sequencing error. You can look at local re-aligners like the one in GATK or my own (http://srma.sf.net), which can utilize the original color calls and qualities to remove this artifact, along with cleaning up ambiguities around observed indels.

    To test your variant discovery you can use, among other things, dbSNP concordance (SNPs and indels), comparison with a SNP microarray, or simulated data. My opinion is that you will get good results whichever of the above tools you choose. Other than that, I look forward to your assessment.
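
The dbSNP concordance check mentioned above can be sketched as a simple set overlap. This is only an illustration, not part of any tool named in the thread: the function name `concordance`, the `(chrom, pos, ref, alt)` key, and all variant values are made up for the example.

```python
def concordance(called, known):
    """Fraction of called variants that also appear in the known set."""
    called = set(called)
    if not called:
        return 0.0
    return len(called & set(known)) / len(called)


# Variants keyed as (chrom, pos, ref, alt); every value here is hypothetical.
calls = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "GA"),
    ("chr2", 900, "T", "C"),
]
dbsnp = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 500, "G", "GA"),
    ("chr3", 42, "C", "A"),
}

rate = concordance(calls, dbsnp)
print(rate)  # 0.5
```

Running the same calculation on the call set from each aligner and variant caller gives one comparable number per pipeline, although a low value can also reflect genuinely novel variants rather than errors.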


    • Fabrice ODEFREY
      Member
      • May 2010
      • 21

      #3
      Thanks, Nils, for your answer. I had the (wrong?) impression that BWA couldn't deal with the csfasta format of the SOLiD and that you had to convert to FASTQ, hence losing the color space specificity of the SOLiD format. Thanks also for the variant callers and local re-aligners; I will look at them.


      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        BWA converts the CSFASTA and QUAL files to the FASTQ format (as do BFAST and other aligners). BWA trims the leading adapter base and the first color, though, so the output loses two bases per read (a 50bp read becomes 48bp in the SAM output, etc.). The CS/CQ tags are also not present, so they cannot be leveraged in downstream analysis (i.e. local re-alignment). The called bases still originate from the original color calls, so sequencing errors are detected/corrected, and SNPs (and small indels) are detected with good power. BWA also gives good variant calls because it tends not to mismap very often (mismapping is a major source of false variation). I would still use it in your comparisons, as it is an open question whether the above deficits matter. You could also convince Heng Li (the BWA author) to add better SOLiD support.


        • drio
          Senior Member
          • Oct 2008
          • 323

          #5
          Originally posted by nilshomer
          To test your variant discovery you can use, among other things, dbSNP concordance (SNPs and indels), comparison with a SNP microarray, or simulated data. My opinion is that you will get good results whichever of the above tools you choose. Other than that, I look forward to your assessment.
          For your simulated data, take a look at dnaa.sourceforge.net (guess who the author is?). I find it very useful, for example in a recent project I am working on: http://github.com/drio/synthetic.pipe

          I am also very interested in seeing how Bioscope performs for you and how they have changed it to make it more user friendly.

          Please, share your results once you are done.
          -drd


          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            Originally posted by drio
            For your simulated data, take a look at dnaa.sourceforge.net (guess who the author is?).
            To be fair, the original fast code was written by Heng Li (found in SAMtools), and I just modified it for my own purposes.


            • drio
              Senior Member
              • Oct 2008
              • 323

              #7
              Originally posted by nilshomer
              I have found that since each read is aligned independently, you can get a reference allele bias on SOLiD at SNP positions due to sequencing error. You can look at local re-aligners like the one in GATK or my own (http://srma.sf.net), which can utilize the original color calls and qualities to remove this artifact, along with cleaning up ambiguities around observed indels.
              Can you show an example of this, before and after re-alignment (samtools tview)?
              -drd


              • nilshomer
                Nils Homer
                • Nov 2008
                • 1283

                #8
                Originally posted by drio
                Can you show an example of this, before and after re-alignment (samtools tview)?
                I guess we are hijacking this thread (a bit).

                Check out the attached PDF from IGV (tview crashes). There is a 15bp deletion and a SNP 8 bases to the right of the deletion. This is from our U87 genome sequencing (cancer) and was validated with Sanger sequencing (I'd be happy to send the traces). There are two tracks, one with BFAST (above) and one with SRMA applied after BFAST.

                Here are my observations. One is that it is amazing that any of the 50bp reads are aligned correctly (with a SNP and 15bp deletion). Since the reads are randomly sampled from the underlying chromosome (haploid region), some of the reads will have the deletion or SNP towards the end of the read. The local alignment for each read is optimal, but incorrect given the total information from all reads (subtle point). You can see there are many reads that have spurious indels as well as SNPs. After local re-alignment, all but one of the reads now agree on the indel and SNP, with no spurious SNPs or indels.
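
The "optimal per read, but incorrect overall" point can be illustrated with a toy scoring comparison. This is only a sketch under loud assumptions: the score values are hypothetical (not BFAST's), there are no sequencing errors, and end-clipping stands in for any alternative that avoids opening the gap (a real aligner may instead place spurious mismatches or small indels).

```python
def preferred_alignment(read_len, k, del_len,
                        match=1, gap_open=-5, gap_extend=-1):
    """Pick the higher-scoring option for one read aligned on its own.

    k is the number of read bases to the right of the deletion.
    All score values are hypothetical and chosen only for illustration.
    """
    # Option 1: align the whole read and pay for the deletion.
    gapped = read_len * match + gap_open + gap_extend * del_len
    # Option 2: keep only the longer flank and clip the other side.
    clipped = max(k, read_len - k) * match
    return "deletion" if gapped > clipped else "clipped"


# 50bp reads over a 15bp deletion, with the deletion at every possible offset.
choices = {k: preferred_alignment(50, k, 15) for k in range(1, 50)}
with_deletion = [k for k, c in choices.items() if c == "deletion"]
print(with_deletion)  # [21, 22, 23, 24, 25, 26, 27, 28, 29]
```

Under these assumed scores, only reads with the deletion near their middle report it when scored alone; the rest prefer an alternative that is locally better but disagrees with the consensus, which is the gap a local re-aligner is meant to close by using all reads together.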

                I can find examples for heterozygous SNPs/indels where the allele frequency between the alleles is moved towards 50/50 (normal diploid regions). All this is in my manuscript justifying this type of tool. This type of cleanup can make life a lot easier for SNP/indel callers.
                Attached Files


                • Fabrice ODEFREY
                  Member
                  • May 2010
                  • 21

                  #9
                  Thanks for your feedback; it is all very interesting. I will definitely share our experience with Bioscope, as well as with the other aligners, once we have done a few analyses with them.
