  • Heisman
    Senior Member
    • Dec 2010
    • 534

    How to align SOLiD data?

    Hey all,

    I've only ever looked at Illumina data, and we are being given exome data done with SOLiD sequencing. I am curious as to opinions regarding the optimal way to align this data. I figure once I get it into SAM format I can do what I normally do. Would NovoalignCS be a good option? I do not have a licensed version but I am not in a rush and can probably get by with the free downloadable version. Any other opinions? This is a one-time thing to my knowledge, so I just want to find something that will work and implement it; knowing all of the caveats of each method is not important to me in this case. Thanks a bunch!
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    If time is not a constraint, and you want something easy, then use NovoalignCS.


    • Heisman
      Senior Member
      • Dec 2010
      • 534

      #3
      Originally posted by nilshomer
      If time is not a constraint, and you want something easy, then use NovoalignCS.
      Alright, sounds good. Thanks!


      • Richard Finney
        Senior Member
        • Feb 2009
        • 701

        #4
        BWA

        BWA is good, also. I get very good results with SOLiD data.

        Beware that you must provide the right "color space" parameters both for indexing the genome (the very first step) and during the alignment process. There's also a separate "csfasta to fastq" step; the BWA package contains a Perl script to do this. Beware, also, that SOLiD is "transition based", so when you get a bad nucleotide, the rest of the read is bad. BWA "clips" the read.
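
To make the "transition based" point concrete, here is a minimal Python sketch of two-base colour encoding and of what happens when a colour-space read with one bad colour call is naively translated to bases. The function names and the toy sequence are made up for illustration; this is not how BWA or any other aligner is implemented.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3; a colour is the XOR of two adjacent base codes

def encode(primer_base, seq):
    """Encode a base sequence as primer base + colours (digits 0-3)."""
    colours = []
    prev = primer_base
    for base in seq:
        colours.append(str(BASES.index(prev) ^ BASES.index(base)))
        prev = base
    return primer_base + "".join(colours)

def decode(cs_read):
    """Naively translate a colour-space read back to bases, left to right."""
    prev = cs_read[0]
    out = []
    for colour in cs_read[1:]:
        prev = BASES[BASES.index(prev) ^ int(colour)]
        out.append(prev)
    return "".join(out)

truth = "ACGTTGCA"          # toy sequence, purely illustrative
read = encode("T", truth)   # error-free colour-space read

# Simulate a single bad colour call (the fourth colour of the read).
pos = 4
bad = read[:pos] + str((int(read[pos]) + 1) % 4) + read[pos + 1:]

decoded = decode(bad)
mismatches = [i for i, (a, b) in enumerate(zip(truth, decoded)) if a != b]
print(read, bad, decoded, mismatches)
```

In this toy case every base from the bad colour onward comes out wrong after translation, which is the effect described above. It only arises when the read is converted to bases before being compared with the reference.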


        • Heisman
          Senior Member
          • Dec 2010
          • 534

          #5
          Alright, alright, now I'm intrigued.

          I had no idea there are color space parameters. Could you (or anyone) provide a link that explains this aspect and how I need to incorporate it into the alignment? Also, how do I determine it (i.e., can I look at the data and figure it out, or do I need to contact the people who ran the instrument)?

          Now that I'm curious, might as well do it right. Any links would be appreciated. Is BWA the predominantly used aligner for SOLiD data?


          • gringer
            David Eccles (gringer)
            • May 2011
            • 845

            #6
            Bioscope/Lifescope is the aligner for SOLiD data that produces the most mapped reads (I think about 70-90%). The reads will often be end-clipped to find a match. Other programs seem to be phasing out colour-space support, and have much lower mapping proportions (about 30-60%).



            • #7
              Novoalign is still an excellent choice if you had to pick a colourspace aligner.
              If this is not 5500 data, that is where I would start.


              • kopi-o
                Senior Member
                • Feb 2008
                • 319

                #8
                richardfinney: I have been having trouble getting BWA to perform well on color-space data. It would be most appreciated if you could share the settings that you use.


                • Heisman
                  Senior Member
                  • Dec 2010
                  • 534

                  #9
                  With NovoalignCS and setting t = 150, I'm uniquely aligning up to 50% of the read sequences, although only about 30-35% are aligning as pairs. I'm probably fine with this, and thanks for the help, everyone. That said, I'm not sure if there's an easy way to tell from the files whether the data is from the 5500, and I googled around for Bioscope/LifeScope and could not find a downloadable aligner anywhere.
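
For anyone wanting to check similar percentages on their own output, here is a rough Python sketch that tallies mapped and properly paired reads from SAM flags. The function name, the `min_mapq` cutoff (a crude stand-in for "uniquely aligned") and the toy records are assumptions for illustration; only the flag bits and column positions come from the SAM format itself.

```python
def mapping_stats(sam_lines, min_mapq=0):
    """Count primary SAM records that are mapped and properly paired."""
    total = mapped = proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue                      # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x900:
            continue                      # skip secondary/supplementary records
        total += 1
        if not flag & 0x4 and mapq >= min_mapq:
            mapped += 1                   # 0x4 = read unmapped
            if flag & 0x2:
                proper += 1               # 0x2 = mapped in a proper pair
    return {"total": total, "mapped": mapped, "proper_pair": proper}

# Toy records (hypothetical reads); a real file could be passed as open("aligned.sam").
toy = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\tACGT\tIIII",
    "r2\t73\tchr1\t500\t30\t50M\t=\t500\t0\tACGT\tIIII",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\tACGT\tIIII",
]
stats = mapping_stats(toy)
print(stats, 100.0 * stats["mapped"] / stats["total"])
```

How "unique" should be defined depends on the aligner's MAPQ conventions, so the cutoff is left as a parameter rather than fixed.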


                  • Richard Finney
                    Senior Member
                    • Feb 2009
                    • 701

                    #10
                    Illumina definitely maps more reads. A quality drop of one base in the middle of the read is still usable with Illumina. Because SOLiD is transition based, you are lost if you "miss" a base pair. Bioscope (at least the free old one installed on BIOWULF at NIH) tries to align these anyway and they're often junk; you get these runt reads spread out all over the place. BWA will "soft clip" (I believe that's what it's called) some of these reads, but most it will just assign as unmapped. The unmapped percentage of reads for SOLiD BWA versus Illumina BWA is much higher. I'm not an expert and I can't rule out wetlab folks just not being as good with SOLiD samples, but I ~suspect~ that the SOLiD techniques are just plain trickier. Bioscope goes to great pains to try and hide this situation. BWA is better because it "files the unmapped reads into the unmapped directory". However, the alignments that are good look right, and the SNPs discovered make sense and can often enough be verified. In that sense SOLiD is quite good and usable; just don't expect 97% mapping of reads.

                    I use no special parameters with BWA. I think they don't affect the results too much; the defaults are fine. If anyone knows better, please let us know hereabouts. The parameters to make BWA work with SOLiD are documented in the BWA documentation. BWA also provides a Perl script to convert SOLiD CSFASTA/QUAL files to FASTQ for input into BWA. The target (i.e. genome) indexing for color space (SOLiD) is not the same as for non-color space (e.g. Illumina).
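
As a rough outline of the steps described above, the Python sketch below only builds the command lines; it does not run them. It assumes a BWA 0.5.x release (colour-space support was, to my knowledge, dropped in later versions), fragment (single-end) reads, and that the conversion script is the `solid2fastq.pl` shipped with that release. The file names, and in particular the FASTQ name the script is assumed to write, are placeholders to check against the BWA documentation.

```python
import shlex

def bwa_colourspace_commands(ref_fasta, solid_title, out_prefix):
    """Build (not run) the command lines for a colour-space BWA alignment."""
    # ASSUMPTION: name of the FASTQ written by solid2fastq.pl; verify locally.
    fastq = f"{out_prefix}.single.fastq.gz"
    sai = f"{out_prefix}.sai"
    sam = f"{out_prefix}.sam"
    return [
        # 1. Index the genome in colour space (-c), separate from a base-space index.
        ["bwa", "index", "-c", ref_fasta],
        # 2. Convert CSFASTA/QUAL to FASTQ with the script from the BWA package.
        ["solid2fastq.pl", solid_title, out_prefix],
        # 3. Align, again telling BWA the reads are colour space (-c).
        ["bwa", "aln", "-c", "-f", sai, ref_fasta, fastq],
        # 4. Produce SAM for single-end reads.
        ["bwa", "samse", "-f", sam, ref_fasta, sai, fastq],
    ]

# Hypothetical inputs, for illustration only.
cmds = bwa_colourspace_commands("ref.fa", "run1", "run1_out")
for cmd in cmds:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands first makes it easy to compare them with the BWA manual before running anything, for example with `subprocess.run(cmd, check=True)`.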



                    • #11
                      Originally posted by Richard Finney (post #10 above)
                      3-4 years of SOLiD and people still don't get it. I know colourspace is the minority, but it can be managed relatively easily.
                      Mapping percentages should not be compared between platforms. SOLiD does minimal filtering prior to alignment, while Illumina does generous amounts of filtering. This leads to vastly different mapping percentages.

                      You are not lost if you miss a base as long as you are aligning in colourspace. Translating out of colourspace prior to alignment is not the proper way to handle CS data. Align with known CS tools, then use more common tools for the basespace output.
                      Last edited by Guest; 01-29-2012, 05:20 PM.
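
A small Python sketch of why comparing in colourspace avoids the problem: the reference is encoded into colours and compared with the read's colours directly. In this toy example a single miscalled colour shows up as one isolated mismatch, while a real single-base change shows up as two adjacent colour mismatches. The function names and sequences are invented for illustration, and real colour-space aligners do considerably more than this.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3; a colour is the XOR of two adjacent base codes

def to_colours(seq):
    """Colour string for each adjacent base pair of seq (no primer base)."""
    return "".join(str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:]))

def colour_mismatches(ref, read_colours):
    """Positions where the read's colours differ from the reference's colours."""
    return [i for i, (r, c) in enumerate(zip(to_colours(ref), read_colours)) if r != c]

ref = "ACGTTGCATG"               # toy reference
ref_cs = to_colours(ref)

# Case 1: one miscalled colour at position 4, no real sequence change.
err_read = ref_cs[:4] + str((int(ref_cs[4]) + 1) % 4) + ref_cs[5:]

# Case 2: a true single-base difference (G -> C at base index 5), read without errors.
snp_read = to_colours(ref[:5] + "C" + ref[6:])

print(colour_mismatches(ref, err_read))   # isolated mismatch
print(colour_mismatches(ref, snp_read))   # two adjacent mismatches
```

The rest of the read still matches in both cases, so nothing downstream of the difference is lost as long as the comparison stays in colours.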


                      • Richard Finney
                        Senior Member
                        • Feb 2009
                        • 701

                        #12
                        Thanks. I'm no expert on SOLiD/colorspace and am fine with being schooled on this. I'm just reporting my experiences with SOLiD on Bioscope and BWA. I actually don't know exactly why there are so many hard-clipped runt reads using SOLiD Bioscope; I'm just guessing, having stared at it too long. I still think the bias/noise/wackiness, whatever, with SOLiD is manageable, and the long reads that do map are good reads. It is a perfectly usable system and I'm sure the SOLiD folks are working at addressing any issues and improving their processes.

