  • Mapping SOLiD colorspace paired end reads

    Hi, I'm trying to get a sense of what the current consensus is regarding the best practice for mapping SOLiD colour-space paired-end reads. As far as I can tell the options appear to be rather limited:

    1. LifeScope.

    Advantages: designed specifically for paired-end colour-space reads and maps the most reads (which may also be a disadvantage...). Disadvantages: very slow and very resource hungry. Overly complicated command-line interface. Runs only on a limited range of hardware/software. Difficult to leverage into GATK. And to rub salt into the wound, AB are planning to charge for it in the near future.

    2. Bowtie.

    Advantages: Fast. Resource efficient. Multithreaded. Easy-to-use command-line interface. Disadvantages: Tricky to leverage into GATK. Most importantly, Bowtie cannot accommodate indels, and Bowtie2 will not accept colour-space data.

    3. BWA

    Advantages: Fast. Efficient. Multithreaded. Accommodates indels. Easy-to-use interface. Easily leveraged into a GATK pipeline (to a point; see below). BUT there is NO current support for paired-end SOLiD data (mate-pair yes, but not, it would seem, paired-end). The current workaround appears to be to reverse the F5 reads (and associated QVs), preferably trimming heavily at the 3' end to minimise potential problems from the reversed error profile (though how much of an issue is this?). This creates all sorts of issues when leveraging into GATK, particularly as BWA does not include colour-space data in the BAM, a prerequisite for GATK recalibration.

    4. BFAST

    I am still exploring this option. It appears to accept paired-end colour-space data natively and can be leveraged into GATK pipelines, but the interface is a bit challenging, the documentation a bit opaque, and parts of the process can be very slow.

    Is there anything I've missed? What is people's preferred strategy, particularly if they want to leverage into GATK for recalibration purposes? Are we all going to be using ECC-generated base-space data, so that development of colour-space-compatible tools will dry up? Any thoughts on this subject would be most welcome!
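
The BWA workaround in option 3 (trim the F5 read at its 3' end, then reverse the colours and their QVs) can be sketched in Python. This is only a minimal illustration under stated assumptions, not a tested pipeline step: it assumes each read has already been reduced to a bare colour string (primer base and first colour removed) with one quality value per colour, and the names `to_colours`, `reverse_f5` and `trim_3prime` are hypothetical. Plain reversal is used because each colour encodes a transition between adjacent bases, so the colours of a reverse-complemented sequence are the original colours in reverse order; the final check demonstrates this on a toy sequence.

```python
# Colour assigned to each dinucleotide under the SOLiD two-base encoding.
COLOUR = {
    "AA": "0", "CC": "0", "GG": "0", "TT": "0",
    "AC": "1", "CA": "1", "GT": "1", "TG": "1",
    "AG": "2", "GA": "2", "CT": "2", "TC": "2",
    "AT": "3", "TA": "3", "CG": "3", "GC": "3",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_colours(bases):
    """Encode a base sequence as the colours between adjacent bases."""
    return "".join(COLOUR[bases[i:i + 2]] for i in range(len(bases) - 1))


def reverse_f5(colours, quals, trim_3prime=0):
    """Trim the original 3' end, then reverse colours and quality values.

    Hypothetical helper: `colours` is a bare colour string and `quals`
    is a list holding one quality value per colour.
    """
    if len(colours) != len(quals):
        raise ValueError("need exactly one quality value per colour")
    keep = max(0, len(colours) - trim_3prime)
    return colours[:keep][::-1], quals[:keep][::-1]


# Sanity check: reverse-complementing the bases reverses the colours.
seq = "ACGGTCA"
revcomp = seq.translate(COMPLEMENT)[::-1]
assert to_colours(revcomp) == to_colours(seq)[::-1]

if __name__ == "__main__":
    colours, quals = reverse_f5(to_colours(seq), [30, 28, 25, 20, 12, 8], trim_3prime=2)
    print(colours, quals)
```

How many colours to trim is left as a parameter, since the thread itself leaves open how much the reversed error profile matters.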

  • #2
    We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

    A few notes-

    1. Lifescope will be 10000 euros a year from what I've heard.

    2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

    3. There is some debate on the BWA mailing list on them dropping colour space altogether.

    Perhaps you missed SHRiMP2 as well, though I am not sure whether it can deal with paired-end reads.

    • #3
      Originally posted by colindaven
      We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

      A few notes-

      1. Lifescope will be 10000 euros a year from what I've heard.

      2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

      3. There is some debate on the BWA mailing list on them dropping colour space altogether.

      Perhaps you missed SHRiMP2 as well, though I am not sure whether it can deal with paired-end reads.

      Good point about SHRiMP2. Yes, it does appear to have a paired-end option and I'll check it out, so thanks for pointing that out. (My memory of using SHRiMP before is that it needed a lot of compute resources, specifically memory, so it'll be interesting to see how it copes with paired-end data.)

      I am disappointed to hear that BWA may be dropping colour-space support altogether, though not entirely surprised: it has always appeared to perform particularly poorly with colour-space data, and I see there is some debate over whether this may be a flaw in its colour-space implementation. Further, with ECC technology now available on the 5500xl, I wonder how many other developers might also drop colour-space support?

      I'm curious to know why you would avoid using Bowtie on WGS data. Is it purely because of the indel issue? I've found that Bowtie tends to give marginally better mappings than BWA, although the resultant BAM files are problematic when it comes to using the GATK pipeline. I'm guessing you're suggesting its suitability for RNA-seq because of TopHat?

      Lastly, yes: 10,000 euros a year for LifeScope. Is it worth it? Hmm. Discuss! (I suspect the few of us who are ACTUALLY using LifeScope will seriously consider spending that money on alternative commercial options rather than fork out for what is a very fragile and poorly designed piece of software. However, it's fair to say its mapper does a good job of making the most of colour-space data, at least in terms of coverage.)

      Will look forward to what you have to say regarding Novoalign. Has anyone else any experience of using this mapper with SOLiD paired-end data?

      • #4
        With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
        Researchers use Applied Biosystems integrated systems for sequencing, flow cytometry, and real-time, digital and end point PCR—from sample prep to data analysis.

        • #5
          Originally posted by h2karen
          With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
          http://solid.community.appliedbiosys...om/thread/1182
          Many thanks for the info. Yes, I am aware of that issue; I'm afraid I wasn't clear in the point I was making, which is that, because ECC is now available, I fear that third-party developers will make less effort to produce and maintain software capable of coping with colour-space data in addition to base-space. It is notable that Bowtie 2 cannot use colour-space even though Bowtie 1 can and, it would seem from the above comment, BWA may also pull out of supporting colour-space. As far as SOLiD paired-end data is concerned, this is a bit of a double whammy: users can't use paired-end in base-space because of the point you rightly highlight, but if they use colour-space there are few (and, over time, it would seem fewer) options available to map the data easily.

          • #6
            1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

            2) I think the new version of BWA also stopped supporting colour space.

            • #7
              Originally posted by Zaag
              1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

              2) I think the new version of BWA also stopped supporting colour space.
              Thanks for that! I've checked and you're right: as of release 0.6, BWA has dropped colour-space support (although the online documentation still alludes to it). This is very disappointing though, given what I've been hearing, not unexpected.

              I have checked LifeScope 2.5 output, and yes, it appears to work fine with GATK. This is very good news, so thanks for bringing that to my attention!

              • #8
                SHRiMP 2.2.2 does seem to be an alternative.

                I can align ~60 million single-end (SE) 60 bp exome reads to the human genome in about 10 hours using 47 threads. That is about a third of the runtime of novoalign-CS on this machine.

                We will be testing SNP calls from alignments in the next few weeks so I can't say anything yet.
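
To put those figures on a comparable footing, here is a small back-of-the-envelope calculation in Python. It uses only the numbers quoted in the post (about 60 million reads, about 10 hours, 47 threads, and novoalign-CS taking roughly three times as long); the function name `throughput` is hypothetical and the results are only as precise as those rounded inputs.

```python
def throughput(n_reads, hours, threads):
    """Return overall and per-thread alignment rates for one run."""
    reads_per_hour = n_reads / hours
    return {
        "reads_per_second": reads_per_hour / 3600,
        "reads_per_thread_hour": reads_per_hour / threads,
    }


# Figures as quoted in the post; all are approximate.
SHRIMP_HOURS = 10
shrimp = throughput(60_000_000, SHRIMP_HOURS, 47)

# "About a third of the runtime" implies roughly three times as long.
novoalign_hours = SHRIMP_HOURS * 3

if __name__ == "__main__":
    print(f"SHRiMP2: {shrimp['reads_per_second']:.0f} reads/s overall")
    print(f"SHRiMP2: {shrimp['reads_per_thread_hour']:.0f} reads per thread-hour")
    print(f"novoalign-CS (implied): about {novoalign_hours} hours")
```

This works out to roughly 1,667 reads per second overall, or about 127,660 reads per thread-hour, with an implied novoalign-CS runtime of about 30 hours on the same machine.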

                • #9
                  For the sake of completeness there is PerM, which does handle paired-end reads (as separate files). I stopped using it as it ignores any read containing an 'N'. It is not a gapped aligner. It is still being developed.

                  • #10
                    GATK does work on SHRiMP2-produced SAM files from SOLiD paired-end reads. Here are the steps:

                    1. Align with SHRiMP2 using the --single-best-mapping and --all-contigs flags.

                    2. Use Picard to fix the read group.

                    3. Run GATK.
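
The first two steps can be sketched as command lines assembled in Python. This is a rough sketch rather than a verified recipe: only `--single-best-mapping` and `--all-contigs` come from the post. The `gmapper-cs` binary name, the `-1`/`-2`/`-N` options, the choice of Picard's `AddOrReplaceReadGroups` tool, and every file name are assumptions that should be checked against each tool's own help output for the installed version. The GATK step is left out because the post does not say which GATK tools were run.

```python
import shlex


def shrimp2_command(upstream, downstream, reference, threads=1, extra_args=()):
    """Build (but do not run) a paired colour-space SHRiMP2 command.

    Only --single-best-mapping and --all-contigs are taken from the post;
    the binary name and the -1/-2/-N options are assumptions.
    """
    return [
        "gmapper-cs",
        "-1", upstream,
        "-2", downstream,
        "--single-best-mapping",
        "--all-contigs",
        "-N", str(threads),
        *extra_args,
        reference,
    ]


def read_group_command(picard_jar, in_sam, out_bam, sample,
                       library="lib1", unit="unit1"):
    """Build a Picard command that sets a read group (tool choice assumed)."""
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_sam}", f"O={out_bam}",
        f"RGID={sample}", f"RGSM={sample}",
        f"RGLB={library}", f"RGPU={unit}", "RGPL=SOLID",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    align = shrimp2_command("reads_1.csfasta", "reads_2.csfasta", "ref.fa", threads=8)
    fix_rg = read_group_command("picard.jar", "aligned.sam", "aligned.rg.bam", "sample1")
    print(shlex.join(align), "> aligned.sam")
    print(shlex.join(fix_rg))
```

Building the commands as argument lists keeps them easy to inspect before handing them to `subprocess.run`.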

                    • #11
                      Hi!

                      I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate-pair library that is 52 x 37 nt, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size selection parameters were, or how the circles were cut to produce the final fragments. This info would be very valuable for more accurate mapping.

                      Any knowledge would be greatly appreciated!

                      Thanks a lot,

                      Carmen
