  • Mapping SOLiD colorspace paired end reads

    Hi, I'm trying to get a sense of what the current consensus is regarding the best practice for mapping SOLiD colour-space paired-end reads. As far as I can tell the options appear to be rather limited:

    1. LifeScope.

    Advantages: designed specifically for paired-end colour-space reads and maps the most reads (which may also be a disadvantage…). Disadvantages: very slow and very resource hungry. Overly complicated command-line interface. Runs only on a limited range of hardware/software. Difficult to leverage into GATK. And to rub salt into the wound, AB are planning to charge for it in the near future.

    2. Bowtie.

    Advantages: Fast. Resource efficient. Multithreaded. Easy-to-use command-line interface. Disadvantages: Tricky to leverage into GATK. Most importantly, Bowtie cannot accommodate indels, and Bowtie2 will not accept colour-space data.

    3. BWA

    Advantages: Fast. Efficient. Multithreaded. Accommodates indels. Easy-to-use interface. Easily leveraged into a GATK pipeline (to a point; see below). BUT there is no current support for paired-end SOLiD data (mate-pair yes, but not, it would seem, paired-end). The current workaround appears to be to reverse the F5 reads (and their associated QVs), preferably trimming heavily at the 3' end to minimise potential problems from the reversed error profile (though how much of an issue is this?). This creates all sorts of issues when leveraging into GATK, particularly as BWA does not include colour-space data in the BAM, a prerequisite for GATK recalibration.

    4. BFAST

    Am still exploring this option. However, it would appear to accept paired-end colour-space data natively and can be leveraged into GATK pipelines, but the interface is a bit challenging, the documentation a bit opaque, and parts of the process can be very slow.

    Is there anything I've missed out? What are people's preferred strategies, particularly if they want to leverage into GATK for recalibration purposes? Are we all going to be using ECC-generated base-space data, so that development of colour-space-compatible tools will dry up? Any thoughts on this subject would be most welcome!

  • #2
    We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

    A few notes:

    1. Lifescope will be 10000 euros a year from what I've heard.

    2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

    3. There is some debate on the BWA mailing list on them dropping colour space altogether.

    Perhaps you missed SHRiMP2 as well, though I am not sure if that can deal with paired-end reads.



    • #3
      Originally posted by colindaven:
      We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

      A few notes:

      1. Lifescope will be 10000 euros a year from what I've heard.

      2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

      3. There is some debate on the BWA mailing list on them dropping colour space altogether.

      Perhaps you missed SHRiMP2 as well, though I am not sure if that can deal with paired-end reads.

      Good point about SHRiMP2. Yes, it does appear to have a paired-end option and I'll check it out, so thanks for pointing that out (my memory of using SHRiMP before is that it needed a lot of compute resources, specifically memory; it'll be interesting to see how it copes with paired-end data).

      Am disappointed to hear that BWA may be dropping colour-space support altogether, though not entirely surprised: it has always appeared to perform particularly poorly with colour-space data, and I see there is some debate that this may be a flaw in their colour-space implementation. Further, with ECC technology now available with the 5500xl, I wonder how many other developers might also drop colour-space support?

      I'm curious to know why you would avoid using Bowtie on WGS data. Is it purely because of the indel issue? I've found Bowtie tends to give marginally better mappings than BWA, although the resultant BAM files are problematic when it comes to using the GATK pipeline. I'm guessing you're suggesting its suitability for RNA-seq because of TopHat?

      Lastly, yes... 10,000 euros a year for LifeScope. Is it worth it? Hmm. Discuss!!! (I suspect the few of us who are ACTUALLY using LifeScope will seriously consider spending that money on alternative commercial options rather than fork out on what is a very fragile and poorly designed piece of software. However, it's fair to say its mapper does do a good job of making the most of colour-space data... at least in terms of coverage...)

      Will look forward to what you have to say regarding Novoalign. Has anyone else any experience of using this mapper with SOLiD paired-end data?



      • #4
        With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
        http://solid.community.appliedbiosys...om/thread/1182



        • #5
          Originally posted by h2karen:
          With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
          http://solid.community.appliedbiosys...om/thread/1182
          Many thanks for the info, and yes, I am aware of that issue. I'm afraid I wasn't being clear in the point I was making, which is that, because ECC is now available, I fear that less effort will be made by third-party developers to produce and maintain software capable of coping with colour-space data in addition to base-space. It is notable that Bowtie 2 cannot use colour-space even though Bowtie 1 can and, it would seem from the above comment, BWA may also pull out of supporting colour-space. As far as using SOLiD paired-end data is concerned, this is a bit of a double whammy: users can't use paired-end in base-space because of the point you rightly highlight, but if they use colour-space there are few (and, over time, it would seem fewer) options available to map the data easily.



          • #6
            1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

            2) I think the new version of BWA also stopped supporting colour space.



            • #7
              Originally posted by Zaag:
              1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

              2) I think the new version of BWA also stopped supporting colour space.
              Thanks for that! I've checked and you're right: as of release 0.6, BWA has dropped colour-space support (although the online documentation still alludes to it). This is very disappointing though, given what I've been hearing, not unexpected.

              I have also checked Lifescope 2.5 output and yes, it appears to work fine with GATK. This is very good news, so thanks for bringing that to my attention!



              • #8
                SHRiMP 2.2.2 does seem to be an alternative.

                I can align ~60m SE 60bp exome reads to the human genome in about 10 hours using 47 threads. That gives it about a third of the runtime of novoalign-CS on this machine.

                We will be testing SNP calls from alignments in the next few weeks so I can't say anything yet.
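
For comparison with other aligners, the figures quoted above can be turned into rough throughput numbers. This is back-of-the-envelope arithmetic only: the inputs are the approximate values from the post (~60 million reads, ~10 hours, 47 threads), the novoalign-CS runtime is merely what "about a third" implies, and the variable names are made up for illustration.

```python
# Approximate figures quoted in the post.
reads = 60_000_000      # ~60m single-end 60 bp reads
hours = 10              # wall-clock runtime with SHRiMP 2.2.2
threads = 47

reads_per_hour = reads / hours
reads_per_thread_hour = reads / (hours * threads)

# "About a third of the runtime of novoalign-CS" implies roughly 3x as long.
implied_novoalign_hours = hours * 3

print(f"{reads_per_hour:,.0f} reads/hour")
print(f"{reads_per_thread_hour:,.0f} reads per thread-hour")
print(f"~{implied_novoalign_hours} hours implied for novoalign-CS on the same machine")
```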



                • #9
                  For the sake of completeness there is PerM, which does handle paired end reads (as separate files). I stopped using it as it ignores any read containing a 'N'. It is not a gapped aligner. It is still being developed.



                  • #10
                    GATK does work on SAM files produced by SHRiMP2 from SOLiD paired-end reads. Here are the steps:
                    1. Align with SHRiMP2 using the --single-best-mapping and --all-contigs flags.
                    2. Use Picard to fix the read group.
                    3. Run GATK.
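
The first two steps could be scripted along these lines. Treat this as a sketch under assumptions rather than the poster's actual commands: only `--single-best-mapping` and `--all-contigs` come from the post. The `gmapper-cs` binary name, the `-1`/`-2`/`-p`/`-N` options, the Picard `AddOrReplaceReadGroups` arguments, all file names and all read-group values are assumptions recalled from the tools' documentation and should be checked against the versions you have installed. The pair-mode value is deliberately left as a placeholder, and the GATK step is omitted because the post does not say which GATK tool was run. The code only builds and prints the command lines; it does not execute anything.

```python
import shlex


def shrimp2_command(reads_1, reads_2, genome, pair_mode, threads=8):
    """Build a SHRiMP2 colour-space alignment command (option names assumed)."""
    return [
        "gmapper-cs",
        "-1", reads_1,             # first read file of the pair (assumed option)
        "-2", reads_2,             # second read file of the pair (assumed option)
        "-p", pair_mode,           # pair orientation; check the SHRiMP2 README
        "-N", str(threads),        # number of threads (assumed option)
        "--single-best-mapping",   # flag named in the post
        "--all-contigs",           # flag named in the post
        genome,
    ]


def picard_read_group_command(sam_in, bam_out, sample):
    """Build a Picard AddOrReplaceReadGroups command (values are placeholders)."""
    return [
        "java", "-jar", "picard.jar", "AddOrReplaceReadGroups",
        f"I={sam_in}",
        f"O={bam_out}",
        "RGID=1",
        "RGLB=lib1",
        "RGPL=solid",
        "RGPU=unit1",
        f"RGSM={sample}",
        "SORT_ORDER=coordinate",
    ]


# Hypothetical file names; PAIR_MODE must be replaced with a real value.
align = shrimp2_command("F3.csfasta", "F5.csfasta", "genome.fa", "PAIR_MODE")
fix_rg = picard_read_group_command("aligned.sam", "aligned.rg.bam", "sample1")

print(shlex.join(align), "> aligned.sam")
print(shlex.join(fix_rg))
```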



                    • #11
                      Hi!

                      I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate-pair library that is 52 x 37 nt, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size selection parameters were, or how the circles were cut to produce the final fragments. This info would be very valuable for a more accurate mapping.

                      Any knowledge would be greatly appreciated!

                      Thanks a lot,

                      Carmen

