  • Mapping SOLiD colorspace paired end reads

    Hi, I'm trying to get a sense of what the current consensus is regarding the best practice for mapping SOLiD colour-space paired-end reads. As far as I can tell the options appear to be rather limited:

    1. LifeScope.

    Advantages: designed specifically for paired-end colour-space reads and maps the most reads (which may also be a disadvantage…). Disadvantages: very slow and very resource hungry. Overly complicated command-line interface. Runs only on a limited range of hardware/software. Difficult to leverage into GATK. And to rub salt into the wound, AB are planning to charge for it in the near future.

    2. Bowtie.

    Advantages: Fast. Resource efficient. Multithreaded. Easy-to-use command-line interface. Disadvantages: Tricky to leverage into GATK. Most importantly, Bowtie cannot accommodate indels, and Bowtie2 will not accept colour-space data.

    3. BWA

    Advantages: Fast. Efficient. Multithreaded. Accommodates indels. Easy-to-use interface. Easily leveraged into a GATK pipeline (to a point - see below). BUT! There is NO current support for paired-end SOLiD data (mate-pair yes, but not, it would seem, paired-end). The current workaround appears to be to reverse the F5 reads (and their associated QVs), preferably trimming heavily at the 3' end to minimise potential problems from the reversed error profile (though how much of an issue is this?). This creates all sorts of issues when leveraging into GATK, particularly as BWA does not include colour-space data in the BAM, a prerequisite for GATK recalibration.

    4. BFAST

    Am still exploring this option. However, it would appear to accept paired-end colour-space data natively and can be leveraged into GATK pipelines, but the interface is a bit challenging, the documentation a bit opaque, and parts of the process can be very slow.

    Is there anything I've missed? What are people's preferred strategies, particularly if they want to leverage into GATK for recalibration purposes? Are we all going to be using ECC-generated base-space data, so that development of colour-space-compatible tools will dry up? Any thoughts on this subject would be most welcome!
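
To make the BWA workaround from option 3 concrete, here is a minimal Python sketch of reversing an F5 read together with its QVs, with optional trimming of the original 3' end first. The function name and the `trim_3prime` parameter are made up for illustration, and it assumes the primer base and the first colour call have already been stripped, so the read is just a string of colour calls. Reversing the colour string without any complementing should be enough, since a colour call is unchanged when both bases of the pair are complemented or swapped. This is not taken from any BWA script.

```python
def reverse_colour_read(colours, quals, trim_3prime=0):
    """Reverse a colour-space read and its quality values.

    colours     -- colour calls as a string ('0'-'3', '.' for a missing call);
                   assumes the primer base and first colour are already removed
    quals       -- list of integer QVs, one per colour call
    trim_3prime -- hypothetical parameter: number of calls to drop from the
                   ORIGINAL 3' end before reversing
    """
    if len(colours) != len(quals):
        raise ValueError("colour calls and QVs must have the same length")
    if trim_3prime < 0:
        raise ValueError("trim_3prime must be non-negative")

    # Drop the low-quality tail of the original read first.
    keep = max(len(colours) - trim_3prime, 0)
    colours = colours[:keep]
    quals = quals[:keep]

    # Reverse both so each QV stays attached to its own colour call.
    return colours[::-1], quals[::-1]


if __name__ == "__main__":
    rev_colours, rev_quals = reverse_colour_read("01230", [30, 28, 25, 12, 5], trim_3prime=2)
    print(rev_colours, rev_quals)
```

Whether trimming should happen before or after the reversal is a judgment call; the sketch trims first so that the calls removed are the ones from the end of the sequencing run.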

  • #2
    We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

    A few notes-

    1. Lifescope will be 10000 euros a year from what I've heard.

    2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

    3. There is some debate on the BWA mailing list on them dropping colour space altogether.

    Perhaps you missed Shrimp2 as well, though I am not sure if that can deal with paired-end reads.


    • #3
      Originally posted by colindaven View Post
      We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

      A few notes-

      1. Lifescope will be 10000 euros a year from what I've heard.

      2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

      3. There is some debate on the BWA mailing list on them dropping colour space altogether.

      Perhaps you missed Shrimp2 as well, though I am not sure if that can deal with paired end reads-

      Good point about SHRiMP2 - yes, it does appear to have a paired-end option and I'll check it out. Thanks for pointing that out (my memory of using SHRiMP before is that it needed a lot of compute resources, specifically memory, so it'll be interesting to see how it copes with paired-end data).

      Am disappointed to hear that BWA may be dropping colour-space support altogether, though not entirely surprised. It has always appeared to perform particularly poorly with colour-space data, and I see there is some debate over whether this is due to a flaw in their colour-space implementation. Further, with ECC technology now available with the 5500xl, I wonder how many other developers might also drop colour-space support?

      I'm curious to know why you would avoid using Bowtie on WGS data - is it purely because of the indel issue? I've found Bowtie tends to give marginally better mappings than BWA, although the resultant BAM files are problematic when it comes to using the GATK pipeline. I'm guessing you're suggesting its suitability for RNA-seq because of TopHat?

      Lastly - yes... 10,000 euros a year for LifeScope. Is it worth it? Hmm. Discuss!!! (I suspect the few of us who are ACTUALLY using LifeScope will seriously consider spending that money on alternative commercial options rather than fork out on what is a very fragile and poorly designed piece of software. However, it's fair to say its mapper does do a good job of making the most of colour-space data... at least in terms of coverage...)

      Will look forward to what you have to say regarding Novoalign. Has anyone else any experience of using this mapper with SOLiD paired-end data?


      • #4
        With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
        http://solid.community.appliedbiosys...om/thread/1182


        • #5
          Originally posted by h2karen View Post
          With regards to ECC with paired-end runs, I don't think it is currently possible. Seems like ECC analysis has only been customized for forward reads, and a mix of reverse and forward in the paired-end runs will make analysis difficult. Please see link below for complete response to this issue:
          http://solid.community.appliedbiosys...om/thread/1182
          Many thanks for the info - and yes, I am aware of that issue. I'm afraid I wasn't clear in the point I was making, which is that, because ECC is now available, I fear less effort will be made by third-party developers to produce and maintain software capable of coping with colour-space data in addition to base-space. It is notable that Bowtie 2 cannot use colour-space even though Bowtie 1 can and, it would seem from the above comment, BWA may also pull out of supporting colour-space. As far as using SOLiD paired-end data is concerned, this is a bit of a double whammy: users can't use paired-end in base-space because of the point you rightly highlight, but if they use colour-space there are few (and, over time, it would seem fewer) options available to map the data easily.


          • #6
            1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

            2) I think the new version of BWA also stopped supporting colour space.


            • #7
              Originally posted by Zaag View Post
              1) GATK eats my Lifescope 2.5 BAM raw, I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

              2) I think the new version of BWA also stopped supporting colour space.
              Thanks for that! I've checked and you're right - as of release 0.6, BWA has dropped colour-space support (although the online documentation still alludes to it). This is very disappointing, though given what I've been hearing, not unexpected.

              Have checked Lifescope 2.5 output with GATK, and yes, it appears to work fine with GATK. This is very good news, so thanks for bringing that to my attention!


              • #8
                SHRiMP 2.2.2 does seem to be an alternative.

                I can align ~60m SE 60bp exome reads to the human genome in about 10 hours using 47 threads. That gives it about a third of the runtime of novoalign-CS on this machine.

                We will be testing SNP calls from alignments in the next few weeks so I can't say anything yet.


                • #9
                  For the sake of completeness there is PerM, which does handle paired end reads (as separate files). I stopped using it as it ignores any read containing a 'N'. It is not a gapped aligner. It is still being developed.


                  • #10
                    GATK does work on SAM files produced by SHRiMP2 from SOLiD paired-end reads. Here are the steps:

                    1. Align with SHRiMP2 using the --single-best-mapping and --all-contigs flags.
                    2. Use Picard to fix the read group.
                    3. Run GATK.
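
A rough Python sketch of how the first two steps above might be assembled as command lines. Only the two SHRiMP2 flags come from the post. Everything else is an assumption for illustration: the `gmapper-cs` executable name and its positional argument order, the `picard.jar AddOrReplaceReadGroups` invocation style, and all file names and read-group values. Paired-end input options and the GATK step are deliberately left out, so check the SHRiMP2 and GATK documentation for your versions. The function only builds the commands and does not run anything.

```python
def build_pipeline_commands(reads, reference, sample,
                            sam_out="aligned.sam", bam_out="aligned.rg.bam"):
    """Return (shrimp_cmd, picard_cmd) as argument lists; nothing is executed.

    All paths and read-group values are hypothetical placeholders.
    """
    # Step 1: SHRiMP2 colour-space alignment with the two flags from the post.
    # Assumption: gmapper-cs takes reads then reference as positional
    # arguments and writes SAM to stdout (to be redirected to sam_out).
    shrimp_cmd = [
        "gmapper-cs",
        "--single-best-mapping",
        "--all-contigs",
        reads,
        reference,
    ]

    # Step 2: Picard AddOrReplaceReadGroups to give GATK a valid read group.
    # Assumption: single picard.jar layout; older releases shipped one jar per tool.
    picard_cmd = [
        "java", "-jar", "picard.jar", "AddOrReplaceReadGroups",
        f"I={sam_out}",
        f"O={bam_out}",
        f"RGID={sample}",
        f"RGLB={sample}",
        "RGPL=SOLID",
        "RGPU=unit1",
        f"RGSM={sample}",
    ]
    return shrimp_cmd, picard_cmd


if __name__ == "__main__":
    shrimp, picard = build_pipeline_commands("reads.csfasta", "ref.fa", "sample1")
    print(" ".join(shrimp), "> aligned.sam")
    print(" ".join(picard))
```

Keeping the commands as lists makes it easy to pass them to `subprocess.run` later, once the flags have been checked against the installed tool versions.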


                    • #11
                      Hi!

                      I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate-pair library that is 52 x 37 nt, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size selection parameters were, or how the circles were cut to produce the final fragments. This info would be very valuable for more accurate mapping.

                      Any knowledge would be greatly appreciated!

                      Thanks a lot,

                      Carmen
