  • bwa: how to align color space reads

    Hi everybody,
    This has puzzled me for days: I tried to use bwa on SOLiD sequencing results, but after reading through the manual I couldn't find a detailed workflow for color space read alignment. Based on some posts, I took the steps below:
    1. solid2fastq: used the script in the bwa suite (color to double-encoded: ACGTN);
    2. index the FASTA reference with -c;
    3. bwa aln;
    4. bwa samse (my SOLiD reads are from a fragment library);
    5. parse the SAM output. I found that all the beads were unmapped. But when I used the same reads and reference with other tools, such as Bioscope and BFAST, the results were just fine: thousands of mapped reads.
    Then I tried with color space FASTQ (meaning the sequence line consists of 1234.), and all reads were unmapped too.
    Maybe this workflow is not suitable? Could anyone please show me how to handle color space reads with bwa?
    Many thanks!

  • #2
    I see you used the -c flag to indicate color-space whilst generating the reference database but did you also use the -c flag with the bwa aln command?

    e.g.

    bwa aln -c -f <sai output> <ref> <fastq input>

    Both the indexing and the aligning require the -c flag. bwa samse, in contrast, does not.


    Incidentally, as of release 0.6, BWA has dropped color-space support (although the online documentation makes no mention of this), so BWA may no longer be the best mapper to invest time in for the longer term. This is unfortunate given its usefulness.
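
A minimal sketch of the three BWA steps discussed above, assuming a pre-0.6 BWA in which color-space mode is still enabled and reads that have already been converted with solid2fastq. The helper names `run` and `cs_align`, the `DRY_RUN` switch, and all file names are hypothetical; the `-f` output option of `bwa samse` is assumed to work the same way as the one shown for `bwa aln`.

```shell
#!/bin/sh
# Sketch only: color-space alignment with a pre-0.6 BWA.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cs_align <reference.fa> <reads.fastq> <output prefix>
cs_align() {
    ref=$1
    reads=$2
    prefix=$3
    # -c is needed when building the index AND when running aln;
    # samse takes no -c.
    run bwa index -c "$ref" &&
    run bwa aln -c -f "$prefix.sai" "$ref" "$reads" &&
    run bwa samse -f "$prefix.sam" "$ref" "$prefix.sai" "$reads"
}
```

With `DRY_RUN=1`, calling `cs_align ref.fa reads.fastq out` only prints the three commands, which is a cheap way to confirm that -c appears in the index and aln steps but not in samse.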


    • #3
      Originally posted by NestorNotabilis
      Incidentally, as of release 0.6, BWA has dropped color-space support (although the online documentation makes no mention of this), so BWA may no longer be the best mapper to invest time in for the longer term. This is unfortunate given its usefulness.
      But this is mentioned in the NEWS file of the release.

      Release 0.6.1 (28 November, 2011)
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      Notable changes to BWA-short:

      * Bugfix: duplicated alternative hits in the XA tag.

      * Bugfix: when trimming enabled, bwa-aln trims 1bp less.

      * Disabled the color-space alignment. 0.6.x is not working with SOLiD reads at
      present.


      Which is a timely reminder to read all the documentation, and not just what is on potentially infrequently updated web pages.


      • #4
        Hi everyone. As we know, Bowtie is software in which we need edit. I want to know if there is software in which we don't need edit to map billions of short reads onto genomes. Thank you.


        • #5
          Originally posted by NestorNotabilis
          I see you used the -c flag to indicate color-space whilst generating the reference database but did you also use the -c flag with the bwa aln command?

          e.g.

          bwa aln -c -f <sai output> <ref> <fastq input>

          Both the indexing and the aligning require the -c flag. bwa samse, in contrast, does not.


          Incidentally, as of release 0.6, BWA has dropped color-space support (although the online documentation makes no mention of this), so BWA may no longer be the best mapper to invest time in for the longer term. This is unfortunate given its usefulness.
          Thanks for your help! Actually, I did use -c, and even tried -n 3 or -n 4 when running bwa aln. Sorry for forgetting to mention it.
          I checked my bwa version, and it's 0.6.1. Maybe that is the reason. What a shame!
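
Since the installed version decides whether color-space alignment can work at all, a quick check along these lines may save some time. This is only a sketch: `supports_colorspace` and `bwa_version` are hypothetical helper names, and treating every release below 0.6 as color-space capable is an assumption based solely on the NEWS entry quoted earlier in this thread.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given BWA version string predates 0.6.
# Assumption: everything below 0.6 still has color-space mode enabled.
supports_colorspace() {
    ver=${1%%-*}        # drop a build suffix such as "-r104"
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}   # second version component
    [ "$major" -eq 0 ] && [ "$minor" -lt 6 ]
}

# Read the version from the usage text that bwa prints to stderr
# when it is run without arguments (line starting with "Version:").
bwa_version() {
    bwa 2>&1 | awk '/^Version:/ { print $2 }'
}
```

A possible use is `supports_colorspace "$(bwa_version)" || echo "this bwa will not align color-space reads"`, run before building a -c index.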


          • #6
            Originally posted by Bukowski
            But this is mentioned in the NEWS file of the release.

            Release 0.6.1 (28 November, 2011)
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

            Notable changes to BWA-short:

            * Bugfix: duplicated alternative hits in the XA tag.

            * Bugfix: when trimming enabled, bwa-aln trims 1bp less.

            * Disabled the color-space alignment. 0.6.x is not working with SOLiD reads at
            present.


            Which is a timely reminder to read all the documentation, and not just what is on potentially infrequently updated web pages.
            Thanks for the info. I indeed hadn't noticed there's a NEWS file. Shoot, my fault!


            • #7
              I'm not too sure bwa was working too well with colour space data.

              Just a brief result from some trial 60bp exome data alignments to hg19 with default settings:

              bioscope - 85% reads mapped (albeit with iterative read trimming)
              bwa ~40%
              bowtie ~33%
              NovoalignCS ~59%

              Now I know there is a lot of optimisation to be done, but the raw results are extremely diverse.


              • #8
                Originally posted by colindaven
                I'm not too sure bwa was working too well with colour space data.

                Just a brief result from some trial 60bp exome data alignments to hg19 with default settings:

                bioscope - 85% reads mapped (albeit with iterative read trimming)
                bwa ~40%
                bowtie ~33%
                NovoalignCS ~59%

                Now I know there is a lot of optimisation to be done, but the raw results are extremely diverse.
                Hmm, same here. Bioscope always gets an obviously higher mapping rate; I suspect it includes more false-positive mapped reads.


                • #9
                  Be careful: Bioscope/Lifescope can be misleading about the mapping rate if you're not careful about what you look at. If you look at the main summary, the rate always seems really high. But look at the SAM file it generates and do your own calculation. Most of the time, it maps about half as much as BWA does. Lifescope does give the 'real' stats, but you have to dig much deeper to get them - it's highly misleading.


                  • #10
                    Originally posted by kexin
                    Hi everyone. As we know, Bowtie is software in which we need edit. I want to know if there is software in which we don't need edit to map billions of short reads onto genomes. Thank you.
                    What do you mean by "edit"?
                    In my opinion, Bowtie is the easiest to use of all the alignment tools I have tried.


                    • #11
                      SOLiD mapping

                      Originally posted by kbhit
                      Be careful: Bioscope/Lifescope can be misleading about the mapping rate if you're not careful about what you look at. If you look at the main summary, the rate always seems really high. But look at the SAM file it generates and do your own calculation. Most of the time, it maps about half as much as BWA does. Lifescope does give the 'real' stats, but you have to dig much deeper to get them - it's highly misleading.
                      kbhit, do you mind elaborating on this? I have searched and searched for a better way to map SOLiD data. When we use Lifescope compared to something like Bowtie, it's a difference of 90% versus 60%. No one seems to be getting better than 60% mappability with SOLiD color space, and Lifescope always reports higher. Do you believe Lifescope is misrepresenting its metrics somehow?

                      Does anyone have suggestions for the best way to map SOLiD data without tossing tons of reads?


                      • #12
                        Hi Jeremy,
                        I found that the stats that Lifescope reports can be misleading. Instead, when I compare it with other aligners like BWA and Shrimp (I like Shrimp2 a lot), I calculate the Lifescope mapping percentage manually. To do this I use (uniquely mapped reads / total starting reads). To get the numerator, I look at the raw SAM file that Lifescope produces (rather than at their automatic report).

                        Something like:

                        cat <Lifescope's output sam file> |
                        grep -v "^@" | # remove headers
                        awk '{if (and($2, 4) == 0) print}' | # mapped (and() needs gawk)
                        wc -l # get the total count

                        I can't remember offhand, but you may want to remove the ones with a mapping quality of 0.

                        If you need more information, please let me know and I'll dig a little more.
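
A slightly more portable sketch of the same calculation, since and() is a gawk extension and is not available in every awk. The function names `count_mapped` and `mapped_percent` are hypothetical. The optional MAPQ threshold reflects the suggestion to drop mapping quality 0, and the function counts SAM records rather than distinct reads, so it assumes one alignment line per read.

```shell
#!/bin/sh
# count_mapped <sam file> [minimum MAPQ]
# Prints the number of alignment records whose "unmapped" flag bit (0x4)
# is clear and whose MAPQ (column 5) is at least the given minimum.
count_mapped() {
    awk -F '\t' -v minq="${2:-0}" '
        /^@/ { next }                                   # skip header lines
        int($2 / 4) % 2 == 0 && $5 >= minq { n++ }      # bit 0x4 clear = mapped
        END { print n + 0 }
    ' "$1"
}

# mapped_percent <mapped count> <total starting reads>
mapped_percent() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}
```

For example, `mapped_percent "$(count_mapped lifescope.sam 1)" "$total_reads"` would give the percentage of records with MAPQ of at least 1, where `lifescope.sam` and `total_reads` stand in for your own file and starting read count.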


                        • #13
                          Thanks kbhit! I'll have to take a look at this.

                          Do you mind if I ask what you are getting for mappability using Shrimp2?

                          I appreciate your input. Finding a proper pipeline for SOLiD data is becoming a daunting task. If we use Lifescope (and we haven't looked thoroughly), our bioinformaticians' initial thoughts are similar to yours, and/or they believe that it is low-quality mapping. If we use something like Bowtie, it brings our mapping to 65% and below. That's a lot of wasted reads that could potentially be meaningful data. With Wildfire data we are down in the 40s with Bowtie, and Lifescope is still almost 90%.


                          • #14
                            Hi Jeremy,
                            We normally get about 55% mappability on good-quality long RNA using Shrimp2. Prior to calling Shrimp2 I use the latest version of cutadapt to do quality trimming (q of 15 normally) - this helps boost the quality. For us, when compared to Lifescope (calculated manually), Shrimp usually performs better with regard to unique mapping percentage.

                            Also, for color space, be careful when using BWA and Bowtie: they don't handle color space correctly (which might be why their latest versions may be abandoning support for it). It's definitely trickier to handle color space, and it takes more brainpower to get it right. For example, they aren't able to work with the first and last nt of the read, which lowers specificity. Crossover handling can also be problematic there. Shrimp and Lifescope don't have those problems.
