Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA-MEM and X0/X1/etc flags

    Hey everyone,

    I'm currently running bwa mem on some 454 single-end reads I have. My past experience with the bwa/samtools setup was with Illumina paired-end reads and using bwa aln/sampe (different projects). After looking at some of my first output for the bwa mem algorithm, I noticed there are no X0/X1 flags, which in the sampe output represent qualities such as number of best hits and number of suboptimal hits, respectively. Looking at the description of the bwa mem algorithm though on the bwa webpage (http://bio-bwa.sourceforge.net/bwa.shtml), it sounds like maybe the mem algorithm will only produce exact/unique matches, such that the X0/X1 flags are no longer needed (MEM -- maximal exact matches)? I'm interested in knowing this since I'd like to filter reads based on whether they have matches to multiple locations in the genome, and the X0/X1 tags were useful for doing this previously.

    Has anyone played around with bwa mem yet and/or can comment on whether they still got X0/X1 flags when using it?

    Thanks!
    Last edited by mturchin20; 05-20-2013, 03:18 PM.

  • #2
    So I've looked into this a little further and thought I'd just put up what I'm currently thinking in case anyone has something else to contribute.

    It looks like while the bwa samse/sampe produces X0/X1 tags, the bwa mem/bwa bwasw do not produce such tags. Instead (and now just specifically referring to bwa mem), it looks like the XP tag is used to identify both 1) long reads that have split alignments and 2) reads that have multiple alignments overall. In the former situation (of a long read being split), both reads will have a XP tag. Using the -M flag will cause the shorter read of the two to be flagged as "the alignment is not primary" in the bitwise FLAG field (column 2). Not including the -M flag will leave both reads as appearing 'primary' (which I'm guessing is what causes the issues with Picard/GATK that the documentation mentions?). In the latter situation (of a read having multiple hits overall), you can see the other sequence hits using the -a flag. In these situations, the 'primary' read does not contain a XP tag but the sub-primary reads do.

    Given this situation, I still think it isn't straight forward to determine whether a read is 'uniquely mapped' or not. I've read about using the MAPQ value as a way to determine this, e.g. a MAPQ of 0 means the read is not uniquely mapped, but I feel like I see examples where the MAPQ is >0 and there are multiple alignments for that read. For example:

    HXBHIOF01CJXWW 0 10 69647562 22 23M1D5M2I39M * 0 0 AACCTCAGGTGATCCGTCCGCCTGGCCTGCCCCAAAGTGCTGGGCTTACAGGCATGTGCCACCTTGCCC EE???:5332=CCC==<=>B@@><;5557444244;??CCC;;;;AED???EFECED>==?;:766854 NM:i:5 AS:i:46 XS:i:36
    HXBHIOF01CJXWW 256 X 94956752 0 23M1D5M2I33M6S * 0 0 * * NM:i:5 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
    HXBHIOF01CJXWW 272 19 16193041 0 39M2I5M1D23M * 0 0 * * NM:i:7 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
    HXBHIOF01CJXWW 256 3 169612909 0 23M1D5M2I39M * 0 0 * * NM:i:8 AS:i:35 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
    HXBHIOF01CJXWW 256 12 122144914 0 22M1D6M2I26M13S * 0 0 * * NM:i:4 AS:i:34 XP:Z:10,+69647562,23M1D5M2I39M,22,5;

    The first read has a MAPQ of 22 even though it maps to other regions in the genome (all with MAPQ of 0, admittedly). Additionally this is not a long read that has been split (excluding the -a flag only produces the first read).

    So I guess my current thinking is that I should still use a MAPQ threshold (10 or 30?), but that this may not produce a full set of 'uniquely mapping' reads?

    Comment


    • #3
      Can you use the difference between the AS tag and the XS tag?
      I can't see the XS tag in your output, it may come with the newest version.

      Comment


      • #4
        The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
        https://github.com/lh3/bwa/blob/master/NEWS

        Comment


        • #5
          bam tags AS and XS

          A comment to how uniquely mapped reads can be extracted. As far as I understand a read is uniquely mapped when AS and XS tags differ - pls correct me if im wrong. Actually I have seen XS values being bigger than AS values which I dont really understand - Cant find a thorough description of what XS orhter than its the score for a suboptimal alignment.

          ex with XS bigger than AS:
          HWUSI-EAS664L:24:64FGCAAXX:4:11:3337:6807 99 gi|407955691|dbj|AP012495.1| 1359455 0 101M = 1359704 351 CCAATGGCCGCTGCCAGTGGCGTTTTGTCGTGTCTTTCCGGGTTGGACTCAAGTTGATAGTTACCGGA
          TAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTC IIHGDIIIIGIIDIIIIEHIIIIGIIGHIIIIHHHIIIIIHIEIIIHIIIHGHIIIFGHIFIHHHIICIGHHHIIIBCGHGEGHGGDGHBEGGIGEFEAD? NM:i:8 MD:Z:3G4T13A1A0A10A16A0C46 AS:i:62 XS:i:68

          So is this an example of a uniquely mapped read ?

          Comment


          • #6
            Originally posted by Kennels View Post
            The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
            https://github.com/lh3/bwa/blob/master/NEWS
            hello, I dig up this post because of your comment on the SA: tag
            could you please elaborate, the provided link is broken.
            the main question is:
            are the reads with the SA: code still unique? what about the ones that also incorporate the XA: code?
            Here are some examples
            Code:
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 256     chr2    161581931       0       32M17H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        CAC<E?E<E????DDF@DBEB<D?9??D99D<        NM:i:0  MD:Z:32 AS:i:32 XS:i:20 SA:Z:chr12,87311999,-,32M17S,0,0;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        .):639=>?>?8??<)7;7)=8=8)0>8>?.6        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?@AB@AABBBBABBB<ABBB?BAB?BB?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?BBBBBBBB=ABBBBBBB?A?ABB0=A?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 0       chr11   22585338        0       18S33M  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTGATTATAGTTTATTTTGCT     IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHFGIHIIIIFEIGHEGHCHF     NM:i:0  MD:Z:33 AS:i:33 XS:i:33 SA:Z:chr17,59076122,+,33M18S,0,0;       XA:Z:chr3,+175713108,51M,4;chr2,-116989725,16S35M,1;chr19,+21698327,7S29M15S,0;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     ??>?9@?=.:().))?>.)6.?>8>0)8=8=)7;7)<??8?>?>=936:).     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
                   XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABBABABABBBBBBBB?BB?BAB?BBBA<BBBABBBBAA@BA@?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
                   XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABB>A<B>B>A?7BBB?A=0BBA?A?BBBBBBBA=BBBBBBBB?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 16      chr12   87311999        0       32M17S  *       0       0       TATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA       AA==5.DB8BB8=/??*<D99D??9?D<BEBD@FDD????E<E?E<CAC       NM:i:0  MD:Z:32 AS:i:32 XS:i:30 SA:Z:chr2,161581931,+,32M17S,0,0;       XA:Z:chr9,-104599381,49M,4;chr3,+170653467,49M,4;chr12,+46991828,49M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 256     chr17   59076122        0       33M18H  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTG       IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHF       NM:i:0  MD:Z:33 AS:i:33 XS:i:32 SA:Z:chr11,22585338,+,18S33M,0,0;       XA:Z:chr4,-151801895,19S32M,0;

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin


              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM
            • seqadmin
              Techniques and Challenges in Conservation Genomics
              by seqadmin



              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

              Avian Conservation
              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
              03-08-2024, 10:41 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 03-27-2024, 06:37 PM
            0 responses
            12 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-27-2024, 06:07 PM
            0 responses
            11 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-22-2024, 10:03 AM
            0 responses
            53 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-21-2024, 07:32 AM
            0 responses
            69 views
            0 likes
            Last Post seqadmin  
            Working...
            X