Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA-MEM and X0/X1/etc flags

    Hey everyone,

    I'm currently running bwa mem on some 454 single-end reads I have. My past experience with the bwa/samtools setup was with Illumina paired-end reads and using bwa aln/sampe (different projects). After looking at some of my first output for the bwa mem algorithm, I noticed there are no X0/X1 flags, which in the sampe output represent qualities such as number of best hits and number of suboptimal hits, respectively. Looking at the description of the bwa mem algorithm though on the bwa webpage (http://bio-bwa.sourceforge.net/bwa.shtml), it sounds like maybe the mem algorithm will only produce exact/unique matches, such that the X0/X1 flags are no longer needed (MEM -- maximal exact matches)? I'm interested in knowing this since I'd like to filter reads based on whether they have matches to multiple locations in the genome, and the X0/X1 tags were useful for doing this previously.

    Has anyone played around with bwa mem yet and/or can comment on whether they still got X0/X1 flags when using it?

    Thanks!
    Last edited by mturchin20; 05-20-2013, 03:18 PM.

  • #2
    So I've looked into this a little further and thought I'd just put up what I'm currently thinking in case anyone has something else to contribute.

    It looks like while the bwa samse/sampe produces X0/X1 tags, the bwa mem/bwa bwasw do not produce such tags. Instead (and now just specifically referring to bwa mem), it looks like the XP tag is used to identify both 1) long reads that have split alignments and 2) reads that have multiple alignments overall. In the former situation (of a long read being split), both reads will have a XP tag. Using the -M flag will cause the shorter read of the two to be flagged as "the alignment is not primary" in the bitwise FLAG field (column 2). Not including the -M flag will leave both reads as appearing 'primary' (which I'm guessing is what causes the issues with Picard/GATK that the documentation mentions?). In the latter situation (of a read having multiple hits overall), you can see the other sequence hits using the -a flag. In these situations, the 'primary' read does not contain a XP tag but the sub-primary reads do.

    Given this situation, I still think it isn't straight forward to determine whether a read is 'uniquely mapped' or not. I've read about using the MAPQ value as a way to determine this, e.g. a MAPQ of 0 means the read is not uniquely mapped, but I feel like I see examples where the MAPQ is >0 and there are multiple alignments for that read. For example:

    HXBHIOF01CJXWW 0 10 69647562 22 23M1D5M2I39M * 0 0 AACCTCAGGTGATCCGTCCGCCTGGCCTGCCCCAAAGTGCTGGGCTTACAGGCATGTGCCACCTTGCCC EE???:5332=CCC==<=>B@@><;5557444244;??CCC;;;;AED???EFECED>==?;:766854 NM:i:5 AS:i:46 XS:i:36
    HXBHIOF01CJXWW 256 X 94956752 0 23M1D5M2I33M6S * 0 0 * * NM:i:5 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
    HXBHIOF01CJXWW 272 19 16193041 0 39M2I5M1D23M * 0 0 * * NM:i:7 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
    HXBHIOF01CJXWW 256 3 169612909 0 23M1D5M2I39M * 0 0 * * NM:i:8 AS:i:35 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
    HXBHIOF01CJXWW 256 12 122144914 0 22M1D6M2I26M13S * 0 0 * * NM:i:4 AS:i:34 XP:Z:10,+69647562,23M1D5M2I39M,22,5;

    The first read has a MAPQ of 22 even though it maps to other regions in the genome (all with MAPQ of 0, admittedly). Additionally this is not a long read that has been split (excluding the -a flag only produces the first read).

    So I guess my current thinking is that I should still use a MAPQ threshold (10 or 30?), but that this may not produce a full set of 'uniquely mapping' reads?

    Comment


    • #3
      Can you use the difference between the AS tag and the XS tag?
      I can't see the XS tag in your output, it may come with the newest version.

      Comment


      • #4
        The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
        https://github.com/lh3/bwa/blob/master/NEWS

        Comment


        • #5
          bam tags AS and XS

          A comment to how uniquely mapped reads can be extracted. As far as I understand a read is uniquely mapped when AS and XS tags differ - pls correct me if im wrong. Actually I have seen XS values being bigger than AS values which I dont really understand - Cant find a thorough description of what XS orhter than its the score for a suboptimal alignment.

          ex with XS bigger than AS:
          HWUSI-EAS664L:24:64FGCAAXX:4:11:3337:6807 99 gi|407955691|dbj|AP012495.1| 1359455 0 101M = 1359704 351 CCAATGGCCGCTGCCAGTGGCGTTTTGTCGTGTCTTTCCGGGTTGGACTCAAGTTGATAGTTACCGGA
          TAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTC IIHGDIIIIGIIDIIIIEHIIIIGIIGHIIIIHHHIIIIIHIEIIIHIIIHGHIIIFGHIFIHHHIICIGHHHIIIBCGHGEGHGGDGHBEGGIGEFEAD? NM:i:8 MD:Z:3G4T13A1A0A10A16A0C46 AS:i:62 XS:i:68

          So is this an example of a uniquely mapped read ?

          Comment


          • #6
            Originally posted by Kennels View Post
            The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
            https://github.com/lh3/bwa/blob/master/NEWS
            hello, I dig up this post because of your comment on the SA: tag
            could you please elaborate, the provided link is broken.
            the main question is:
            are the reads with the SA: code still unique? what about the ones that also incorporate the XA: code?
            Here are some examples
            Code:
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 256     chr2    161581931       0       32M17H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        CAC<E?E<E????DDF@DBEB<D?9??D99D<        NM:i:0  MD:Z:32 AS:i:32 XS:i:20 SA:Z:chr12,87311999,-,32M17S,0,0;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        .):639=>?>?8??<)7;7)=8=8)0>8>?.6        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?@AB@AABBBBABBB<ABBB?BAB?BB?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?BBBBBBBB=ABBBBBBB?A?ABB0=A?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 0       chr11   22585338        0       18S33M  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTGATTATAGTTTATTTTGCT     IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHFGIHIIIIFEIGHEGHCHF     NM:i:0  MD:Z:33 AS:i:33 XS:i:33 SA:Z:chr17,59076122,+,33M18S,0,0;       XA:Z:chr3,+175713108,51M,4;chr2,-116989725,16S35M,1;chr19,+21698327,7S29M15S,0;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     ??>?9@?=.:().))?>.)6.?>8>0)8=8=)7;7)<??8?>?>=936:).     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
                   XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABBABABABBBBBBBB?BB?BAB?BBBA<BBBABBBBAA@BA@?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
                   XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABB>A<B>B>A?7BBB?A=0BBA?A?BBBBBBBA=BBBBBBBB?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 16      chr12   87311999        0       32M17S  *       0       0       TATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA       AA==5.DB8BB8=/??*<D99D??9?D<BEBD@FDD????E<E?E<CAC       NM:i:0  MD:Z:32 AS:i:32 XS:i:30 SA:Z:chr2,161581931,+,32M17S,0,0;       XA:Z:chr9,-104599381,49M,4;chr3,+170653467,49M,4;chr12,+46991828,49M,5;
            BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 256     chr17   59076122        0       33M18H  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTG       IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHF       NM:i:0  MD:Z:33 AS:i:33 XS:i:32 SA:Z:chr11,22585338,+,18S33M,0,0;       XA:Z:chr4,-151801895,19S32M,0;

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Latest Developments in Precision Medicine
              by seqadmin



              Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

              Somatic Genomics
              “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
              Today, 01:16 PM
            • seqadmin
              Recent Advances in Sequencing Analysis Tools
              by seqadmin


              The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
              05-06-2024, 07:48 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, Today, 07:15 AM
            0 responses
            10 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, Yesterday, 10:28 AM
            0 responses
            15 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, Yesterday, 07:35 AM
            0 responses
            16 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-22-2024, 02:06 PM
            0 responses
            8 views
            0 likes
            Last Post seqadmin  
            Working...
            X