Header Leaderboard Ad

Collapse

bitwise flag in SAM file

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bitwise flag in SAM file

    Hi Everyone,
    Recently I have aligned my sequencing results using BWA and get the aligned result in SAM format.
    My sequence is a single end read and therefore I would expect the bitwise flag (second field) in the SAM format contains only 0, 4, 16 (correct me if i m wrong).

    However, I notice one aligned result with strange value:
    id1 20 chrX 154913749 25 36M * 0 0 TGCGGTCCAACCCTAACCCTAACCCTAACCCTAACC SVVVVV[[[[[[[[[[Z[Z[[[[ZZ[[[[[Z[[[[Z XT:A:U NM:i:2X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:2T4T28

    I cannot explain why there is a bitwise flag 20 in the second field. I expect it to come from 4+16, however, since it is mapped in reverse strand (give 16), then why there is 4 (which means unmapped) contribute to the bitwise flag?

    Thank you very much for the help.

  • #2
    Originally posted by win804 View Post
    Hi Everyone,
    Recently I have aligned my sequencing results using BWA and get the aligned result in SAM format.
    My sequence is a single end read and therefore I would expect the bitwise flag (second field) in the SAM format contains only 0, 4, 16 (correct me if i m wrong).

    However, I notice one aligned result with strange value:
    id1 20 chrX 154913749 25 36M * 0 0 TGCGGTCCAACCCTAACCCTAACCCTAACCCTAACC SVVVVV[[[[[[[[[[Z[Z[[[[ZZ[[[[[Z[[[[Z XT:A:U NM:i:2X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:2T4T28

    I cannot explain why there is a bitwise flag 20 in the second field. I expect it to come from 4+16, however, since it is mapped in reverse strand (give 16), then why there is 4 (which means unmapped) contribute to the bitwise flag?

    Thank you very much for the help.
    Probably a bug in BWA. Why is there a chr/position and non-zero mapping quality for this read? Also, I "blatted" the base sequence and it did not map to the X chromosome. Anyone else?

    Comment


    • #3
      I guess it's a bug also. I modified the sequence according to MD:Z:2T4T28 and blatted against hg18 and I got this hit:
      QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
      id1 35 1 36 36 100.0% 2 + 181848646 181848793 148

      Thanks Nilshomer.
      I wonder whether anybody has encountered this kind of problem before.

      Comment


      • #4
        Let see what Heng saids. If you can, share the reference and the read that is causing the issue so the bug can be reproduced. Actually, do you get the same results when aligning that single read against your genome?
        -drd

        Comment


        • #5
          Nothing is wrong. Flag 16 only means the read sequence is on the reverse strand. It may or may not be mapped at all. You see coordinate because it stands out of chrX. The BLAT result is not informative. Note the "148"bp span. Maybe you are mapping RNA-seq reads, but bwa is unable to map reads across exon juctions.

          Comment


          • #6
            Hi Heng,

            Would you please help to clarify one point that if a read mapped to the junction of two chromosomes, I understand that its Flag will be 4 and I can still see the position, CIGAR and all other tags, but would the MAPQ be set at zero?

            I saw some chip-seq single end alignments from BWA in my data with Flag=20. I took it as unmapped, but the MAPQ is not equal zero. Is this legal in terms of sam format?

            Thanks in advance!

            Yuan

            Comment


            • #7
              My data is a ChIP-seq data.
              To test again, I have extracted the single read and aligned towards hg18.
              The result becomes:
              id1 4 chrX 154913749 25 36M * 0 0 TGCGGTCCAACCCTAACCCTAACCCTAACCCTAACC SVVVVV[[[[[[[[[[Z[Z[[[[ZZ[[[[[Z[[[[Z XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0
              XG:i:0 MD:Z:2T4T28

              So, this time it tells me that it is unmapped (with bitwise flag = 4). But strangely it still contains the coordinate. I noticed that the length of chrX is @SQ SN:chrX LN:154913754, which means the mapping coordinate may be wrong? Since the POS=154913749 and if I add the length of my sequence, it is more than chrX length.

              Comment


              • #8
                If a read mapped to the junction of two chromosomes, it is shown as unmapped, but you can still see all the tags. Your read may mapped to the end of chrX and the beginning the following chromosome.

                Yuan

                Comment


                • #9
                  Hi Yuan,
                  Thanks for your explanation. Now I understand already.

                  Comment


                  • #10
                    Code:
                    GA-C_0001:1:1:4471:3450#0 117 hg19_refGene_NR_033322 1558 0 * = 1558 0 NCCAATGTGCCCTGTGTGGTCCCGTTACGTGTCCCATCATGTTTTGTGC B````ba`bb`bbbbbbbbbbbb`bbbbbbbabbbbbbbbbbbbbbbbb  
                    GA-C_0001:1:1:4471:3450#0 157 hg19_refGene_NR_033322 1558 37 49M = 1558 0 CGCTGAGCATGATGGGGCATGTGCGGGAGCGCCAGGCGGGGCATGTAAC bbaaa^b_abcbbbbObbbbcbbbbbbbbbbbbaabbbbbbbbbbbbbb XT:A:U NM:i:1 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:1T47
                    Two lines for two paired-end reads' mapping results are shown here. It was done by BWA. Please note the flags of the two reads: 117 and 157. The third and the fourth bits (from right) of 117 and 157 are (0,1) and (1,1), respectively. The third bit indicates whether the current read is unmapped and the fourth bit indicates whether the mate is unmapped. So it is clear that the third and fourth bits (0,1) and (1,1) for the mate reads are conflicted with each other. Is this a bug of BWA? Any suggestions? Thanks in advance.
                    Xi Wang

                    Comment


                    • #11
                      I am not sure if this is a bug, but it does not look like proper SAM format?
                      The 157 flag -> mate on positive, query on negative
                      The 117 flag -> mate on negative, query on negative
                      Not sure if this helps but i was having trouble parsing SAM output and checking it a while back so i made a small tool to help out with making the SAM output a bit more verbose. If you want i posted the source and binaries to,
                      http://misko.ca/sam2pretty.html
                      Hope this helps?

                      Misko

                      Comment


                      • #12
                        A nice tool. But how did you think it's not in a proper SAM format? It is the output of BWA.
                        Xi Wang

                        Comment


                        • #13
                          Sometimes bwa flags read1 as unmapped, but for read2, the mate-unmapped flag is not set. This is a bug.

                          Comment


                          • #14
                            Originally posted by lh3 View Post
                            Sometimes bwa flags read1 as unmapped, but for read2, the mate-unmapped flag is not set. This is a bug.
                            Thanks, Heng. Hope next release will solve this:-)
                            Xi Wang

                            Comment


                            • #15
                              Questions about BWA

                              Originally posted by lh3 View Post
                              Sometimes bwa flags read1 as unmapped, but for read2, the mate-unmapped flag is not set. This is a bug.
                              BWA is quite a good software, but I met some problems when using.
                              My first question is: as Doc. Heng Li said, “Sometimes BWA flags read1 as unmapped, but for read2, the mate-unmapped flag is not set”, flag may make mistakes. So which field should be checked to see if the read has been mapped? Is that only “NM” or something else?
                              The second question is: when there are several best hits (X0: i: n, n>1), why are annotation fields given for only one best hit, but not for the rest hits in the “XA” record? How do you choose the fully annotated hit? And why are hits in the “XA” sometimes the same as that hit?
                              Waiting for the answers eagerly

                              Comment

                              Working...
                              X