bitwise flag in sam format and others


  • #31
    as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped
    This seems to be an unfixed bug.

    Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size
    I am not sure why you think this is problematic.

    how can one pair have a MAPQ at 37 and the other at 0
    I have answered this question several times in the mailing list. I now put it in FAQ on the bwa homepage.

    Comment


    • #32
      Thanks a lot for the replies.

      Comment


      • #33
        Would you please explain what XT:A:N and XT:A:M mean?
        (I haven't found any XT:A:N yet, but I do see some XT:A:M.)
        I understand XT:A:R and XT:A:U.

        Comment


        • #34
          Originally posted by bioinfosm View Post
          what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

          Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

          Thanks
          I have a similar question. All I see are flags 0, 4, 16, and 20, and I do not understand how to interpret 20. In hexadecimal it is 0x14, which should mean a combination of the strand bit (0x10) and the unmapped bit (0x4). Please correct me if I'm wrong. I also see a MAPQ score and a reference hit for these reads, so does that still mean the read mapped? Thanks

          Comment


          • #35
            Unmapped reads can have strand. This only means the unmapped sequence is given on its reverse strand.
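
To make the arithmetic concrete, here is a minimal sketch that splits a FLAG value into its set bits. The bit values are those of the SAM specification and the descriptions follow the Picard explain-flags wording quoted elsewhere in this thread; the names `SAM_FLAG_BITS` and `decode_flag` are made up for this illustration and are not part of samtools or Picard.

```python
# Bit values from the SAM specification; descriptions follow Picard's wording.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
]


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The four values discussed above: 20 is 0x14 = 0x10 + 0x4.
for flag in (0, 4, 16, 20):
    print(flag, hex(flag), decode_flag(flag))
```

A flag of 0 sets no bits at all, which for single-end data means a mapped read on the forward strand. A flag of 20 sets both the unmapped bit and the reverse-strand bit, matching the explanation above.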

            Comment


            • #36
              Thank you Heng

              Comment


              • #37
                Hello, I'm new to the group.

                Comment


                • #38
                  Originally posted by Nix View Post
                  I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.
                  I'd go further and say the flag (and other things) will need redoing to cope with more than just paired reads - it will need to cope with N-tuples of reads each separated by an insert of some estimated size (e.g. Strobe Reads from Pacific Biosciences, or what Helicos calls dark fill).

                  Comment


                  • #39
                    sam FLAG

                    hi guys,
                    I find the SAM FLAG encoding a very clever way to store alignment information. But I am confused by the sign of the insert size field in the paired-end example below.
                    The manual says a negative insert size means the mate's mapping position is smaller than the current one. But here it is the reverse.
                    Also, in the example below the mapping position fields are equal for both reads of the pair (2005683), although the reads do not sit at the same position; they only overlap.

                    Can anybody help me? Thanks in advance.

                    GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pPr1 NT_004350 2005683 255 76M = 2005683 101 TCCGGGTGGGGGCAGGGGCCCTGGAGGGGTCACTCGGCTGCCGTCTGTCACTTGGGTCCAGAGGAGCTTCTGGTGG CCCCCCCCCBBBBCCCCCCCCCCCCCCCB>CCCCCCCCCCCCDDBDCBDACDC>@B>[email protected]=BB>[email protected]?BC
                    GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pP2 NT_004350 2005683 255 76M = 2005683 -101 GTGGCCTCGGGAGCAAGGGTCAGACCCACCAGAAGCTCCTCTGGACCCAAGTGACAGACGGCAGCCGAGTGACCCC [email protected]@?<A<[email protected]

                    Originally posted by lh3 View Post
                    @yxi

                    Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.

                    Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.

                    Comment


                    • #40
                      Hello everyone, the information above helped me understand the flag in the SAM format better, but I still have trouble fully understanding some of the bits, such as:
                      0x0002 the read is mapped in a proper pair
                      0x0004 the query sequence itself is unmapped
                      0x0008 the mate is unmapped
                      I don't know what "a proper pair" means, or what the difference is between "pair" and "mate". Could anyone explain them?
                      Another question: I used TopHat to process my paired-end Illumina reads, and the SAM file it produced looks like this:
                      FC30W3GAAXX:7:53:723:1789#0 73 1 487961 3 62M1849N13M * 0 0 ATCAGCTTCATTCCCTCAACAGTGTTCTTC
                      TTCAACGGGCAGCACATGAAGGTCGACTATGGATCTCCAGATCAC 84AB:[email protected]:=A=-9BB?BB>@7>[email protected]=;@B:BABBB?B>@[email protected]@[email protected]>?ABB>[email protected]@[email protected]@B NM:i:7
                      XS:A:+ NH:i:2
                      FC30W3GAAXX:7:89:981:2025#0 137 1 487982 3 41M1849N34M * 0 0 GTGTTCTTCTTCAACGGGCAGCACATGAAG
                      GTCGACTATGGATCTCCAGATCACACCAAGTTTGTGGGAAGCTTC 8:;?6886<=:><6>8<=?>A:7=;[email protected]@:[email protected]@[email protected]@[email protected][email protected]@@@B;[email protected]@[email protected];< NM:i:4
                      XS:A:+ NH:i:2
                      I want to know whether columns 7-9 (* 0 0) indicate that my data were not treated as paired-end.

                      Comment


                      • #41
                        In the context of SAM/BAM, a "pair" is two reads from either end of the same fragment of DNA; the "mate" is the partner read in a pair of reads.

                        Thus if you are looking at the forward or /1 read, the mate is the reverse or /2 read, and vice versa. The pair is the combination of the forward and reverse reads (or the /1 and /2 reads, depending on your naming convention).

                        With that in mind, does the FLAG bit field make more sense?

                        Comment


                        • #42
                          Thank you, maubp. I follow your explanation, so in my opinion,
                          0x0008 the mate is unmapped
                          0x0020 strand of the mate
                          these two bits are only used for mate-pair data, and are not useful for paired-end data?

                          Comment


                          • #43
                            Originally posted by northbio View Post
                            Thank you, maubp. I follow your explanation, so in my opinion,
                            0x0008 the mate is unmapped
                            0x0020 strand of the mate
                            these two bits are only used for mate-pair data, and are not useful for paired-end data?
                            If you have an unpaired read (i.e. singleton read where FLAG bit 0x0001 is not set), then it has no mate (no partner) so yes, 0x0008 and 0x0020 are meaningless and should not be set.
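
As a rough illustration of that rule, the sketch below reports whether a FLAG carries either mate bit while the paired bit 0x0001 is not set. The function name `has_stray_mate_bits` and the constants are hypothetical, chosen only for this example.

```python
PAIRED = 0x1          # read is paired in sequencing
MATE_UNMAPPED = 0x8   # the mate is unmapped
MATE_REVERSE = 0x20   # strand of the mate


def has_stray_mate_bits(flag):
    """True if the read is unpaired (0x1 not set) yet 0x8 or 0x20 is set."""
    is_paired = bool(flag & PAIRED)
    mate_bits = flag & (MATE_UNMAPPED | MATE_REVERSE)
    return not is_paired and mate_bits != 0


# 16: single-end read on the reverse strand, no mate bits set.
print(has_stray_mate_bits(16))
# 73 = 64 + 8 + 1: paired, so the mate-unmapped bit is meaningful.
print(has_stray_mate_bits(73))
# 40 = 32 + 8: mate bits set without the paired bit.
print(has_stray_mate_bits(40))
```

Note that the bits apply to paired-end and mate-pair data alike: in both cases 0x0001 is set and each read has a partner.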

                            Comment


                            • #44
                              Hello -
                              This may be a naive question, but if the bitwise flag is 4 in a .sam file, shouldn't there always be an asterisk in the RNAME column? I'm getting reads that have a 4 in the FLAG column but also a legitimate reference in the RNAME column. If I understand correctly (which I may not), the RNAME column refers to the place that read maps to, yet a 4 in the FLAG column means it's unmapped. ??? Any help on this would be greatly appreciated - I think I'm throwing out aligned reads because of the 4 in the FLAG column and that is suboptimal. Thank you!!!
                              SH1
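
For reference, the SAM specification states that bit 0x4 is the only reliable place to tell whether a read is unmapped, and that when it is set no assumptions can be made about RNAME or POS. A filter should therefore test the FLAG rather than look for an asterisk in RNAME. The sketch below shows one way to do that on tab-separated SAM lines; the three records are invented for illustration and `is_unmapped` is a hypothetical helper, not a samtools function.

```python
def is_unmapped(sam_line):
    """Decide mapped/unmapped from FLAG bit 0x4 only, ignoring RNAME."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second column
    return bool(flag & 0x4)


# Made-up records: read1 is flagged unmapped but still has an RNAME.
records = [
    "read1\t4\tchr1\t100\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t200\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

mapped_names = [r.split("\t")[0] for r in records if not is_unmapped(r)]
print(mapped_names)
```

Under this reading, a record with FLAG 4 and a reference name in RNAME is still an unmapped read, so discarding it does not throw away an alignment.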

                              Comment


                              • #45
                                Hi, I have a problem understanding the flag too. I used bwa sampe for alignment and Picard's http://picard.sourceforge.net/explain-flags.html to decipher the meaning of the flags.

                                Below are the flags and their descriptions:
                                99 -
                                read paired
                                read mapped in proper pair
                                mate reverse strand
                                first in pair

                                151 -
                                read paired
                                read mapped in proper pair
                                read unmapped
                                read reverse strand
                                second in pair

                                and following is an example to illustrate my doubt.

                                HWI-ST220_63:5:1101:6002:72582 99 chrX 166650106 0 50M = 166650248 192 TAGGGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTT [email protected] XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:9 X1:i:3 XM:i:0 XO:i:0 XG:i:0 MD:Z:50
                                HWI-ST220_63:5:1101:6002:72582 151 chrX 166650248 0 50M = 166650106 -192 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG BJIJIJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHGHFFFFFCCC XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:670 XM:i:0 XO:i:0 XG:i:0 MD:Z:50

                                Both reads are repetitive and each shows a reference to which it maps. However, one read has flag 99 while its mate has 151. How can one read be flagged as mapped (99) and the other as unmapped (151), unless the description given by Picard is wrong or I am misunderstanding it?
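
One way to see exactly where 151 departs from the usual partner of 99 is to compare it with 147, the flag mentioned together with 99 earlier in this thread (147 = 1 + 2 + 16 + 128: paired, proper pair, reverse strand, second in pair). The short sketch below XORs the two values to isolate the differing bits. It only does the arithmetic and makes no claim about why the aligner set that bit.

```python
observed = 151       # flag on the second read in the example above
expected_mate = 147  # 1 + 2 + 16 + 128, the usual partner of flag 99

# XOR keeps only the bits that differ between the two flags.
diff = observed ^ expected_mate
print(diff, hex(diff))

# Confirm which of the two flags carries the unmapped bit (0x4).
print(bool(observed & 0x4), bool(expected_mate & 0x4))
```

The two flags differ in a single bit, 0x4, so the Picard description of 151 is consistent with the bit arithmetic: the record is flagged unmapped even though RNAME, POS and CIGAR are filled in.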

                                Comment
