Bitwise flag in SAM format and others


  • edge
    replied
    Hi,

    Does 16 mean that the read maps to the reverse strand?
    And which bitwise flag marks reads that map as a pair?

    Thanks.

  • stonyc
    replied
    Flag = *?

    Hi, I am mapping 454 reads with the default bwa bwasw, and I'm mostly getting good results, but a couple have me stumped...

    I understand that a FLAG of 4 combined with aligned chromosomal coordinates indicates an alignment that possibly spans multiple chromosomes.

    But, I'm also getting a few alignments like this:

    GVVGYHW01CSFZ8 * chrIII 7855436 199 139M * 0 0 GATCTGCCAAGCAACAGGCTAAGTCTCGCAATCAAGCCGTCAGAAAGTTCGCAGTCAAGGCTTAAGTGGCTTGTATTCATTGTTATCCATTCATGGACATTGTTCTGGTTCAATTTCAATGAAAATTGTTGCTTATGTT @BEEA<<552275<----/8//66B<<@BIIII[email protected]AA=EABACB?455;IIIIIIIIIIIIIIIIIIIIIIIHDDDI AS:i:139 XS:i:0 XF:i:3 XE:i:4 XN:i:0

    As you can see, the FLAG for this alignment is "*"... is this similar to the FLAG=4 situation described above, or something else entirely?

    Thanks...

  • shruti
    replied
    thank you for the reply.

    In the example from my previous message the reads have different coordinates, so essentially they are mapped. I also ran BLAT with the sequences against the whole mouse genome (mm9), and both of them show multiple hits.

    Just for clarity, below are the best hits for both reads. The latter read has many hits.
    ---------------------------------------------------------------------------------------------------
    ACTIONS          QUERY          SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
    browser details  read1_flag99   50     1      50   50     100.0%    9      -       115259995  115260044  50
    browser details  read1_flag99   50     1      50   50     100.0%    9      -       95326385   95326434   50
    browser details  read1_flag99   50     1      50   50     100.0%    X      +       166650106  166650155  50
    browser details  read1_flag99   50     1      50   50     100.0%    2      +       181747899  181747948  50
    browser details  read1_flag99   50     1      50   50     100.0%    2      +       57481930   57481979   50

    browser details  read2_flag151  50     1      50   50     100.0%    9      -       95326353   95326402   50
    browser details  read2_flag151  50     1      50   50     100.0%    6      -       50997472   50997521   50
    browser details  read2_flag151  50     1      50   50     100.0%    6      -       4873828    4873877    50
    browser details  read2_flag151  50     1      50   50     100.0%    6      -       4874144    4874193    50



    Also, I would expect that if a read is mapped and its mate is unmapped, then the flag for the read could not be 99; it could be one of 75 / 91 / 139 / 155. But then I might be wrong.

    Just to clarify: does "mate reverse strand" mean that the mate is mapped to the reverse strand?
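One way to check the arithmetic behind the values 75 / 91 / 139 / 155 is to split each FLAG into its power-of-two components. This is only a minimal sketch in Python; the helper name `set_bits` is illustrative and not part of any library, and 0x8 is taken as the "mate is unmapped" bit as listed elsewhere in this thread.

```python
def set_bits(flag):
    """Return the individual power-of-two values that add up to flag."""
    return [1 << i for i in range(12) if flag & (1 << i)]


MATE_UNMAPPED = 0x8  # "the mate is unmapped", as listed elsewhere in this thread

for flag in (99, 75, 91, 139, 155):
    state = "mate unmapped" if flag & MATE_UNMAPPED else "mate mapped"
    print(flag, set_bits(flag), state)
```

Under these assumptions, 8 shows up in the components of 75, 91, 139 and 155 but not in those of 99.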

  • swbarnes2
    replied
    No, that is the correct interpretation of the flags. 8 is also the flag for "mate is unmapped", which in theory the first read should have, since its mate carries the 4 flag for being unmapped. I'd guess that the 4 flag in the second read is an error, but it's hard to explain. One weirdness that I know bwa has when aligning is that it concatenates separate reference sequences, so if a read crosses over two references, it will have a mapping position and still be flagged as unmapped (Picard yells at you when it sees this), but that wouldn't seem to explain what you have there.

    And no, unmapped paired reads might not have an * in the RNAME. If a read is unmapped and its mate is mapped, it's supposed to have the RNAME and position of its mate. That's so they will sort together. It's not a bug, it's a feature, really. The flag is supposed to show this; maybe the ISIZE field will too.
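The expectation described above (a read's own 4 and 16 bits should be mirrored by its mate's 8 and 32 bits) can be written as a small check. This is a rough sketch, not what bwa or Picard actually run; the function name `mirror_problems` is made up for illustration, and 0x10 / 0x20 are assumed to be the "read reverse strand" / "mate reverse strand" bits.

```python
READ_UNMAPPED, MATE_UNMAPPED = 0x4, 0x8
READ_REVERSE, MATE_REVERSE = 0x10, 0x20


def mirror_problems(flag, mate_flag):
    """List the ways mate_flag's mate bits disagree with flag's own bits."""
    problems = []
    if bool(flag & READ_UNMAPPED) != bool(mate_flag & MATE_UNMAPPED):
        problems.append("0x4 on this read is not mirrored by 0x8 on its mate")
    if bool(flag & READ_REVERSE) != bool(mate_flag & MATE_REVERSE):
        problems.append("0x10 on this read is not mirrored by 0x20 on its mate")
    return problems


print(mirror_problems(99, 151))  # read 1 compared with read 2's mate bits
print(mirror_problems(151, 99))  # read 2 compared with read 1's mate bits
```

For the 99 / 151 pair, the first call reports nothing, while the second reports the missing 0x8 on the first read.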

  • shruti
    replied
    Hi, I have trouble understanding the flag too. I used bwa sampe for alignment and used Picard's http://picard.sourceforge.net/explain-flags.html to decipher the meaning of the flags.

    Below are the flags and their descriptions:
    99 -
    read paired
    read mapped in proper pair
    mate reverse strand
    first in pair

    151 -
    read paired
    read mapped in proper pair
    read unmapped
    read reverse strand
    second in pair

    and following is an example to illustrate my doubt.

    HWI-ST220_63:5:1101:6002:72582 99 chrX 166650106 0 50M = 166650248 192 TAGGGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTT [email protected] XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:9 X1:i:3 XM:i:0 XO:i:0 XG:i:0 MD:Z:50
    HWI-ST220_63:5:1101:6002:72582 151 chrX 166650248 0 50M = 166650106 -192 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG BJIJIJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHGHFFFFFCCC XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:670 XM:i:0 XO:i:0 XG:i:0 MD:Z:50

    Both reads are repetitive, and each shows a reference to which it maps, yet one read has the value 99 while its mate has 151. How can one be assigned a value that says mapped (99) and the other a value that says unmapped (151), unless either the meaning given by Picard or my understanding is wrong?
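To double-check the two descriptions listed above independently of the Picard page, the flag can be decoded by testing one bit at a time. A minimal sketch in Python follows; the table covers only the eight lowest bits, and `explain_flag` is a hypothetical helper name, not Picard's code.

```python
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
]


def explain_flag(flag):
    """Return the description of every bit that is set in flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


for flag in (99, 151):
    print(flag, "-")
    for name in explain_flag(flag):
        print("   ", name)
```

This reproduces the two lists given in the post, so the decoding itself appears to be right; the open question is why the aligner set the 4 bit on the second read.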

  • SH1
    replied
    Hello -
    This may be a naive question, but if the bitwise flag is 4 in a .sam file, shouldn't there always be an asterisk in the RNAME column? I'm getting reads that have a 4 in the FLAG column but also a legitimate reference in the RNAME column. If I understand correctly (which I may not), the RNAME column refers to the place that read maps to, yet a 4 in the FLAG column means it's unmapped. ??? Any help on this would be greatly appreciated - I think I'm throwing out aligned reads because of the 4 in the FLAG column and that is suboptimal. Thank you!!!
    SH1
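For what it's worth, a filter can be keyed on FLAG bit 4 alone, leaving RNAME out of the decision, so that it is at least explicit which records are being dropped. The sketch below uses made-up, abbreviated records (only the first four tab-separated columns) and a hypothetical helper `is_unmapped`; whether such reads should be kept depends on why the aligner set the bit.

```python
READ_UNMAPPED = 0x4


def is_unmapped(sam_line):
    """True if FLAG (column 2) has the 0x4 bit set, whatever RNAME says."""
    fields = sam_line.rstrip("\n").split("\t")
    return bool(int(fields[1]) & READ_UNMAPPED)


# Hypothetical records: QNAME, FLAG, RNAME, POS only.
records = [
    "readA\t4\tchr1\t1000",  # flagged unmapped although RNAME and POS are set
    "readB\t0\tchr1\t2000",  # mapped
    "readC\t4\t*\t0",        # unmapped, no coordinates
]

for rec in records:
    name, flag, rname, pos = rec.split("\t")
    status = "unmapped" if is_unmapped(rec) else "mapped"
    print(name, rname, pos, status)
```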

  • maubp
    replied
    Originally posted by northbio
    Thank you, maubp. I follow your explanation, so in my opinion,
    0x0008 the mate is unmapped
    0x0020 strand of the mate
    these two bits are only used for mate-pair data and are not useful for paired-end data?

    If you have an unpaired read (i.e. a singleton read where FLAG bit 0x0001 is not set), then it has no mate (no partner), so yes, 0x0008 and 0x0020 are meaningless and should not be set.
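The rule above can be turned into a small sanity check: report mate-related bits that are set on a read whose 0x0001 bit is clear. This is only an illustrative sketch; `stray_mate_bits` is an invented name, and it covers just the two bits discussed here.

```python
PAIRED = 0x1
MATE_ONLY_BITS = {0x8: "mate unmapped", 0x20: "mate reverse strand"}


def stray_mate_bits(flag):
    """Return mate-related bits that are set even though the read is unpaired."""
    if flag & PAIRED:
        return []  # paired read: mate bits are meaningful
    return [name for bit, name in MATE_ONLY_BITS.items() if flag & bit]


print(stray_mate_bits(16))  # unpaired, reverse strand: nothing to report
print(stray_mate_bits(40))  # 0x8 + 0x20 set without 0x1
print(stray_mate_bits(73))  # paired (0x1 set), so 0x8 is legitimate
```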

  • northbio
    replied
    Thank you, maubp. I follow your explanation, so in my opinion,
    0x0008 the mate is unmapped
    0x0020 strand of the mate
    these two bits are only used for mate-pair data and are not useful for paired-end data?

  • maubp
    replied
    In the context of SAM/BAM, a "pair" is two reads from either end of the same fragment of DNA; the "mate" is the partner read in a pair of reads.

    Thus if you are looking at the forward or /1 read, the mate is the reverse or /2 read (and vice versa). The pair is the combination of the forward and reverse reads (or the /1 and /2 reads, depending on your naming convention).

    With that in mind, does the FLAG bit field make more sense?

  • northbio
    replied
    Hello everyone, the information above helped me understand the flag in the SAM format better, but I still have problems fully understanding it, for example:
    0x0002 the read is mapped in a proper pair
    0x0004 the query sequence itself is unmapped
    0x0008 the mate is unmapped
    I don't know what "a proper pair" means, or the difference between "pair" and "mate". Could anyone help explain them?
    Another question: I used TopHat on my paired-end Illumina reads, and the SAM file it produced looks like this:
    FC30W3GAAXX:7:53:723:1789#0 73 1 487961 3 62M1849N13M * 0 0 ATCAGCTTCATTCCCTCAACAGTGTTCTTCTTCAACGGGCAGCACATGAAGGTCGACTATGGATCTCCAGATCAC 84AB:[email protected]:=A=-9BB?BB>@7>[email protected]=;@B:BABBB?B>@[email protected]@[email protected]>?ABB>[email protected]@[email protected]@B NM:i:7 XS:A:+ NH:i:2
    FC30W3GAAXX:7:89:981:2025#0 137 1 487982 3 41M1849N34M * 0 0 GTGTTCTTCTTCAACGGGCAGCACATGAAGGTCGACTATGGATCTCCAGATCACACCAAGTTTGTGGGAAGCTTC 8:;?6886<=:><6>8<=?>A:7=;[email protected]@:[email protected]@[email protected]@[email protected][email protected]@@@B;[email protected]@[email protected];< NM:i:4 XS:A:+ NH:i:2
    I want to know whether columns 7-9 (* 0 0) indicate that my data were not treated as paired-end?

  • argyjbao
    replied
    sam FLAG

    Hi guys,
    I find the SAM FLAG encoding a very clever way of storing alignment information. But I am confused by the negative sign in the insert-size field of the paired-end example below.
    The manual says a negative sign in the insert-size field means the mate's mapping position is smaller than the current one, but what I see is the reverse.
    Also, in the pair below the mapping position fields are equal for both reads (2005683), but the reads should not start at the same position; they merely overlap.

    Can anybody help me? Thanks in advance.

    GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pPr1 NT_004350 2005683 255 76M = 2005683 101 TCCGGGTGGGGGCAGGGGCCCTGGAGGGGTCACTCGGCTGCCGTCTGTCACTTGGGTCCAGAGGAGCTTCTGGTGG CCCCCCCCCBBBBCCCCCCCCCCCCCCCB>CCCCCCCCCCCCDDBDCBDACDC>@B>[email protected]=BB>[email protected]?BC
    GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pP2 NT_004350 2005683 255 76M = 2005683 -101 GTGGCCTCGGGAGCAAGGGTCAGACCCACCAGAAGCTCCTCTGGACCCAAGTGACAGACGGCAGCCGAGTGACCCC [email protected]@?<A<[email protected]
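As a cross-check on the sign question, here is a rough sketch of how the insert-size column could be computed from two leftmost positions. It assumes the convention that the leftmost read gets the positive value and the rightmost the negative one, and that both reads align without indels or clipping, so the reference span equals the read length. The function name `insert_sizes` is invented for illustration. The first call uses the bwa pair quoted elsewhere in this thread (positions 166650106 and 166650248, 50M each, reported as 192 and -192).

```python
def insert_sizes(pos1, len1, pos2, len2):
    """Return (value for read 1, value for read 2) from 1-based leftmost positions."""
    left = min(pos1, pos2)
    right = max(pos1 + len1, pos2 + len2) - 1  # last reference base covered
    span = right - left + 1
    if pos1 <= pos2:  # tie broken in favour of read 1 (arbitrary choice)
        return span, -span
    return -span, span


print(insert_sizes(166650106, 50, 166650248, 50))  # bwa pair from this thread
print(insert_sizes(2005683, 76, 2005683, 76))      # the pair shown above
```

With the values from the pair above, this sketch gives 76 rather than the reported 101, so under these assumptions the two reads cannot both start at 2005683.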

    Originally posted by lh3
    @yxi

    Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.

    Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.
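For readers without samtools at hand, a string FLAG of the kind seen in the records above (pPr1, pP2) can be approximated in a few lines. The letter table below is an assumption inferred from those strings and should be checked against the samtools documentation before relying on it; `flag_to_letters` is a hypothetical helper, not a samtools function.

```python
# Assumed letter per bit, lowest bit first: paired, proper pair, unmapped,
# mate unmapped, reverse, mate reverse, first, second, secondary, QC fail, duplicate.
FLAG_LETTERS = "pPuUrR12sfd"


def flag_to_letters(flag):
    """Render a numeric FLAG as one letter per set bit."""
    return "".join(ch for i, ch in enumerate(FLAG_LETTERS) if flag & (1 << i))


for flag in (83, 131, 99, 151):
    print(flag, flag_to_letters(flag))
```

Under this table, 83 and 131 render as pPr1 and pP2, the two strings that appear in the pair quoted above.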

  • maubp
    replied
    Originally posted by Nix
    I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.
    I'd go further and say the flag (and other things) will need redoing to cope with more than just paired reads - it will need to cope with N-tuples of reads each separated by an insert of some estimated size (e.g. Strobe Reads from Pacific Biosciences, or what Helicos calls dark fill).

  • shriram
    replied
    Hello new to the group

  • kjngo
    replied
    Thank you Heng

  • lh3
    replied
    Unmapped reads can have strand. This only means the unmapped sequence is given on its reverse strand.
