Hi,
Does a flag of 16 mean that the read maps to the reverse strand?
And which bitwise flag marks reads that are mapped as a pair?
Thanks.
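For reference, here is a minimal Python sketch of how single bits can be tested. It assumes the bit values given in the SAM specification (0x1 read paired, 0x2 mapped in a proper pair, 0x10 read on the reverse strand); the constant and function names are only illustrative.

```python
# Bit values as defined in the SAM specification.
PAIRED = 0x1        # read is part of a pair
PROPER_PAIR = 0x2   # aligner considers the pair properly aligned
REVERSE = 0x10      # decimal 16: read aligned to the reverse strand

def is_reverse(flag: int) -> bool:
    """True if the read itself is reported on the reverse strand."""
    return bool(flag & REVERSE)

def is_proper_pair(flag: int) -> bool:
    """True if the read is paired and flagged as mapped in a proper pair."""
    return bool(flag & PAIRED) and bool(flag & PROPER_PAIR)

for flag in (0, 16, 83, 99):
    print(flag, is_reverse(flag), is_proper_pair(flag))
```

So 16 on its own means "reverse strand", and a read mapped as part of a proper pair has both 1 and 2 set (usually alongside other bits).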
-
Flag = *?
Hi, I am mapping 454 reads with the default bwa bwasw, and I'm mostly getting good results, but a couple have me stumped...
I understand that a FLAG of 4 together with aligned chromosomal coordinates indicates an alignment that possibly spans multiple chromosomes.
But, I'm also getting a few alignments like this:
GVVGYHW01CSFZ8 * chrIII 7855436 199 139M * 0 0 GATCTGCCAAGCAACAGGCTAAGTCTCGCAATCAAGCCGTCAGAAAGTTCGCAGTCAAGGCTTAAGTGGCTTGTATTCATTGTTATCCATTCATGGACATTGTTCTGGTTCAATTTCAATGAAAATTGTTGCTTATGTT @BEEA<<552275<----/8//66B<<@BIIIIHHHIIIIIIIIIIIIHCDCHHEIEEEECCDBIEEEIIII???IIIIIIIIIHHHDDB131ABB@AA=EABACB?455;IIIIIIIIIIIIIIIIIIIIIIIHDDDI AS:i:139 XS:i:0 XF:i:3 XE:i:4 XN:i:0
As you can see, the FLAG for this alignment is "*"... is this similar to the FLAG=4 situation described above, or something else entirely?
Thanks...
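One way to spot records like this is to check the mandatory columns before interpreting them. The sketch below assumes the column order from the SAM specification, where FLAG is an integer, so a "*" in that column is not a defined flag value. It does not explain why the aligner wrote it. The function name and the shortened test line are hypothetical.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

def check_record(line):
    """Split one SAM line and report mandatory fields that look malformed."""
    # Real SAM is tab-delimited; whitespace splitting is used here only
    # because the forum paste lost its tabs.
    values = line.split()
    if len(values) < len(SAM_FIELDS):
        raise ValueError("fewer than 11 mandatory fields")
    record = dict(zip(SAM_FIELDS, values))
    problems = []
    for name in ("FLAG", "POS", "MAPQ", "PNEXT"):
        if not record[name].isdigit():
            problems.append(f"{name} is not a non-negative integer: {record[name]!r}")
    return record, problems

# Shortened, hypothetical stand-in for the record quoted above (SEQ/QUAL trimmed).
line = "GVVGYHW01CSFZ8 * chrIII 7855436 199 139M * 0 0 GATC IIII"
record, problems = check_record(line)
print(problems)
```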
-
Thank you for the reply.
In the example from my previous message the reads have different coordinates, so essentially they are mapped. I also tried running blat with the sequences against the whole mouse genome (mm9), and both of them show multiple hits.
Just for clarity, below are the best hits for both reads. The latter read has many hits.
---------------------------------------------------------------------------------------------------
browser details read1_flag99 50 1 50 50 100.0% 9 - 115259995 115260044 50
browser details read1_flag99 50 1 50 50 100.0% 9 - 95326385 95326434 50
browser details read1_flag99 50 1 50 50 100.0% X + 166650106 166650155 50
browser details read1_flag99 50 1 50 50 100.0% 2 + 181747899 181747948 50
browser details read1_flag99 50 1 50 50 100.0% 2 + 57481930 57481979 50
browser details read2_flag151 50 1 50 50 100.0% 9 - 95326353 95326402 50
browser details read2_flag151 50 1 50 50 100.0% 6 - 50997472 50997521 50
browser details read2_flag151 50 1 50 50 100.0% 6 - 4873828 4873877 50
browser details read2_flag151 50 1 50 50 100.0% 6 - 4874144 4874193 50
Also, I would expect that if a read is mapped and its mate is unmapped, the flag for the read could not be 99; it might be one of 75 / 91 / 139 / 155, but I might be wrong.
Just to clarify: does "mate reverse strand" mean that the mate is mapped to the reverse strand?
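A rough way to test such expectations is to look for bit combinations that contradict each other. The sketch below is only a heuristic of my own, not something the SAM specification defines. The spec does say that when 0x4 is set no assumptions can be made about 0x2, so a "proper pair" bit on an unmapped read is better treated as noise than as information. The function name is hypothetical.

```python
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8

def suspicious_combinations(flag):
    """Return notes about bit combinations that contradict each other."""
    notes = []
    if flag & PROPER_PAIR and not flag & PAIRED:
        notes.append("proper pair but read not paired")
    if flag & PROPER_PAIR and flag & UNMAPPED:
        notes.append("proper pair but read unmapped")
    if flag & PROPER_PAIR and flag & MATE_UNMAPPED:
        notes.append("proper pair but mate unmapped")
    return notes

for flag in (99, 151, 73, 75):
    print(flag, suspicious_combinations(flag))
```

Under this check, 99 and 73 come out clean, 151 is reported because it combines "proper pair" with "read unmapped", and 75 is reported because it combines "proper pair" with "mate unmapped".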
-
No, that is the correct interpretation of the flags. 8 is the flag for "mate is unmapped", which in theory the first read should have, since its mate carries the 4 flag for being unmapped. I'd guess that the 4 flag in the second read is an error, but it's hard to explain. One weirdness that I know bwa has when aligning is that it concatenates separate reference sequences, so if a read crosses over two references, it will have a mapping position and still be flagged as unmapped (Picard yells at you when it sees this), but that wouldn't seem to explain what you have there.
And no, unmapped paired reads might not have an * in the rname. If a read is unmapped, and its mate is mapped, it's supposed to have the rname and position of its mate. That's so they will sort together. It's not a bug, it's a feature, really. The flag is supposed to show this, maybe the ISIZE field will too.
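In practice this means unmapped reads should be selected by the 0x4 bit rather than by looking for "*" in RNAME. Here is a small sketch with made-up records (names, flags and coordinates are hypothetical) showing how the two filters differ:

```python
UNMAPPED = 0x4

def is_unmapped(flag: int) -> bool:
    """The 0x4 bit is the reliable indicator that a read is unmapped."""
    return bool(flag & UNMAPPED)

# Hypothetical (QNAME, FLAG, RNAME, POS) tuples.
records = [
    ("readA", 4, "*", 0),           # unmapped single-end read
    ("readB", 73, "chr1", 1000),    # mapped, first in pair, mate unmapped
    ("readB", 133, "chr1", 1000),   # unmapped mate, placed at its partner's position
]

by_flag = [(name, flag) for name, flag, rname, pos in records if is_unmapped(flag)]
by_rname = [(name, flag) for name, flag, rname, pos in records if rname == "*"]

print("unmapped by flag: ", by_flag)
print("unmapped by RNAME:", by_rname)
```

The RNAME-based filter misses the placed mate (flag 133), while the flag-based filter finds both unmapped reads and leaves the mapped one alone.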
-
Hi. I also have a problem understanding the flags. I used bwa sampe for alignment and Picard's http://picard.sourceforge.net/explain-flags.html to decipher the meaning of the flags.
Below are the flags and their descriptions:
99 -
read paired
read mapped in proper pair
mate reverse strand
first in pair
151 -
read paired
read mapped in proper pair
read unmapped
read reverse strand
second in pair
and following is an example to illustrate my doubt.
HWI-ST220_63:5:1101:6002:72582 99 chrX 166650106 0 50M = 166650248 192 TAGGGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTT CB@FFDEFHHHFHGIJJGHGJJJJGHCHJIFGGHIGGGGHIJFHGHIIGI XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:9 X1:i:3 XM:i:0 XO:i:0 XG:i:0 MD:Z:50
HWI-ST220_63:5:1101:6002:72582 151 chrX 166650248 0 50M = 166650106 -192 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG BJIJIJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHGHFFFFFCCC XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:670 XM:i:0 XO:i:0 XG:i:0 MD:Z:50
Both reads are repetitive, and each shows a reference position to which it maps, yet one read has the flag 99 while its mate has 151. How can one read be assigned a value that says mapped (99) and the other a value that says unmapped (151), unless either the meaning given by Picard or my understanding is wrong?
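The decomposition can be reproduced with a few lines of Python. This is a minimal sketch using the bit values from the SAM specification and the same wording as the Picard page for the descriptions; the function name is just illustrative.

```python
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "read is PCR or optical duplicate"),
]

def explain_flag(flag: int) -> list:
    """List the description of every bit that is set in the flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]

for flag in (99, 151):
    print(flag, "=", " + ".join(str(bit) for bit, _ in FLAG_BITS if flag & bit))
    for name in explain_flag(flag):
        print("   ", name)
```

99 is 1 + 2 + 32 + 64, and 151 is 1 + 2 + 4 + 16 + 128, so the 4 ("read unmapped") bit really is set in the second record even though it carries a position.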
-
Hello -
This may be a naive question, but if the bitwise flag is 4 in a .sam file, shouldn't there always be an asterisk in the RNAME column? I'm getting reads that have a 4 in the FLAG column but also a legitimate reference in the RNAME column. If I understand correctly (which I may not), the RNAME column refers to the place that read maps to, yet a 4 in the FLAG column means it's unmapped. ??? Any help on this would be greatly appreciated - I think I'm throwing out aligned reads because of the 4 in the FLAG column and that is suboptimal. Thank you!!!
SH1
-
Thank you, maubp. I follow your explanation, so as I understand it,
0x0008 the mate is unmapped
0x0020 strand of the mate
these two bits are only used when the data are mate-pair reads, and are not useful for paired-end data?
-
In the context of SAM/BAM, a "pair" is two reads from either end of the same fragment of DNA; the "mate" is the partner read in a pair of reads.
Thus if you are looking at the forward or /1 read, the mate is the reverse or /2 read, and vice versa. The pair is the combination of the forward and reverse reads (or the /1 and /2 reads, depending on your naming convention).
With that in mind, does the FLAG bit field make more sense?
-
Hello everyone, the information above helped me understand the FLAG field in the SAM format better, but I still have trouble fully understanding it, for example:
0x0002 the read is mapped in a proper pair
0x0004 the query sequence itself is unmapped
0x0008 the mate is unmapped
I don't know what "a proper pair" means, or what the difference is between "pair" and "mate". Could anyone help explain them?
Another question: I used tophat on my paired-end Illumina reads, and the SAM file it produced looks like this:
FC30W3GAAXX:7:53:723:1789#0 73 1 487961 3 62M1849N13M * 0 0 ATCAGCTTCATTCCCTCAACAGTGTTCTTCTTCAACGGGCAGCACATGAAGGTCGACTATGGATCTCCAGATCAC 84AB:B@:=A=-9BB?BB>@7>A@ABBBBB=;@B:BABBB?B>@AC@@AAB=CCA6@>?ABB>9@@ACCCA@C@B NM:i:7 XS:A:+ NH:i:2
FC30W3GAAXX:7:89:981:2025#0 137 1 487982 3 41M1849N34M * 0 0 GTGTTCTTCTTCAACGGGCAGCACATGAAGGTCGACTATGGATCTCCAGATCACACCAAGTTTGTGGGAAGCTTC 8:;?6886<=:><6>8<=?>A:7=;8A?@@:BAA=A@@A@A@AAB@?@@@@B;B@AABA@BBABCCCBB@CBA;< NM:i:4 XS:A:+ NH:i:2
I want to know whether columns 7-9 (* 0 0) indicate that my data were not treated as paired-end?
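The flags in these two records (73 and 137) can be read together with the mate columns. This is a small sketch assuming the SAM bit values 0x1 (paired), 0x8 (mate unmapped), 0x40 (first in pair) and 0x80 (second in pair); the function name is hypothetical.

```python
def describe_mate(flag: int, rnext: str, pnext: int) -> str:
    """Summarise what the flag and mate columns say about pairing."""
    if not flag & 0x1:
        return "single-end read"
    if flag & 0x40:
        which = "first"
    elif flag & 0x80:
        which = "second"
    else:
        which = "unknown position"
    if flag & 0x8:
        return f"paired ({which} in pair), mate unmapped"
    return f"paired ({which} in pair), mate at {rnext}:{pnext}"

print(describe_mate(73, "*", 0))
print(describe_mate(137, "*", 0))
```

Both flags have the 0x1 bit set, so the reads were treated as paired. The "* 0 0" in columns 7-9 goes with the 0x8 bit: the mate of each read was not aligned, so there is no mate position or insert size to report.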
-
sam FLAG
hi guys,
I find the SAM FLAG encoding a very clever way of storing alignment information. But I am confused by the negative sign of the insert size field in the paired-end example below.
The manual says a negative insert size means the mate's mapping position is smaller than the current one, but here it seems to be the other way round.
Also, in this pair the mapping position fields are identical for the two reads (2005683), yet the reads are not at the same position; they only overlap.
Can anybody help me? Thanks in advance.
GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pPr1 NT_004350 2005683 255 76M = 2005683 101 TCCGGGTGGGGGCAGGGGCCCTGGAGGGGTCACTCGGCTGCCGTCTGTCACTTGGGTCCAGAGGAGCTTCTGGTGG CCCCCCCCCBBBBCCCCCCCCCCCCCCCB>CCCCCCCCCCCCDDBDCBDACDC>@B>BBBBBB@=BB>ABBB@?BC
GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pP2 NT_004350 2005683 255 76M = 2005683 -101 GTGGCCTCGGGAGCAAGGGTCAGACCCACCAGAAGCTCCTCTGGACCCAAGTGACAGACGGCAGCCGAGTGACCCC BCD@BB7@?<A<=AABBBCBBCCCCCCCCCCCC@CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
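As a sanity check on the numbers, here is a sketch of the insert size arithmetic. It assumes the insert size is measured from the leftmost aligned base of one read to the rightmost aligned base of the other, and that both CIGARs are plain matches (e.g. 76M); some aligners measure it differently. The function names are hypothetical.

```python
def template_length(pos: int, read_len: int, mate_pos: int, mate_len: int) -> int:
    """Signed outer-to-outer span: positive for the leftmost read, negative for the rightmost."""
    start = min(pos, mate_pos)
    end = max(pos + read_len, mate_pos + mate_len) - 1
    span = end - start + 1
    if pos > mate_pos:
        return -span
    # Equal positions: position alone cannot decide the sign; positive is an
    # arbitrary choice made for this sketch.
    return span

def implied_mate_start(pos: int, tlen: int, mate_len: int) -> int:
    """Mate start implied by a positive insert size on the leftmost read."""
    return pos + tlen - mate_len

# The chrX pair quoted elsewhere in this thread: 50 bp reads, ISIZE 192 / -192.
print(template_length(166650106, 50, 166650248, 50))
print(template_length(166650248, 50, 166650106, 50))

# The pair above: 76 bp reads, ISIZE 101, both records reporting 2005683.
print(implied_mate_start(2005683, 101, 76))
```

Under these assumptions, an insert size of 101 with 76 bp reads implies the second read starts at 2005708, not 2005683. That is consistent with the reads overlapping rather than sharing a start, so the identical position columns look like an artefact of how this particular output was written.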
Originally posted by lh3:
@yxi
Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.
Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.
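For anyone who needs to go back and forth between the string form (such as "pPr1" and "pP2" in the example above) and the numeric FLAG, here is a minimal sketch. The letter table "pPuUrR12sfd" (one letter per bit, starting at 0x1) is my recollection of what older samtools releases used and should be treated as an assumption; please check it against your samtools version. The function names are hypothetical.

```python
# Assumed letter table: p=0x1, P=0x2, u=0x4, U=0x8, r=0x10, R=0x20,
# 1=0x40, 2=0x80, s=0x100, f=0x200, d=0x400.
FLAG_CHARS = "pPuUrR12sfd"

def flag_from_string(text: str) -> int:
    """Convert a string flag such as 'pPr1' to its integer value."""
    flag = 0
    for ch in text:
        flag |= 1 << FLAG_CHARS.index(ch)  # raises ValueError on unknown letters
    return flag

def flag_to_string(flag: int) -> str:
    """Convert an integer flag to the string form, lowest bit first."""
    return "".join(ch for i, ch in enumerate(FLAG_CHARS) if flag & (1 << i))

print(flag_from_string("pPr1"), flag_from_string("pP2"))
print(flag_to_string(99), flag_to_string(151))
```

With this table, "pPr1" corresponds to 83 and "pP2" to 131.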
-
Originally posted by Nix:
I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.
-
Unmapped reads can have strand. This only means the unmapped sequence is given on its reverse strand.