@yxi
Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.
Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Actually, I always have the question why SAM format using a binary FLAG field instead of text/char field to indicate the mapping status. IMHO a human readable FLAG field would make much more sense, and maybe the strand information could be listed as a separated field. It's also a headache to look at those aux fields. Excuse me but I have the feeling that SAM format is somewhat over complicated, is there any possibility the SAM format could be modified so that people could read it more easily?
Thanks.
Leave a comment:
-
Originally posted by bioinfosm View Postwhat would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.
Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?
Thanks
You are right, MAPQ =0 is an indication that this read has multiple best hits, which means it is not unique match.
Leave a comment:
-
what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.
Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?
Thanks
Leave a comment:
-
Originally posted by aurelielaugraud View Postok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
thanks
To view that bam file, use samtools view test.bam, then you will be able to see the content. On the other hand, you will also be able to output part of bam to sam file by unix command for your parsing.
Leave a comment:
-
ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
thanks
Leave a comment:
-
Originally posted by aurelielaugraud View PostAnother thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .
Output of maq is map file, use maq mapview to make it visible and count the lines which can be done by unix command. The output of bwa is sam file which is human readable, you should be able to open it, but in case you generate the bam file, use samtools view to read it, and similarly to get the line number.
Leave a comment:
-
ok thanks for all the replies
I haven't run the mapping as paired-end . I ran only one file (and it took long enough). I thought I could just try it like that ... and I am used of bwa or gem where I can map the two paired-end files separately which doesn't seem to be the case with maq.(or did I miss something ?)
Anyway so if I understand correctly. as it hasn't been mapped as paires the first two bits are 0 which are ok with me but it somehow knows that it is a paired-end sample ( probably the /1 at the end of the id ??) and it tells me that it is the first read of the pair ... even if it hasn't any pair here.
I just thought I would automatically put this bit to 0 as there is no paired-end on the file I gave it.
99 and 147 is what I get with my bwa try but I made the effort of mapping both paired-end files .
Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .
thank you
Leave a comment:
-
The flag number is decimal value, I wrote a small tool to convert decimal to Hex and binary string, but it was rarely used. Firstly, you can easily identify if this read is paired (even: SE ; odd: PE), set this at the beginning:
if($FLAG & 0x01)
$pe_flag = 1
else
$se_flag = 1
Then, suppose this is PE library, someone might want to know if one read is the first read in a pair or second read in a pair (indicated in export file of illumina data). You can do this
if ($FLAG & 0x40)
$1st_flag = 1
This is just a simple example, you can extract different info by operating on different bits of FLAG.
Leave a comment:
-
Originally posted by aurelielaugraud View PostEAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0
Leave a comment:
-
Looking at the SAM examples then 80 would seem to indicate a reverse mapped single end read. I'd check that you definitely ran the search as a paired end search since the output doesn't suggest this. The format says that flags relating to mate pairs only have any meaning when 0x01 is true (ie the read is paired), so the flag for first read in a pair isn't necessarily wrong.
The example in the SAM tools format definintion shows two reads which form a pair with flags 99 and 147.
99 = 01100011
147 = 10010011
99 = Paired, Proper Pair, Mapped, Mate Mapped, Forward, Mate Reverse, First in pair, Not second in pair
147 = Paired, Proper Pair, Mapped, Mate Mapped, Reverse, Mate Forward, Not first in pair, Second in pair
Which all makes perfect sense.
Leave a comment:
-
EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0 GCCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACCCTAACCCTAAC );:@684@A?7@/
=@@);195)/(1><A);2=A?2?4AAB<@7>B?2A;B MF:i:0 NM:i:2 UQ:i:16 H0:i:85 H1:i:58
EAS89_20:1:35:882:1661/1 0 chr1 10007 0 50M * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA BBBBB
BBCBBBBBCBBABBBBAA?BBBA?BBBBA?@BAB@:A@@?>:@@A MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
EAS89_20:1:97:1232:1179/1 0 chr1 10009 0 50M * 0 0 ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC BAABB
BBA@BBBA6?A?BA?>:>AA5?>5@?;8??@>=6<+35:;2++/2 MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:54
EAS89_20:1:35:821:41/1 0 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC BBBCBBB?CCBA@
BBCBABBBBB??BBBA??ABBB?;ABA@8@9AB=7-> MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
EAS89_20:1:16:1399:1760 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC ?83;6=<=2866<
;688??===8;<@8@8@@@8@;BBA=@7?@B9@8ACB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:62
EAS89_20:1:32:145:946 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC A@;A7BBB@B=BB
><B<??B5B=ABB;A9BBB?B8BBB?B@ABCAB@BBB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:59
Leave a comment:
-
It would probably help if you could post a few example lines from your SAM output so we could see the flag in context.
Leave a comment:
-
Now that I think I understand better this flag, I was trying to read the sam file I generated from a maq alignment ...
how can I get a flag of 80 ?
from what I understood :
80 = 64+16
so
1010000
no paired end ... reverse strand ..ok ...
but the most-left 1 suggest that the read is the first in a pair ... so not coherent with no paired-ends ...
maybe I missed something ...
anyone knows ?
thanks !
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...-
Channel: Articles
03-22-2024, 06:39 AM -
-
by seqadmin
The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.
Avian Conservation
Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...-
Channel: Articles
03-08-2024, 10:41 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 06:37 PM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
Yesterday, 06:37 PM
|
||
Started by seqadmin, Yesterday, 06:07 PM
|
0 responses
10 views
0 likes
|
Last Post
by seqadmin
Yesterday, 06:07 PM
|
||
Started by seqadmin, 03-22-2024, 10:03 AM
|
0 responses
51 views
0 likes
|
Last Post
by seqadmin
03-22-2024, 10:03 AM
|
||
Started by seqadmin, 03-21-2024, 07:32 AM
|
0 responses
67 views
0 likes
|
Last Post
by seqadmin
03-21-2024, 07:32 AM
|
Leave a comment: