Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • lh3
    replied
    @yxi

    Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.

    Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.

    Leave a comment:


  • yxi
    replied
    Actually, I always have the question why SAM format using a binary FLAG field instead of text/char field to indicate the mapping status. IMHO a human readable FLAG field would make much more sense, and maybe the strand information could be listed as a separated field. It's also a headache to look at those aux fields. Excuse me but I have the feeling that SAM format is somewhat over complicated, is there any possibility the SAM format could be modified so that people could read it more easily?

    Thanks.

    Leave a comment:


  • totalnew
    replied
    Originally posted by bioinfosm View Post
    what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

    Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

    Thanks
    Flag 0 means "the read is not paired and mapped, forward strand"

    You are right, MAPQ =0 is an indication that this read has multiple best hits, which means it is not unique match.

    Leave a comment:


  • bioinfosm
    replied
    what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

    Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

    Thanks

    Leave a comment:


  • totalnew
    replied
    Originally posted by aurelielaugraud View Post
    ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
    thanks
    You are right, sam file is quite large size, from my simulation, the sam file is generally four times larger than corresponding bam file. My idea is to write a script to generate sam file and further the bam file, and in that script delete the sam file right after the bam file generated.

    To view that bam file, use samtools view test.bam, then you will be able to see the content. On the other hand, you will also be able to output part of bam to sam file by unix command for your parsing.

    Leave a comment:


  • aurelielaugraud
    replied
    ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
    thanks

    Leave a comment:


  • totalnew
    replied
    Originally posted by aurelielaugraud View Post
    Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .

    Output of maq is map file, use maq mapview to make it visible and count the lines which can be done by unix command. The output of bwa is sam file which is human readable, you should be able to open it, but in case you generate the bam file, use samtools view to read it, and similarly to get the line number.

    Leave a comment:


  • aurelielaugraud
    replied
    ok thanks for all the replies
    I haven't run the mapping as paired-end . I ran only one file (and it took long enough). I thought I could just try it like that ... and I am used of bwa or gem where I can map the two paired-end files separately which doesn't seem to be the case with maq.(or did I miss something ?)
    Anyway so if I understand correctly. as it hasn't been mapped as paires the first two bits are 0 which are ok with me but it somehow knows that it is a paired-end sample ( probably the /1 at the end of the id ??) and it tells me that it is the first read of the pair ... even if it hasn't any pair here.
    I just thought I would automatically put this bit to 0 as there is no paired-end on the file I gave it.
    99 and 147 is what I get with my bwa try but I made the effort of mapping both paired-end files .

    Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .

    thank you

    Leave a comment:


  • totalnew
    replied
    The flag number is decimal value, I wrote a small tool to convert decimal to Hex and binary string, but it was rarely used. Firstly, you can easily identify if this read is paired (even: SE ; odd: PE), set this at the beginning:
    if($FLAG & 0x01)
    $pe_flag = 1
    else
    $se_flag = 1

    Then, suppose this is PE library, someone might want to know if one read is the first read in a pair or second read in a pair (indicated in export file of illumina data). You can do this

    if ($FLAG & 0x40)
    $1st_flag = 1

    This is just a simple example, you can extract different info by operating on different bits of FLAG.

    Leave a comment:


  • simonandrews
    replied
    Originally posted by aurelielaugraud View Post
    EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0
    Your mate pair name is * and the mate position and insert size are both 0 indicating you have not done a paired end mapping (which agrees with your flag reading).

    Leave a comment:


  • simonandrews
    replied
    Looking at the SAM examples then 80 would seem to indicate a reverse mapped single end read. I'd check that you definitely ran the search as a paired end search since the output doesn't suggest this. The format says that flags relating to mate pairs only have any meaning when 0x01 is true (ie the read is paired), so the flag for first read in a pair isn't necessarily wrong.

    The example in the SAM tools format definintion shows two reads which form a pair with flags 99 and 147.

    99 = 01100011
    147 = 10010011

    99 = Paired, Proper Pair, Mapped, Mate Mapped, Forward, Mate Reverse, First in pair, Not second in pair

    147 = Paired, Proper Pair, Mapped, Mate Mapped, Reverse, Mate Forward, Not first in pair, Second in pair

    Which all makes perfect sense.

    Leave a comment:


  • aurelielaugraud
    replied
    EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0 GCCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACCCTAACCCTAAC );:@684@A?7@/
    =@@);195)/(1><A);2=A?2?4AAB<@7>B?2A;B MF:i:0 NM:i:2 UQ:i:16 H0:i:85 H1:i:58
    EAS89_20:1:35:882:1661/1 0 chr1 10007 0 50M * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA BBBBB
    BBCBBBBBCBBABBBBAA?BBBA?BBBBA?@BAB@:A@@?>:@@A MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
    EAS89_20:1:97:1232:1179/1 0 chr1 10009 0 50M * 0 0 ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC BAABB
    BBA@BBBA6?A?BA?>:>AA5?>5@?;8??@>=6<+35:;2++/2 MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:54
    EAS89_20:1:35:821:41/1 0 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC BBBCBBB?CCBA@
    BBCBABBBBB??BBBA??ABBB?;ABA@8@9AB=7-> MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
    EAS89_20:1:16:1399:1760 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC ?83;6=<=2866<
    ;688??===8;<@8@8@@@8@;BBA=@7?@B9@8ACB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:62
    EAS89_20:1:32:145:946 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC A@;A7BBB@B=BB
    ><B<??B5B=ABB;A9BBB?B8BBB?B@ABCAB@BBB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:59

    Leave a comment:


  • simonandrews
    replied
    It would probably help if you could post a few example lines from your SAM output so we could see the flag in context.

    Leave a comment:


  • aurelielaugraud
    replied
    Now that I think I understand better this flag, I was trying to read the sam file I generated from a maq alignment ...
    how can I get a flag of 80 ?
    from what I understood :
    80 = 64+16
    so
    1010000
    no paired end ... reverse strand ..ok ...
    but the most-left 1 suggest that the read is the first in a pair ... so not coherent with no paired-ends ...
    maybe I missed something ...
    anyone knows ?
    thanks !

    Leave a comment:


  • aurelielaugraud
    replied
    thank you very much, it does help a lot !!

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:37 PM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 06:07 PM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
51 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
67 views
0 likes
Last Post seqadmin  
Working...
X