Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bitwise flag in sam format and others

    Hello,
    new to the seqanswers community, I have started with NGS data (illumina) about 2 month ago.
    We align with GEM (not published yet but quite good as far as we tried it), we tried Maq but is doesn't seem to work for the moment on our machines (says it is running but not writting anything ....), and I am currently running bwa at the moment as another test.
    We want to use SAM format but we don't get how to use th flag field, is anyone having an exemple for us please ? I know bwa outputs can be converted directly but we would need to do it ourselves for the GEM output.

    moreover, there is a question on the MAPQ field. we don't have any direct information about that I think. Can we infer it ? how is it calculated ? we have phred-based quality scores for each base.

    thanks to the community.

  • #2
    FLAG field is normally associated with other attributes to parse the read, likely, if this read is mapped, if this red is paired..... see the description table in samtools manual.

    For MAPQ calculation, it is complicated. The mapping quality is the phred-scaled probability that a read alignment may be wrong. In practice, MAPQ is approximated in some ways. bwa approximates the MAPQ as 0, 23, 25 37, 255,...according to the factors of number of best hits, number of second best hits.......

    Comment


    • #3
      Hello and thanks for taking some time to reply .
      I have already had a look at samtools manual regarding the flag section but I am sorry it doesn't seem to be enough for me and I would really appreciate an example. or some documentation on how to use bitwise data ...
      Maybe I can leave the MAPQ field with 255 value as indicated in the manual is the program I use to map the reads doesn't caclulate this value for me. ( the more it goes, the least I wish to calculate it myself)

      Comment


      • #4
        Originally posted by aurelielaugraud View Post
        We want to use SAM format but we don't get how to use the flag field, is anyone having an exemple for us please?
        The SAM flag field although it appears as a single number actually contains several pieces of information which have been combined together. It is a bitwise field, which means that it makes use of the way that computers represent numbers to store several small values stored in one large value.

        If you think of a standard integer as being composed of 32 bits (0 or 1) then it would look like:

        00000000000000000000000000000000

        However SAM uses this single number as a series of boolean (true false) flags where each position in the array of bits represents a different sequence attribute

        Bit 0 = The read was part of a pair during sequencing
        Bit 1 = The read is mapped in a pair
        Bit 2 = The query sequence is unmapped
        Bit 3 = The mate is unmapped
        Bit 4 = Strand of query (0=forward 1=reverse)

        etc.

        Constructing the value from the individual flags is fairly easy. If the flag is false don't add anything to the total. If its true then add 2 raised to the power of the bit position.

        For example:

        Bit 0 - false - add nothing
        Bit 1 - true - add 2**1 = 2
        Bit 2 - false - add nothing
        Bit 3 - true - add 2**3 = 8
        Bit 4 - true - add 2**4 = 16

        Bit pattern = 11010 = 16+8+2 = 26

        So the flag value would be 26.

        To extract the individual flags from the compound value you can use a logical AND operation. This will tell you if a specific bit in the compound value is true or not. The exact syntax will depend on the language you're using, but in Perl for instance you could do:

        if ($compound & 16) {
        print "Reverse";
        }
        else {
        print "Forward";
        }

        To extract the information from the 4th (therefore 2**4 = 16) bit field.

        I hope that makes it a bit clearer.

        Comment


        • #5
          thank you very much, it does help a lot !!

          Comment


          • #6
            Now that I think I understand better this flag, I was trying to read the sam file I generated from a maq alignment ...
            how can I get a flag of 80 ?
            from what I understood :
            80 = 64+16
            so
            1010000
            no paired end ... reverse strand ..ok ...
            but the most-left 1 suggest that the read is the first in a pair ... so not coherent with no paired-ends ...
            maybe I missed something ...
            anyone knows ?
            thanks !

            Comment


            • #7
              It would probably help if you could post a few example lines from your SAM output so we could see the flag in context.

              Comment


              • #8
                EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0 GCCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACCCTAACCCTAAC );:@684@A?7@/
                =@@);195)/(1><A);2=A?2?4AAB<@7>B?2A;B MF:i:0 NM:i:2 UQ:i:16 H0:i:85 H1:i:58
                EAS89_20:1:35:882:1661/1 0 chr1 10007 0 50M * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA BBBBB
                BBCBBBBBCBBABBBBAA?BBBA?BBBBA?@BAB@:A@@?>:@@A MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
                EAS89_20:1:97:1232:1179/1 0 chr1 10009 0 50M * 0 0 ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC BAABB
                BBA@BBBA6?A?BA?>:>AA5?>5@?;8??@>=6<+35:;2++/2 MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:54
                EAS89_20:1:35:821:41/1 0 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC BBBCBBB?CCBA@
                BBCBABBBBB??BBBA??ABBB?;ABA@8@9AB=7-> MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
                EAS89_20:1:16:1399:1760 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC ?83;6=<=2866<
                ;688??===8;<@8@8@@@8@;BBA=@7?@B9@8ACB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:62
                EAS89_20:1:32:145:946 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC A@;A7BBB@B=BB
                ><B<??B5B=ABB;A9BBB?B8BBB?B@ABCAB@BBB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:59

                Comment


                • #9
                  Looking at the SAM examples then 80 would seem to indicate a reverse mapped single end read. I'd check that you definitely ran the search as a paired end search since the output doesn't suggest this. The format says that flags relating to mate pairs only have any meaning when 0x01 is true (ie the read is paired), so the flag for first read in a pair isn't necessarily wrong.

                  The example in the SAM tools format definintion shows two reads which form a pair with flags 99 and 147.

                  99 = 01100011
                  147 = 10010011

                  99 = Paired, Proper Pair, Mapped, Mate Mapped, Forward, Mate Reverse, First in pair, Not second in pair

                  147 = Paired, Proper Pair, Mapped, Mate Mapped, Reverse, Mate Forward, Not first in pair, Second in pair

                  Which all makes perfect sense.

                  Comment


                  • #10
                    Originally posted by aurelielaugraud View Post
                    EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0
                    Your mate pair name is * and the mate position and insert size are both 0 indicating you have not done a paired end mapping (which agrees with your flag reading).

                    Comment


                    • #11
                      The flag number is decimal value, I wrote a small tool to convert decimal to Hex and binary string, but it was rarely used. Firstly, you can easily identify if this read is paired (even: SE ; odd: PE), set this at the beginning:
                      if($FLAG & 0x01)
                      $pe_flag = 1
                      else
                      $se_flag = 1

                      Then, suppose this is PE library, someone might want to know if one read is the first read in a pair or second read in a pair (indicated in export file of illumina data). You can do this

                      if ($FLAG & 0x40)
                      $1st_flag = 1

                      This is just a simple example, you can extract different info by operating on different bits of FLAG.

                      Comment


                      • #12
                        ok thanks for all the replies
                        I haven't run the mapping as paired-end . I ran only one file (and it took long enough). I thought I could just try it like that ... and I am used of bwa or gem where I can map the two paired-end files separately which doesn't seem to be the case with maq.(or did I miss something ?)
                        Anyway so if I understand correctly. as it hasn't been mapped as paires the first two bits are 0 which are ok with me but it somehow knows that it is a paired-end sample ( probably the /1 at the end of the id ??) and it tells me that it is the first read of the pair ... even if it hasn't any pair here.
                        I just thought I would automatically put this bit to 0 as there is no paired-end on the file I gave it.
                        99 and 147 is what I get with my bwa try but I made the effort of mapping both paired-end files .

                        Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .

                        thank you

                        Comment


                        • #13
                          Originally posted by aurelielaugraud View Post
                          Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .

                          Output of maq is map file, use maq mapview to make it visible and count the lines which can be done by unix command. The output of bwa is sam file which is human readable, you should be able to open it, but in case you generate the bam file, use samtools view to read it, and similarly to get the line number.

                          Comment


                          • #14
                            ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
                            thanks

                            Comment


                            • #15
                              Originally posted by aurelielaugraud View Post
                              ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
                              thanks
                              You are right, sam file is quite large size, from my simulation, the sam file is generally four times larger than corresponding bam file. My idea is to write a script to generate sam file and further the bam file, and in that script delete the sam file right after the bam file generated.

                              To view that bam file, use samtools view test.bam, then you will be able to see the content. On the other hand, you will also be able to output part of bam to sam file by unix command for your parsing.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              51 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X