Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • baohua100
    Senior Member
    • Jun 2008
    • 103

    maq multiple hits information?

    How can i know which reads are multiple hit in maq output?

    SOLEXA8_20:5:8:387:500/2 LG_I 2333 + 0 32 0 0 0 4 39 0
    16 36 aaaccccaaaccctaaaccctgaaccctaaagccac 8888888887888888.888778.887.87+%10)%


    SOLEXA8_20:5:89:1317:1949/2 LG_I 10793 + 208 18 10 10 10 3 69 0
    0 36 ttttttagtctgttatgctacatgaaatgcaaaatt +++88888888888888888888888888822222/

    i want to know which reads are unique mapped , which reads are not unique mapped
    Last edited by baohua100; 08-05-2009, 11:09 PM.
  • totalnew
    Member
    • Apr 2009
    • 46

    #2
    One way is to convert .map file to .sam file, the TAG field of sam file will tell you unique match info.

    Comment

    • baohua100
      Senior Member
      • Jun 2008
      • 103

      #3
      bwa sampe ../../genome/genome.fa aln_sa1.sai aln_sa2.sai 4_1.fq 4_2.fq > pairs.sam

      also a [1]+ Segmentation fault

      Comment

      • totalnew
        Member
        • Apr 2009
        • 46

        #4
        Hard to tell what was happening, but you may try different insert size by

        bwa sampe -a 500 ref.fa aln_sa1.sai aln_sa2.sai 4_1.fq 4_2.fq > pairs.sam
        .
        .
        .
        bwa sampe -a 100 ref.fa aln_sa1.sai aln_sa2.sai 4_1.fq 4_2.fq > pairs.sam

        This may not be a solution, but no harm to have a try.

        Comment

        • baohua100
          Senior Member
          • Jun 2008
          • 103

          #5
          SOLEXA9_23:7:96:100:1864 83 LG_I 62090 69 50M * 0 -231 GGAGGAGGAAGATCGAGGAGGAGGATGGCGTAATGGAATTTTGGTTTAGG %&.......-..&..22222878882888828858888584888888888 MF:i:18 AM:i:33 SM:i:36 NM:i:2 UQ:i:9 H0:i:1 H1:i:0

          how can i know this read is unique mapping ?

          H0 is perfect hit number , but why NM is 2

          Comment

          • baohua100
            Senior Member
            • Jun 2008
            • 103

            #6
            SOLEXA9_23:7:79:53:135 67 LG_I 10790 39 50M * 0 192 TTTTTTTTTAGTCTGTTATGCTACATGAAATGCAAAATTCCATTAATTGC 88888888888888888888888888898222222.........-....- MF:i:130 AM:i:39 S
            M:i:0 NM:i:0 UQ:i:0 H0:i:0 H1:i:0

            The NM (mismatch number) is 0, so the H0 (perfect hit) is at least 1, but why H0 is 0?

            Comment

            • baohua100
              Senior Member
              • Jun 2008
              • 103

              #7
              Originally posted by totalnew View Post
              One way is to convert .map file to .sam file, the TAG field of sam file will tell you unique match info.
              If only conver the maq format to sam format can tell me the unique mapping info, then I should known this info from the maq output ?

              Comment

              • totalnew
                Member
                • Apr 2009
                • 46

                #8
                Firstly, this sam file is not obtained from bwa running, since bwa generates the optional field which are specific to bwa (start with 'x'). My bwa generated sam files confirm that.

                X0: number of best hits (could be hits with mismatches)
                X1: number of suboptimal hits found by bwa
                NM: edit distance of best hit

                By default, if best hit is unique, bwa will finds all hits contains one more mismatch, if the best hit is repetitive, bwa finds all equally best hits. For example, NM:i:2 X0:i:1 X1:i:10, in this case, the best hit contains 2 mismatches, and it is unique, while suboptimal hits number is 10 with 3 mismatches. On the other hand, if NM:i:2 X0:i:5 X1:i:0, the best hits number is more than 1, the best hit is not unique. I assume X0 is always greater than 0 (need to confirm?).

                For the sam file not generated by bwa (for instance, converted from maq file),
                H0: number of perfect hits
                H1: number of 1-difference hits
                NM: edit distance of best hits

                It is bit of confusing, orginally I thought perfect hit is the hit with 0-mismatches, but it might not be true when I look up the sam file converted from maq file, here is an example:

                One read in map file
                SOLEXA3_92:8:94:116:1145/1 LG_I 10811 + 2550 2 74 74 47 1 30 1 0 50 TaCATGaAATGCAAAATTcCATTAATTGCAATTGGTCGGACAAGTTTCAC @:@AAA6B?ACB@@?@BC:CBCC?CCCCBBBCAB@@BBC>AACCABCBCB

                its converted sam format:
                SOLEXA3_92:8:94:116:1145 65 LG_I 10811 74 50M * 0 2550 TACATGAAATGCAAAATTCCATTAATTGCAATTGGTCGGACAAGTTTCAC @:@AAA6B?ACB@@?@BC:CBCC?CCCCBBBCAB@@BBC>AACCABCBCB MF:i:2 AM:i:47 NM:i:1 UQ:i:30 H0:i:1 H1:i:0

                Here the perfect hit is unique (H0:i:1), but the NM is 1 which means this perfect hit has 1 mismatch. I also observed follow reads in the sam file,

                MF:i:32 AM:i:47 NM:i:2 UQ:i:60 H0:i:0 H1:i:1
                MF:i:32 AM:i:27 NM:i:2 UQ:i:38 H0:i:0 H1:i:0
                MF:i:18 AM:i:43 NM:i:3 UQ:i:43 H0:i:0 H1:i:1

                Therefore, I may have to override my assumption again base on above reads. So I hope anyone can tell me how to determine the relationship among NM, H0 and H1. Comments and protests are welcome.

                thanks
                Last edited by totalnew; 08-06-2009, 11:55 AM.

                Comment

                • totalnew
                  Member
                  • Apr 2009
                  • 46

                  #9
                  Originally posted by baohua100 View Post
                  If only conver the maq format to sam format can tell me the unique mapping info, then I should known this info from the maq output ?
                  Yes, you should be able to know, since they are actually same. NM in sam is obtained from field "number of mismatches of the best hit", H0 is from field "number of 0-mismatch hits of the first 24bp", H1 is from field "number of 1-mismatch hits of the first 24bp on the reference". The field definitions are in maq manual.

                  Again, if H0 is 0-matches hits rather than best hits, we can't explain the tag fields like
                  MF:i:2 AM:i:47 NM:i:1 UQ:i:30 H0:i:1 H1:i:0

                  Comment

                  • baohua100
                    Senior Member
                    • Jun 2008
                    • 103

                    #10
                    maq map -H

                    -H FILE dump multiple/all 01-mismatch hits to FILE [null]

                    But I could not open the output file maybe it's binary ?

                    Comment

                    • totalnew
                      Member
                      • Apr 2009
                      • 46

                      #11
                      Originally posted by baohua100 View Post
                      maq map -H

                      -H FILE dump multiple/all 01-mismatch hits to FILE [null]

                      But I could not open the output file maybe it's binary ?
                      I don't get your point.

                      If you want to view maq output, use maq mapview .....
                      Last edited by totalnew; 08-06-2009, 01:12 PM.

                      Comment

                      • jnfass
                        Member
                        • Aug 2008
                        • 88

                        #12
                        the file produced by using the '-H' option is gzipped ... maq docu's don't say this anywhere that I could find, unfortunately. Just name it something.gz and gunzip it ...

                        Comment

                        • baohua100
                          Senior Member
                          • Jun 2008
                          • 103

                          #13
                          So maq can not tell anything about unique hit or multiple hit ?

                          Just a mapping quality score ?


                          How we choose use which mapping results, for later analysis
                          Last edited by baohua100; 08-12-2009, 06:57 PM.

                          Comment

                          • baohua100
                            Senior Member
                            • Jun 2008
                            • 103

                            #14
                            the file produced by using the '-H' option ,

                            This file contains multiple hit reads? or ?

                            Comment

                            • nilshomer
                              Nils Homer
                              • Nov 2008
                              • 1283

                              #15
                              Originally posted by baohua100 View Post
                              So maq can not tell anything about unique hit or multiple hit ?

                              Just a mapping quality score ?


                              How we choose use which mapping results, for later analysis
                              If there are multiple best hits, then the mapping quality is zero (it is ambiguous). So at least use a mapping quality of one. I find, through simulations, a MAQ mapping quality of 20 is sufficient. Note to others: different aligners approximate MAPQ differently so test through simulations.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Today, 05:37 AM
                              0 responses
                              5 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-26-2026, 11:10 AM
                              0 responses
                              16 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              50 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              109 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...