Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • NH:n:x in BAM output

    Dear all,

    the option specified for the number of hits in a SAM output, NH, is amazingly missing from the two most popular aligners (namely BWA and Bowtie). However, Tophat2 reports the NH flawlessly, therefore there must be a way to run Bowtie(2) with an option to generate this report.

    So far, too much of the community is resorting to the samtools view -q3 trick, which is able to extract alignments with a mapping quality below 50% (by definition, a multiple hitting read will have all its n equally good alignments with a score of at most 1/n).

    Thanks for any hints. I hope this post will become viral.

    Federico Giorgi (CUMC) & Davide Scaglione (IGA)
    Last edited by giorgifm; 04-22-2013, 11:00 AM. Reason: would -> will - to strengthen the message

  • #2
    Reading the Bowtie2 manual, it seems that the AS:i: flag reports the score for the best alignment, while XS:i the score for the second-best one. Thus one could compare these two value to figure out wheather a read is mapping ambiguosly or not.

    Still, I'm really frustated not being able to control the maximum number of mismatches and/or gaps as tophat2 easily does with two dedicated options.

    Cheers

    Davide

    Comment


    • #3
      To remove multiple hit reads, I am also using
      samtools view -q4 -F4 file.bam

      Following the explanation given by this post:

      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc



      I concur that the lack of an NH:i:n field in BWA, and especially in Bowtie is unbelievable, I see no reason why the programmers omitted this field. If it weren't for "standards" and complaining reviewers, I would switch to other aligners like ERNE at once

      Comment


      • #4
        Originally posted by Gianza View Post
        while XS:i the score for the second-best one.
        Dear Gianza, so if an alignment row contains the XS field, the read is by definition aligning in more than one place?

        Or rather, if the reported AS score is equal to the Q field of the BAM, then we are looking at (one of) the best hits? In this case the problem remains, and it would be simpler to just count the number of reads/fragments and check for their Q scores.

        So what I believe now is that the only way to obtain the NH is writing an addNH script, that processes the entire readname-sorted BAM to add the NH field based ont he Qs and the FLAGs contained within the BAM itseld

        Comment


        • #5
          Originally posted by peer.b View Post
          To remove multiple hit reads, I am also using
          samtools view -q4 -F4 file.bam

          Following the explanation given by this post:

          Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
          I was aware of the solution by Simon Anders, and I pray the gods every day for its existence. However, -F4 -q4 may remove primary, unambiguous but low quality mappings, and therefore this is an efficient but patchy solution

          Comment


          • #6
            I agree that this problem needs addressing ..

            Comment


            • #7
              Originally posted by giorgifm View Post
              Dear all,

              ... Tophat2 reports the NH flawlessly, therefore there must be a way to run Bowtie(2) with an option to generate this report.

              Federico Giorgi (CUMC) & Davide Scaglione (IGA)
              I am not sure it reports it flawlessly. When tuning the -g (max-multihits) option, the number of NH:i:>1 changes, and this seems wrong to me.

              From the TopHat2 manual:
              "If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments"

              But not reporting them should not mean that they are not there, as align_summary.txt seems to understand. However, in my case (TopHat2.0.9) all reads with multiple mappings > -g are reported with MAPQ50 and NH:i:1

              Comment


              • #8
                Maybe my last sentence was not right after all. It seems that NH:i:1 and MAPQ50 are constant when changing -g (except if -g, in which case all the reads are (erroneously) reported as NH:i:1 and MAPQ50). However, NH:i (>1) and MAPQ (<50) counts do not coincide with align_summary multiple mapping reads number. I would really like to understand align_summary report.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  The Impact of AI in Genomic Medicine
                  by seqadmin



                  Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                  02-26-2024, 02:07 PM
                • seqadmin
                  Multiomics Techniques Advancing Disease Research
                  by seqadmin


                  New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

                  A major leap in the field has
                  ...
                  02-08-2024, 06:33 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 02-28-2024, 06:12 AM
                0 responses
                28 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 02-23-2024, 04:11 PM
                0 responses
                74 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 02-21-2024, 08:52 AM
                0 responses
                84 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 02-20-2024, 08:57 AM
                0 responses
                69 views
                0 likes
                Last Post seqadmin  
                Working...
                X