  • Remove reads which are not uniquely mapped

    I already have the BAM file, so how can I remove the reads that are not uniquely mapped? Can samtools do this, or is there some other software?

  • #2
    Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.


    • #3
      Please define what uniquely mapped means to you.


      • #4
        Originally posted by nilshomer View Post
        Please define what uniquely mapped means to you.
        Reads that hit only once in the genome. I know there is some cutoff for what counts as a hit, but in general, how do I remove reads that have multiple hits?


        • #5
          Originally posted by Heisman View Post
          Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.
          That is why I asked: I could not find it.


          • #6
            I don't like this definition. Suppose mapper X tries harder to find hits than mapper Y. Mapper X is then most likely more sensitive and specific, but it will report fewer reads with only one hit, even when one hit for a read is much more likely than all the others.

            That's why you should look at mapping quality. Those with mapping quality zero are ambiguous: multiple equally best alignments were found. But those with higher quality should have higher confidence.


            • #7
              In other words, use samtools view -q 1 on the .bam to get reads with a mapping quality of at least 1. Depending on your application and aligner you may want to use something like -q 20 to get more reliable hits.
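
To make the effect of the -q threshold concrete, here is a minimal Python sketch of the same rule applied to SAM text lines: an alignment is kept only if its MAPQ (column 5) is at least the threshold. The function name `filter_by_mapq` and the example records are made up for illustration. Unlike a plain `samtools view`, this sketch also passes header lines through.

```python
def filter_by_mapq(sam_lines, min_mapq=1):
    """Yield header lines plus alignment lines whose MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) of a SAM record is the mapping quality.
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical records, for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t300\t12\t4M\t*\t0\t0\tACGT\tIIII",
]


def names(lines):
    """Read names of the alignment (non-header) lines."""
    return [l.split("\t")[0] for l in lines if not l.startswith("@")]


kept_q1 = names(filter_by_mapq(example, min_mapq=1))
kept_q20 = names(filter_by_mapq(example, min_mapq=20))
```

With a threshold of 1 only the MAPQ 0 read is dropped; a threshold of 20 also drops the MAPQ 12 read.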


              • #8
                Originally posted by Heisman View Post
                Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.
                If the SAM/BAM file has secondary mappings recorded (which not all mapping software will do), then yes, you can filter them out using the FLAG bit values. However I'd recommend using 'samtools view' with the -f and/or -F switches rather than grep.
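
As a rough sketch of what the -f and -F switches test, the Python below applies the same bit logic to the FLAG column of SAM text lines: -f keeps a record only if all the given bits are set, and -F drops it if any of the given bits is set. The bit values are those of the SAM specification; the function names and example records are hypothetical. Note that this removes only the secondary and supplementary records themselves. The primary record of a read that also had secondary hits is still kept.

```python
# FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def passes_flag_filter(flag, require=0, exclude=0):
    """Mimic -f (all 'require' bits set) and -F (no 'exclude' bit set)."""
    return (flag & require) == require and (flag & exclude) == 0


def primary_only(sam_lines):
    """Yield mapped primary alignment lines, skipping headers."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # column 2 is FLAG
        if passes_flag_filter(flag, exclude=UNMAPPED | SECONDARY | SUPPLEMENTARY):
            yield line


# Hypothetical records, for illustration only.
example = [
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t300\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t2048\tchr1\t700\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = [(l.split("\t")[0], int(l.split("\t")[1])) for l in primary_only(example)]
```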


                • #9
                  If your aligner properly sets the MAPQ field, you can filter on this. After all, if there are two equally plausible alignments, the probability for each alignment to be correct is at most 50%, which transforms to a Phred score of 3. Hence, an aligner should never indicate a mapping quality above 3 for multiply matched reads.
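
The arithmetic behind the "Phred score of 3" can be checked with a few lines of Python. This is only a sketch of the standard Phred formula, Q = -10 * log10(P_error), under the assumption that all n candidate alignments are equally likely; the function names are made up for illustration.

```python
import math


def phred_from_error_prob(p_error):
    """Phred-scaled quality: Q = -10 * log10(P_error)."""
    return -10.0 * math.log10(p_error)


def max_mapq_for_equal_hits(n_hits):
    """Upper bound on MAPQ when n_hits alignments are equally plausible."""
    if n_hits < 2:
        raise ValueError("need at least two equally plausible hits")
    # Each hit is correct with probability at most 1/n_hits,
    # so the chance that a given hit is wrong is at least 1 - 1/n_hits.
    p_error = 1.0 - 1.0 / n_hits
    return phred_from_error_prob(p_error)


q_two_hits = max_mapq_for_equal_hits(2)   # about 3.01
q_four_hits = max_mapq_for_equal_hits(4)  # about 1.25
```

More equally good hits push the bound further down, which is why a small MAPQ threshold already separates these reads from confidently placed ones.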

                  Furthermore, many aligners follow the recommendation to use the optional tag "NH" to indicate how many alignments are reported for this read. This helps only, of course, if you instructed the aligner to report multiple alignments.
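
If the aligner did write the NH tag, filtering on it might look like the sketch below, which reads the optional fields (column 12 onward) of SAM text lines and keeps records with NH:i:1. The function names and example records are hypothetical, and dropping records that have no NH tag at all is a choice made for this sketch, not something the tag itself mandates.

```python
def nh_value(sam_line):
    """Return the integer NH tag of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):])
    return None


def keep_single_hit(sam_lines):
    """Yield alignment lines reporting exactly one hit (NH:i:1)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        # Assumption of this sketch: records without an NH tag are dropped.
        if nh_value(line) == 1:
            yield line


# Hypothetical records, for illustration only.
example = [
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r3\t0\tchr1\t300\t50\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = [l.split("\t")[0] for l in keep_single_hit(example)]
```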


                  • #10
                    Originally posted by Simon Anders View Post
                    If your aligner properly sets the MAPQ field, you can filter on this. After all, if there are two equally plausible alignments, the probability for each alignment to be correct is at most 50%, which transforms to a Phred score of 3. Hence, an aligner should never indicate a mapping quality above 3 for multiply matched reads.
                    What about paired-end sequencing? Whether a read alignment is unique also depends on the position of its mate. One mate could map to 10 places in the genome while only 1 of those positions is valid given its mate. How can you keep only uniquely mapped paired-end reads from a BAM file generated by bowtie/bwa?
                    I don't want to consider only "concordant" reads, since I would like to retain paired reads that map in the same region but with discordant orientation (if stranded).
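
One possible way to approach this, offered only as a sketch and not as an established recipe, is to group the primary alignments by read name and keep a pair when both mates reach a MAPQ threshold, without ever testing the "proper pair" bit (0x2). Whether the MAPQ already takes the mate's position into account depends on the aligner, so that is an assumption here. The function name and example records are hypothetical, and the sketch holds all records in memory.

```python
from collections import defaultdict

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def confident_pairs(sam_lines, min_mapq=1):
    """Return lines of pairs whose two primary mates both have MAPQ >= min_mapq."""
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if not flag & PAIRED:
            continue
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        by_name[fields[0]].append((mapq, line))
    kept = []
    for mates in by_name.values():
        # The proper-pair bit (0x2) is deliberately not checked, so pairs
        # with discordant orientation are retained.
        if len(mates) == 2 and all(mapq >= min_mapq for mapq, _ in mates):
            kept.extend(line for _, line in mates)
    return kept


# Hypothetical records: flags 65 and 129 are first/second mate, not "proper".
example = [
    "pA\t65\tchr1\t100\t30\t4M\t=\t300\t0\tACGT\tIIII",
    "pA\t129\tchr1\t300\t30\t4M\t=\t100\t0\tACGT\tIIII",
    "pB\t65\tchr1\t500\t0\t4M\t=\t700\t0\tACGT\tIIII",
    "pB\t129\tchr1\t700\t30\t4M\t=\t500\t0\tACGT\tIIII",
]
kept_names = [l.split("\t")[0] for l in confident_pairs(example, min_mapq=1)]
```

Pair pA is kept even though neither mate carries the proper-pair bit, while pB is dropped because one mate has MAPQ 0.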
