Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Remove reads which are not uniquely mapped

    I have already had the BAM file, so how can I remove those reads which are not uniquely mapped? Can samtools do this? All some other software?

  • #2
    Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.

    Comment


    • #3
      Please define what uniquely mapped means to you.

      Comment


      • #4
        Originally posted by nilshomer View Post
        Please define what uniquely mapped means to you.
        Reads that can be hit only once in the genome - I know there is some cutoff for this hit, but generally, how to remove those have mutilple hits?

        Comment


        • #5
          Originally posted by Heisman View Post
          Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.
          That is the reason I asked because I could not find it.

          Comment


          • #6
            I don't like this definition, since suppose a mapper X tries harder to find a hit than mapper Y, then mapper X is most likely more sensitive and specific, but will have fewer reads that only had one hit, even though a hit for a read may be much more likely than all other hits.

            That's why you should look at mapping quality. Those with mapping quality zero are ambiguous: multiple equally best alignments were found. But those with higher quality should have higher confidence.

            Comment


            • #7
              In other words, use samtools view -q 1 on the .bam to get reads with a mapping quality of at least 1. Depending on your application and aligner you may want to use something like -q 20 to get more reliable hits.

              Comment


              • #8
                Originally posted by Heisman View Post
                Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.
                If the SAM/BAM file has secondary mappings recorded (which not all mapping software will do), then yes, you can filter them out using the FLAG bit values. However I'd recommend using 'samtools view' with the -f and/or -F switches rather than grep.

                Comment


                • #9
                  If your aligner properly sets the MAPQ field, you can filter on this. After all, if there are two equally plausible alignments, the probability for each alignment to be correct is at most 50%, which transforms to a Phred score of 3. Hence, an aligner should never indicate a mapping quality above 3 for multiply matched reads.

                  Furthermore, many aligners follow the recommendation to use the optional tag "NH" to indicate how many alignments are reported for this read. This helps only, of course, if you instructed the aligner to report multiple alignments.

                  Comment


                  • #10
                    Originally posted by Simon Anders View Post
                    If your aligner properly sets the MAPQ field, you can filter on this. After all, if there are two equally plausible alignments, the probability for each alignment to be correct is at most 50%, which transforms to a Phred score of 3. Hence, an aligner should never indicate a mapping quality above 3 for multiply matched reads.
                    What about paired end sequencing? Uniqueness of a read alignment depends also on its mate position. One mate could map 10 times on the genome but only 1 position is valid considering its mate. How can you filter from a bowtie/bwa generated bam file only uniquely mapped paire end reads?
                    Don't want considering only "concordant" reads, since i would like to retain paired reads that map in the same region but with discordant orientation (if stranded)

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    18 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    22 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    17 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    48 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X