  • remove reads from BAM whose mate has already been filtered

    Hi,

    I have removed duplicates from a paired-end BAM using Picard MarkDuplicates. In some cases only one read of a pair was retained. I'm not exactly sure why (perhaps the retained read was unmapped), but my BAM no longer has an even number of reads. No other filtering was done.

    For some downstream methods (e.g., bedtools pairtobed) I need to have a BAM where both reads are present for each fragment and no "singletons" of this type are present.

    Is there an available method to remove such singleton reads?

    If not, I was thinking to sort on readname, cook up something to identify singletons, dump names of singletons to file, remove reads using Picard FilterSamReads. Other ideas?
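
    A minimal sketch of the "identify singletons" step, assuming the alignments are available as SAM text lines (for example, the text output of samtools view -h). The function name find_singleton_names and the example records are hypothetical. Secondary and supplementary records are skipped so that they do not hide a missing mate. Counting by name means the input does not strictly need to be name-sorted, at the cost of holding every read name in memory.

```python
from collections import Counter

# SAM flag bits for records that are not the primary alignment of a read.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_singleton_names(sam_lines):
    """Return the sorted names of reads that appear only once among primary records."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Ignore secondary/supplementary records; they are extra copies of a read.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        counts[name] += 1
    return sorted(name for name, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Made-up records: frag1 has both mates, frag2 has lost its mate.
    example = [
        "@HD\tVN:1.6\tSO:queryname",
        "frag1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
        "frag1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
        "frag2\t73\tchr1\t500\t60\t50M\t=\t500\t0\t*\t*",
    ]
    for name in find_singleton_names(example):
        print(name)
```

    Writing the returned names one per line to a file should give a read list that can then be passed to a read-name filter such as Picard FilterSamReads.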

  • #2
    You can use bam flags to do this filtering.

    Here is a webpage with some good information on BAM flags:



    INTERPRETING THE BAM FLAGS


    The second column in a SAM/BAM file is the flag column. They may seem confusing at first but the encoding allows details about a read to be stored by just using a few digits. The trick is to convert the numerical digit into binary, and then use the table to interpret the binary numbers, where 1 = true and 0 = false.

    Here are some common BAM flags:

    163: 10100011 in binary
    147: 10010011 in binary
    99: 1100011 in binary
    83: 1010011 in binary

    Interpretation of 10100011 (reading the binary from right to left, i.e. starting with the least significant bit):

    1 the read is paired in sequencing, no matter whether it is mapped in a pair
    1 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)
    0 the query sequence itself is unmapped
    0 the mate is unmapped
    0 strand of the query (0 for forward; 1 for reverse strand)
    1 strand of the mate
    0 the read is the first read in a pair
    1 the read is the second read in a pair
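
    The table lookup above can be sketched in a few lines of Python. This is only an illustration: the name decode_flag and the label wording are my own, and only the eight lowest flag bits listed above are covered (the SAM format defines further bits above 0x80).

```python
# The eight lowest SAM flag bits, least significant first,
# in the same order as the table above.
FLAG_BITS = [
    (0x1, "read is paired in sequencing"),
    (0x2, "read is mapped in a proper pair"),
    (0x4, "query sequence itself is unmapped"),
    (0x8, "mate is unmapped"),
    (0x10, "query is on the reverse strand"),
    (0x20, "mate is on the reverse strand"),
    (0x40, "first read in a pair"),
    (0x80, "second read in a pair"),
]


def decode_flag(flag):
    """Return the descriptions of all bits that are set in a SAM flag."""
    return [label for bit, label in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (163, 147, 99, 83):
        print(flag, format(flag, "b"), decode_flag(flag))
```

    For 163 this reports paired, proper pair, mate on the reverse strand, and second read in a pair, which matches the interpretation above.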



    • #3
      Hi vivek,

      BAM flags won't work for this. The information about the paired read does not tell you anything about whether the read is still in the file. It only contains information about its mapping properties.



      • #4
        "If not, I was thinking to sort on readname, cook up something to identify singletons, dump names of singletons to file, remove reads using Picard FilterSamReads. Other ideas?"

        I think that's what you'll have to do.

        Maybe you can go back and confirm that MarkDuplicates was treating your reads as paired end, and not single end? Maybe that was the problem.

        Or try filtering your original file to keep only reads where both ends mapped, then run MarkDuplicates. Unmapped mates may be why MarkDuplicates didn't mark both reads.
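
        A rough sketch of what "both ends mapped" means in terms of flag bits, assuming the standard SAM meanings of 0x1, 0x4 and 0x8. The helper name both_ends_mapped is made up for illustration. It only expresses the test on a single flag value and would need to be applied to each record by whatever tool does the actual filtering.

```python
PAIRED = 0x1          # read is paired in sequencing
READ_UNMAPPED = 0x4   # the read itself is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def both_ends_mapped(flag):
    """True if the record is paired and neither it nor its mate is flagged unmapped."""
    is_paired = bool(flag & PAIRED)
    any_unmapped = bool(flag & (READ_UNMAPPED | MATE_UNMAPPED))
    return is_paired and not any_unmapped


if __name__ == "__main__":
    # 99 and 147 are a typical properly paired couple; 73 has an unmapped mate.
    for flag in (99, 147, 73, 133):
        print(flag, both_ends_mapped(flag))
```

        I believe the same selection corresponds to requiring flag 1 and excluding flag 12 (4 + 8), for example with the -f and -F options of samtools view.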



        • #5
          Hi swbarnes2,

          Here is my output in the MarkDuplicates metrics file:

          ## METRICS CLASS net.sf.picard.sam.DuplicationMetrics
          LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
          CR503-1 4628209 124378530 7792909 3928406 30212247 83840 0.253973 213024317

          It certainly looks like MarkDuplicates detected paired ends. The UNPAIRED_READS_EXAMINED and UNPAIRED_READ_DUPLICATES are the classes in question. I had always interpreted these as cases where one read mapped and the other didn't. In any event, my guess is that the UNPAIRED_READ_DUPLICATES are reads whose mate was unmapped and that were removed because they mapped to exactly the same coordinates as other reads.
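
          As a sanity check on how these classes are counted, the reported PERCENT_DUPLICATION can be recomputed from the other columns. The formula below is my understanding of how Picard defines it (unpaired reads counted once, read pairs counted twice), not something taken from the metrics file itself. The function name percent_duplication is made up.

```python
# Header and values copied from the metrics file above.
header = (
    "LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS "
    "UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES "
    "PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE"
).split()
values = "CR503-1 4628209 124378530 7792909 3928406 30212247 83840 0.253973 213024317".split()
metrics = dict(zip(header, values))


def percent_duplication(m):
    """Fraction of examined reads marked as duplicates (assumed formula)."""
    # Each read pair contributes two reads; unpaired reads contribute one.
    duplicates = int(m["UNPAIRED_READ_DUPLICATES"]) + 2 * int(m["READ_PAIR_DUPLICATES"])
    examined = int(m["UNPAIRED_READS_EXAMINED"]) + 2 * int(m["READ_PAIRS_EXAMINED"])
    return duplicates / examined


if __name__ == "__main__":
    print(f"recomputed: {percent_duplication(metrics):.6f}")
    print(f"reported:   {metrics['PERCENT_DUPLICATION']}")
```

          The recomputed value agrees with the reported 0.253973, which is consistent with the unpaired classes being counted as individual reads alongside the pairs.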

          If this looks unusual I would appreciate feedback, but my guess is that the expected behavior is that MarkDuplicates will leave some orphan unmapped reads when REMOVE_DUPLICATES=true.
