**vivek_** · 10-26-2012, 09:19 AM

You can use bam flags to do this filtering.

Here is a webpage with some good information on BAM flags:

http://davetang.org/wiki/tiki-index.php?page=SAMTools#SAMTools_calmd_fillmd

Dave's Wiki

INTERPRETING THE BAM FLAGS

The second column in a SAM/BAM file is the flag column. Flags may seem confusing at first, but the encoding allows details about a read to be stored in just a few digits. The trick is to convert the decimal number into binary and then use the table to interpret each binary digit, where 1 = true and 0 = false.

Here are some common BAM flags:

163: 10100011 in binary
147: 10010011 in binary
99: 1100011 in binary
83: 1010011 in binary

Interpretation of 10100011 (reading the binary from right to left, i.e. starting with the least significant bit):

1 the read is paired in sequencing, no matter whether it is mapped in a pair
1 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)
0 the query sequence itself is unmapped
0 the mate is unmapped
0 strand of the query (0 for forward; 1 for reverse strand)
1 strand of the mate
0 the read is the first read in a pair
1 the read is the second read in a pair
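
The same decoding can be done programmatically. Below is a minimal Python sketch, assuming the standard bit values from the SAM specification (0x1 through 0x80); the names `FLAG_BITS` and `decode_flag` are made up for illustration. It tests each bit starting from the lowest one, which matches the order of the list above.

```python
# Bit values from the SAM specification, lowest bit first.
# FLAG_BITS and decode_flag are illustrative names, not part of any library.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
]


def decode_flag(flag):
    """Return the descriptions of all bits that are set in a SAM/BAM flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# Decode the four common flags listed above.
for flag in (163, 147, 99, 83):
    print(flag, format(flag, "b"), decode_flag(flag))
```

For 163 this reports a paired read, mapped in a proper pair, with the mate on the reverse strand, that is the second read in its pair.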

**jflowers** · 10-26-2012, 10:37 AM

Hi vivek,

BAM flags won't work for this. The information about the paired read does not tell you anything about whether the read is still in the file. It only contains information about its mapping properties.

**swbarnes2** · 10-26-2012, 12:25 PM

> If not, I was thinking to sort on readname, cook up something to identify singletons, dump names of singletons to file, remove reads using Picard FilterSamReads. Other ideas?

I think that's what you'll have to do.
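
One way the "identify singletons" step could look is sketched below in Python. This is only an illustration under some assumptions: the records are given as (read name, flag) tuples that you have already extracted from the BAM, and the function name `singleton_names` is hypothetical. Instead of sorting on read name it counts names in memory, which is simpler but needs memory proportional to the number of distinct read names.

```python
from collections import Counter


def singleton_names(records):
    """Return names of paired reads that appear only once.

    records: iterable of (read_name, flag) tuples (an assumed input format).
    """
    counts = Counter()
    for name, flag in records:
        # Count only primary records of paired reads. Secondary (0x100) and
        # supplementary (0x800) records would inflate the count for a name.
        if (flag & 0x1) and not (flag & 0x900):
            counts[name] += 1
    # A paired read whose name occurs once has lost its mate.
    return sorted(name for name, n in counts.items() if n == 1)


# Toy example: readB has no mate record left in the file.
records = [
    ("readA", 99), ("readA", 147),
    ("readB", 163),
    ("readC", 83), ("readC", 163),
]
print(singleton_names(records))
```

The resulting names could then be written one per line to a text file and used as the read list for the removal step.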

Maybe you can go back and confirm that MarkDuplicates was treating your reads as paired end, and not single end? Maybe that was the problem.

Or, try filtering your original file to only have reads where both ends mapped, then run MarkDuplicates. Maybe that's why MarkDuplicates didn't mark both reads.
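
The "both ends mapped" condition can be expressed directly on the flag. Here is a small Python sketch, assuming the standard SAM bit values (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped); the function name `both_ends_mapped` is made up for illustration. On the command line, the same selection should correspond to requiring bit 1 and excluding bits 4 and 8 (for example with the `-f 1 -F 12` options of `samtools view`).

```python
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def both_ends_mapped(flag):
    """True if the read is paired and neither the read nor its mate is unmapped."""
    return bool(flag & PAIRED) and not (flag & (READ_UNMAPPED | MATE_UNMAPPED))


# 99: proper pair, both mapped; 73: paired but mate unmapped;
# 133: paired but the read itself is unmapped.
for flag in (99, 73, 133):
    print(flag, both_ends_mapped(flag))
```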

**jflowers** · 10-26-2012, 12:48 PM

Hi swbarnes2,

Here is my output in the MarkDuplicates metrics file:

## METRICS CLASS net.sf.picard.sam.DuplicationMetrics
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
CR503-1 4628209 124378530 7792909 3928406 30212247 83840 0.253973 213024317

It certainly looks like MarkDuplicates detected paired ends. The UNPAIRED_READS_EXAMINED and UNPAIRED_READ_DUPLICATES are the classes in question. I had always interpreted these to be cases where one read mapped and the other didn't. In any event, if I were to guess, the UNPAIRED_READ_DUPLICATES are cases where a read whose mate was unmapped was removed because it mapped to the exact same coordinates as other reads.
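
As a sanity check on how these columns relate, the numbers in the metrics line above are consistent with PERCENT_DUPLICATION being computed per read, with each pair counted as two reads. The Python sketch below reproduces the reported value under that assumption; the formula is inferred from the numbers here, not quoted from the Picard source.

```python
# Values copied from the CR503-1 line of the metrics file above.
unpaired_reads_examined = 4628209
read_pairs_examined = 124378530
unpaired_read_duplicates = 3928406
read_pair_duplicates = 30212247

# Assumed formula: duplicate reads / examined reads, counting each pair twice.
duplicate_reads = unpaired_read_duplicates + 2 * read_pair_duplicates
examined_reads = unpaired_reads_examined + 2 * read_pairs_examined
percent_duplication = duplicate_reads / examined_reads

print(round(percent_duplication, 6))  # compare with the reported 0.253973
```

Note that about 85% of the unpaired reads examined (3928406 of 4628209) were flagged as duplicates, so their mates, if present in the file, would be left behind when duplicates are removed.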

If this looks unusual I would appreciate feedback, but my guess is that the expected behavior is that MarkDuplicates will leave some orphan unmapped reads when REMOVE_DUPLICATES=true.

remove reads from BAM whose mate has already been filtered