**lindenb** · 06-17-2014, 07:29 AM

I wrote a tool to compare two BAMs: https://github.com/lindenb/jvarkit/wiki/CmpBams

**kmcarr** · 06-17-2014, 10:08 AM

Hi Jullee,

Here is how I would do it:

Code:

samtools view -F 4 Ref1.bam | cut -f1 | sort -u > Ref1_reads.txt
samtools view -F 4 Ref2.bam | cut -f1 | sort -u > Ref2_reads.txt

comm -12 Ref1_reads.txt Ref2_reads.txt > AlignToBoth.txt
comm -23 Ref1_reads.txt Ref2_reads.txt > AlignToRef1Only.txt
comm -13 Ref1_reads.txt Ref2_reads.txt > AlignToRef2Only.txt

Let me break this down:

The samtools command converts the BAM to SAM format while filtering out any reads that are not aligned to the reference (-F 4).

You should not output the header information (-h) since it is meaningless for your purposes (and you got rid of it later in your pipeline anyway).

I do not bother saving the intermediate .sam file since I don't need it; I just pipe the output into the next step.

Just take the read ID in column 1 with cut. (You used awk for this but whatever you like.)

Sort the IDs and keep only one unique copy of each (-u). Save the output.

You were concerned about the order output by sort; don't be. sort in this case is outputting the read IDs in lexical (dictionary) order. This is the order which is required by comm.
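One caveat on sort order: the collation that sort uses depends on the current locale, and comm expects its inputs sorted under the same collation. A minimal sketch of pinning both to plain byte order is below; the read IDs and the demo_sorted.txt file name are made up for illustration.

```shell
# Assumption: forcing the C locale so sort and comm agree on byte order.
export LC_ALL=C

# Hypothetical read IDs standing in for the output of `samtools view ... | cut -f1`.
printf 'read_10\nread_2\nRead_1\nread_2\n' | sort -u > demo_sorted.txt

# sort -c exits non-zero if the file is not sorted under the current locale.
sort -c demo_sorted.txt && echo "sorted OK"

cat demo_sorted.txt
```

In the C locale uppercase letters sort before lowercase ones and digits compare character by character, so read_10 comes before read_2. That looks odd to a human but is exactly what comm needs, as long as both files were sorted the same way.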

The comm commands are pretty straightforward, outputting just one of the three columns in each run. You could run comm just once and then separate the columns but I find this method a little easier.
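For completeness, here is a rough sketch of the single-run alternative: comm marks column 2 with one leading tab and column 3 with two, so awk can split the columns by which tab-separated field is non-empty. The demo_* file names and read IDs are hypothetical stand-ins for the real Ref1_reads.txt and Ref2_reads.txt.

```shell
# Hypothetical, already-sorted read ID lists.
printf 'read1\nread2\nread3\n' > demo_ref1.txt
printf 'read2\nread3\nread4\n' > demo_ref2.txt

# One comm run; route each line by the first non-empty tab-separated field.
# Note: an output file is only created if its category has at least one ID.
comm demo_ref1.txt demo_ref2.txt | awk -F'\t' '
    $1 != "" { print $1 > "demo_ref1_only.txt"; next }   # column 1: only in file 1
    $2 != "" { print $2 > "demo_ref2_only.txt"; next }   # column 2: only in file 2
             { print $3 > "demo_both.txt" }              # column 3: in both files
'
```

This relies on read IDs never containing a tab, which holds here because they came from a single tab-delimited SAM column.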

ONE BIG NOTE: This pipeline collapses both reads of a pair into a single entry, because mates share the same read ID. That means only one read of a pair needs to align to a reference for that ID to be counted as aligning to it.
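If you would rather track the two mates separately, one possible approach (a sketch, not part of the pipeline above) is to tag each ID with /1 or /2 based on the SAM FLAG in column 2 before sorting. The printf line below is a hypothetical stand-in for `samtools view -F 4 Ref1.bam`, reduced to the QNAME and FLAG columns, and demo_mates.txt is a made-up file name.

```shell
export LC_ALL=C   # assumption: byte-order sorting, to match what comm will expect

# Stand-in for: samtools view -F 4 Ref1.bam
# Three hypothetical records: QNAME <tab> FLAG
printf 'pairA\t99\npairA\t147\npairB\t73\n' |
awk -F'\t' '{
    mate = ""
    if (int($2 / 64) % 2)       mate = "/1"   # FLAG bit 0x40: first read of the pair
    else if (int($2 / 128) % 2) mate = "/2"   # FLAG bit 0x80: second read of the pair
    print $1 mate
}' | sort -u > demo_mates.txt

cat demo_mates.txt
```

The awk step replaces cut -f1 in the original pipeline; the resulting files can be passed to comm exactly as before, and each mate is then counted on its own.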

**jullee** · 06-18-2014, 04:28 AM

Thank you kmcarr and lindenb for the helpful replies! I will try these out.

How to compare Read IDs from different Bam files
