I want to count the number of (single-end) reads that were mapped to approximately the same coordinates by different aligners.
The problem is that the reads do not have identical IDs and their coordinates may be shifted by 1 bp (SOLiD reads mapped with BWA), for example:
BWA:
prefix_3_30_738 0 chr8 11162354 37 48M * 0 0 ...
ABI BioScope:
3_30_738 0 chr8 11162353 100 50M * 0 0 ...
NovoalignCS:
3_30_738_F3 0 chr8 11162353 150 50M * 0 0 ...
Reads are in sorted, indexed BAM files. Of course I could change the read IDs and coordinates to find exact matches with Picard CompareSAMs, but I'd like to avoid redundancy, reduce computational time and also output the matching reads. Besides, I'm interested in finding reads that may be aligned in a certain neighborhood.
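To make the kind of matching concrete, here is a minimal Python sketch, not a finished tool. It works on SAM text lines (e.g. the output of `samtools view`) and makes some assumptions that are not guaranteed by the data: the part of the read ID shared by all aligners is taken to be three underscore-separated numbers such as `3_30_738`, each read has at most one alignment per file, and both files fit in memory. The function names and the `tolerance` parameter are hypothetical, chosen only for illustration.

```python
import re

# Assumption: the ID part shared by all aligners is three underscore-separated
# numbers, e.g. "3_30_738"; prefixes ("prefix_") and suffixes ("_F3") are dropped.
CORE_ID = re.compile(r"(\d+_\d+_\d+)")


def core_id(read_name):
    """Return the aligner-independent part of a read ID (or the ID unchanged)."""
    m = CORE_ID.search(read_name)
    return m.group(1) if m else read_name


def index_reads(sam_lines):
    """Map core read ID -> (reference name, 1-based position).

    Header lines (starting with '@') and blank lines are skipped.
    Assumes one alignment per read; a later record overwrites an earlier one.
    """
    index = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split()
        index[core_id(fields[0])] = (fields[2], int(fields[3]))
    return index


def matching_reads(sam_a, sam_b, tolerance=1):
    """List reads placed on the same reference within `tolerance` bp by both aligners."""
    a = index_reads(sam_a)
    b = index_reads(sam_b)
    matches = []
    for name, (ref_a, pos_a) in a.items():
        hit = b.get(name)
        if hit is not None and hit[0] == ref_a and abs(hit[1] - pos_a) <= tolerance:
            matches.append((name, ref_a, pos_a, hit[1]))
    return matches


if __name__ == "__main__":
    bwa = ["prefix_3_30_738 0 chr8 11162354 37 48M * 0 0"]
    bioscope = ["3_30_738 0 chr8 11162353 100 50M * 0 0"]
    for match in matching_reads(bwa, bioscope, tolerance=1):
        print(*match, sep="\t")
```

Increasing `tolerance` would cover the "certain neighborhood" case. Because this looks reads up by ID rather than by position, it does not take advantage of the BAM files being sorted and indexed.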
Has anyone already developed a tool that can handle such an issue? If not, what would be the most efficient strategy?
Thank you in advance for any advice!
Barbara