I have been trying to run BWA MEM on my recently generated Nextera Mate Pair data in order to directly compare it to Novoalign. However, the output immediately shows some problems with the orientation and insert size calculations - only showing FF & RR reads and with very small insert sizes (20bp). To test a few scenarios, I took only the first 100 read pairs. The library was size selected for 3-5kb.
I thought I remembered reading that BWA had a problem with the orientation of Illumina mate pair so I did a reverse complement of both reads using seqtk which gave identical results.
For comparison, using novoalign I got the following:
Any ideas? I'm stumped. I've been happy with novoalign but since I found this potential bug, I thought I'd let the community know (or let me know the option I missed in BWA).
Code:
$ bwa mem -t 1 -M -R "@RG\tID:ID\tSM:Sample\tLB:Nextera_3-5" ../../hg19.fa R1.temp.fastq R2.temp.fastq > ../bwamem/temp.sam [M::main_mem] read 200 sequences (20000 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (42, 0, 0, 37) [M::mem_pestat] analyzing insert size distribution for orientation FF... [M::mem_pestat] (25, 50, 75) percentile: (15, 23, 30) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 60) [M::mem_pestat] mean and std.dev: (21.64, 10.61) [M::mem_pestat] low and high boundaries for proper pairs: (1, 75) [M::mem_pestat] skip orientation FR as there are not enough pairs [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation RR... [M::mem_pestat] (25, 50, 75) percentile: (18, 32, 64) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 156) [M::mem_pestat] mean and std.dev: (41.49, 33.13) [M::mem_pestat] low and high boundaries for proper pairs: (1, 202) [M::worker2@0] performed mate-SW for 376 reads [main] Version: 0.7.5a-r405 [main] CMD: bwa mem -t 1 -M -R @RG\tID:ID\tSM:Sample\tLB:Nextera_3-5 ../../hg19.fa R1.temp.fastq R2.temp.fastq
For comparison, using novoalign I got the following:
Code:
$ novoalign -d hg19_novoindex -f R1.temp.fastq R2.temp.fastq -i MP 4000,2500 -oSAM "@RG\tID:ID\tSM:Sample\tLB:Nextera_3-5" > ../bwamem/temp.sam # novoalign (V2.08.05) # Paired Reads: 100 # Pairs Aligned: 44 # Read Sequences: 200 # Aligned: 135 # Unique Alignment: 132 # Gapped Alignment: 25 # Quality Filter: 2 # Homopolymer Filter: 0 # Mean 4061, Std Dev 583.4
Comment