I'm trying to align a whole genome sample using bwa-mem. I've previously aligned using bwa aln and bwa sampe and it went through fine. The command I'm using is:
bwa mem -t 8 -R '@RG\tID:Label\tLB:Label\tSM:Label' reference_v37.fasta read_1.fq.gz read_2.fq.gz > Sample.sam
Followed by:
samtools import reference_v37.fasta.fai Sample.sam Sample_unsorted.bam
samtools sort Sample_unsorted.bam Sample_sorted
Picard MarkDuplicates then failed on the sorted BAM with the following error:
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: 0749_7455:HS2000-1111A_168:5:1203:11571:84654
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:294)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)
I found the offending reads in the BAM file:
HS2000-1111A_168:5:1203:11571:84654 81 2 243164263 1 100M 1 231669 0 AGCCTATGTGATGACTACATGTCGTGCGGGATCCTGGATGGGATCCTGGGTCAGAGTAAGATAGAACTAAGGGAATCCAAATGAAATATGAACTTTAGTT DDDDDCCCCDEEDDCEDDB?CDEE8IJHFBIJIGBBJIHDHGHGF=JIIG@HIJIJJGJJIGDIHEB?IGIHIHEEGJJJHEGHEG>DHFDADDBFD@C@ NM:i:0 AS:i:100 XS:i:95 RG:Z:0749_7455
HS2000-1111A_168:5:1203:11571:84654 161 1 231669 0 51M49S 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:51 XS:i:51 RG:Z:0749_7455 XP:Z:1,+231860,51S49M,0,0;
HS2000-1111A_168:5:1203:11571:84654 161 1 231860 0 51S49M 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:49 XS:i:49 RG:Z:0749_7455 XP:Z:1,+231669,51M49S,0,0;
It seems that bwa-mem aligned the second read of the pair twice. In addition, the insert size (TLEN) is zero. I read in a post from a few years back (I can't remember where) that this indicates the alignment is unpaired or the mate reference ID is invalid.
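For reference, the FLAG values on these three records can be decoded bit by bit. The sketch below is a plain POSIX shell function (`decode_flag` is a hypothetical name, not part of samtools or Picard) that uses the bit names from the SAM specification. It shows that 81 decodes to PAIRED, REVERSE, READ1 and 161 to PAIRED, MREVERSE, READ2. Neither 161 record has the SECONDARY (0x100) or SUPPLEMENTARY (0x800) bit set, so as far as the flags go, both copies of the second mate look like primary alignments.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the bits set in a SAM FLAG value.
# Bit order follows the SAM spec: 0x1 PAIRED ... 0x800 SUPPLEMENTARY.
# Note: flag, i, out and name are plain (global) shell variables.
decode_flag() {
  flag=$1
  i=0
  out=""
  for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE READ1 READ2 \
              SECONDARY QCFAIL DUP SUPPLEMENTARY; do
    # Test bit i using POSIX shell arithmetic.
    if [ $(( (flag >> i) & 1 )) -eq 1 ]; then
      out="${out:+$out }$name"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# The two FLAG values seen on the offending records.
for f in 81 161; do
  printf '%s: %s\n' "$f" "$(decode_flag "$f")"
done
```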
I've tried using FixMateInformation, which shuffled some of the information around but still left me with two mappings of what appears to be the same read.
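To see whether other pairs in the BAM are affected the same way, one rough check is an awk filter. It counts primary records (FLAG without 0x100 or 0x800) per read name and mate, then prints any name/mate combination seen more than once. This is a minimal sketch under stated assumptions. `find_duplicate_primaries` is a hypothetical helper of my own, not a Picard or samtools feature. It expects tab-separated SAM text on stdin, for example from `samtools view Sample_sorted.bam`.

```shell
#!/bin/sh
# Hypothetical helper: report read name + mate (READ1/READ2) combinations that
# have more than one primary record, i.e. FLAG lacks 0x100 and 0x800.
find_duplicate_primaries() {
  awk -F '\t' '
    /^@/ { next }                                  # skip SAM header lines
    {
      flag = $2 + 0
      # POSIX awk has no bitwise AND, so test bits with integer division.
      secondary     = int(flag / 256)  % 2
      supplementary = int(flag / 2048) % 2
      if (secondary || supplementary) next
      mate = (int(flag / 64) % 2) ? "READ1" : ((int(flag / 128) % 2) ? "READ2" : "UNPAIRED")
      key = $1 "\t" mate
      if (++count[key] == 2) print key             # report each offender once
    }'
}
```

Usage would be along the lines of `samtools view Sample_sorted.bam | find_duplicate_primaries | head`. The `count` array keeps every read name in memory, so on a whole-genome BAM this is only suitable as a quick diagnostic.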
Anyone able to shed some light on this? I did a search but found very few threads related to bwa-mem, as it's a relatively new tool.
Myron Peto