Hi everyone. We are having difficulty merging BAM files. Our pipeline so far is this:-
50bp single-end rna-seq reads - map to hg19 with Bowtie, writing unaligned reads to a new file and writing mapped reads to a SAM file. Then map unaligned Bowtie reads with TopHat and write to a BAM file.
We now want to merge the two output files, but there is a problem with the way the reads are sorted. We have tried (1) samtools sort on both files, then samtools reheader and samtools merge, (2) sorting files as per command on cufflinks manual page (sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted) and then merging, (3) picard sort and merge commands, and yet none of these will actually merge the files.
An example of the picard input / output is here:
The error seems to be this:
Exception in thread "main" net.sf.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 1 dont match: 1/243199373/chr2 1/135534747/chr10
I think the problem lies somewhere with the way bowtie and tophat sort their output by default. Bowtie sorts numerically, chr1, chr2, chr3... etc whereas TopHat sorts chr1, chr10, chr11......chr19, chr2, chr20, chr21 etc. But we have sorted both files again using picard SortSam so we cannot understand why this discrepancy is not corrected. (Same happens with samtools sort).
Can anyone help us please?
Many thanks
Helen
50bp single-end rna-seq reads - map to hg19 with Bowtie, writing unaligned reads to a new file and writing mapped reads to a SAM file. Then map unaligned Bowtie reads with TopHat and write to a BAM file.
We now want to merge the two output files, but there is a problem with the way the reads are sorted. We have tried (1) samtools sort on both files, then samtools reheader and samtools merge, (2) sorting files as per command on cufflinks manual page (sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted) and then merging, (3) picard sort and merge commands, and yet none of these will actually merge the files.
An example of the picard input / output is here:
PHP Code:
java -Xmx2g -Dsnappy.disable=true -jar /path/to/picard-tools-1.52/SortSam.jar INPUT=bowtie.sam OUTPUT=bowtie_picardsort.bam SORT_ORDER=coordinate
[Fri Oct 07 15:48:34 BST 2011] net.sf.picard.sam.SortSam INPUT=bowtie.sam OUTPUT=bowtie_picardsort.bam SORT_ORDER=coordinate VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Oct 07 15:48:34 BST 2011] Executing as swebioinf@SWEBIOINFs-Mac-Pro.local on Mac OS X 10.6.8 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_26-b03-384-10M3425
Snappy is disabled via system property.
INFO 2011-10-07 15:49:56 SortSam Read 10000000 records.
INFO 2011-10-07 15:51:13 SortSam Read 20000000 records.
INFO 2011-10-07 15:52:29 SortSam Read 30000000 records.
INFO 2011-10-07 15:53:44 SortSam Read 40000000 records.
INFO 2011-10-07 15:54:59 SortSam Read 50000000 records.
INFO 2011-10-07 15:55:38 SortSam Finished reading inputs, merging and writing to output now.
[Fri Oct 07 16:04:53 BST 2011] net.sf.picard.sam.SortSam done. Elapsed time: 16.31 minutes.
Runtime.totalMemory()=830885888
java -Xmx2g -Dsnappy.disable=true -jar /path/to/picard-tools-1.52/SortSam.jar INPUT=tophat.bam OUTPUT=tophat_picardsort.bam SORT_ORDER=coordinate
[Fri Oct 07 16:07:29 BST 2011] net.sf.picard.sam.SortSam INPUT=tophat.bam OUTPUT=tophat_picardsort.bam SORT_ORDER=coordinate VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Oct 07 16:07:29 BST 2011] Executing as swebioinf@SWEBIOINFs-Mac-Pro.local on Mac OS X 10.6.8 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_26-b03-384-10M3425
Snappy is disabled via system property.
INFO 2011-10-07 16:07:40 SortSam Finished reading inputs, merging and writing to output now.
[Fri Oct 07 16:07:54 BST 2011] net.sf.picard.sam.SortSam done. Elapsed time: 0.40 minutes.
Runtime.totalMemory()=527294464
java -Xmx2g -Dsnappy.disable=true -jar /path/to/picard-tools-1.52/MergeSamFiles.jar INPUT=bowtie_picardsort.bam INPUT=tophat_picardsort.bam OUTPUT=picardall.bam SORT_ORDER=coordinate
[Fri Oct 07 16:10:02 BST 2011] net.sf.picard.sam.MergeSamFiles INPUT=[bowtie_picardsort.bam, tophat_picardsort.bam] OUTPUT=picardall.bam SORT_ORDER=coordinate ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=false USE_THREADING=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Oct 07 16:10:02 BST 2011] Executing as swebioinf@SWEBIOINFs-Mac-Pro.local on Mac OS X 10.6.8 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_26-b03-384-10M3425
INFO 2011-10-07 16:10:02 MergeSamFiles Input files are in same order as output so sorting to temp directory is not needed.
[Fri Oct 07 16:10:02 BST 2011] net.sf.picard.sam.MergeSamFiles done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=85000192
Exception in thread "main" net.sf.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 1 don't match: 1/243199373/chr2 1/135534747/chr10
at net.sf.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:109)
at net.sf.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:138)
at net.sf.picard.sam.SamFileHeaderMerger.getSequenceDictionary(SamFileHeaderMerger.java:482)
at net.sf.picard.sam.SamFileHeaderMerger.<init>(SamFileHeaderMerger.java:133)
at net.sf.picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:137)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
at net.sf.picard.sam.MergeSamFiles.main(MergeSamFiles.java:84)
Exception in thread "main" net.sf.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 1 dont match: 1/243199373/chr2 1/135534747/chr10
I think the problem lies somewhere with the way bowtie and tophat sort their output by default. Bowtie sorts numerically, chr1, chr2, chr3... etc whereas TopHat sorts chr1, chr10, chr11......chr19, chr2, chr20, chr21 etc. But we have sorted both files again using picard SortSam so we cannot understand why this discrepancy is not corrected. (Same happens with samtools sort).
Can anyone help us please?
Many thanks
Helen
Comment