Hi,
After upgrading our sequencer, we recently received sequencing data as BAM files that are ~5.5x larger than what we normally deal with. I have had to run Picard commands such as MarkDuplicates and AddOrReplaceReadGroups by submitting them as jobs from my terminal. Since doing this, I have been receiving premature EOF errors. Following a suggestion I found a while ago, I ran $ tail problem.bam | hexdump -C to view the EOF marker and found that it is present, but not at the very end of the file, where it has been with earlier data sets. Here is what I see:
000005f0 f8 f6 25 bc 3a 51 28 f1 64 6a 98 38 25 f9 1f 3c |..%.:Q(.dj.8%..<|
00000600 29 71 3b 11 61 00 00 1f 8b 08 04 00 00 00 00 00 |)q;.a...........|
00000610 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 |...BC...........|
00000620 00 00 00 5b 57 65 64 20 4e 6f 76 20 30 37 20 31 |...[Wed Nov 07 1|
00000630 36 3a 31 31 3a 32 38 20 45 53 54 20 32 30 31 32 |6:11:28 EST 2012|
00000640 5d 20 6e 65 74 2e 73 66 2e 70 69 63 61 72 64 2e |] net.sf.picard.|
00000650 73 61 6d 2e 41 64 64 4f 72 52 65 70 6c 61 63 65 |sam.AddOrReplace|
00000660 52 65 61 64 47 72 6f 75 70 73 20 64 6f 6e 65 2e |ReadGroups done.|
00000670 20 45 6c 61 70 73 65 64 20 74 69 6d 65 3a 20 32 | Elapsed time: 2|
00000680 36 2e 30 36 20 6d 69 6e 75 74 65 73 2e 0a 52 75 |6.06 minutes..Ru|
00000690 6e 74 69 6d 65 2e 74 6f 74 61 6c 4d 65 6d 6f 72 |ntime.totalMemor|
000006a0 79 28 29 3d 35 35 39 32 31 38 36 38 38 0a |y()=559218688.|
000006ae
The 28-byte EOF marker is the run that starts with 1f 8b at offset 00000607 and ends at offset 00000622 in the dump above. It should be at the end of the file, but it is not. I am wondering whether the text that follows it is the summary Picard writes, which ended up in the file because I ran this command as a job. Does anyone have an idea of what is going on? Thank you so much! Here is the command I am using:
/opt/sharcnet/sq-tm/2.4/bin/sqsub -o sorted-lane1.marked.bam --memperproc=20G -r 7d \
> java -jar ./MarkDuplicates.jar INPUT=sorted-lane2.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT
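For anyone who wants to repeat the check without reading hexdump output by eye, here is a minimal Python sketch of the same test: it looks for the 28-byte BGZF EOF marker in the last part of a file and reports how many bytes follow it. The function name check_bgzf_eof and the 64 KiB search window are my own choices for illustration, not part of Picard or any other tool.

```python
import os
import sys

# The 28-byte BGZF end-of-file marker (an empty BGZF block), matching the
# bytes visible in the hexdump above.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def check_bgzf_eof(path, window=65536):
    """Return the number of bytes that follow the last EOF marker.

    0    -> the marker is the final 28 bytes of the file (expected case)
    n>0  -> n extra bytes were written after the marker
    None -> no marker found in the last `window` bytes of the file
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Only the tail of the file is read, so this stays cheap on large BAMs.
        fh.seek(max(0, size - window))
        tail = fh.read()
    pos = tail.rfind(BGZF_EOF)
    if pos == -1:
        return None
    return len(tail) - (pos + len(BGZF_EOF))


if __name__ == "__main__":
    # Hypothetical usage: python check_eof.py problem.bam
    trailing = check_bgzf_eof(sys.argv[1])
    if trailing is None:
        print("no EOF marker found near the end of the file")
    elif trailing == 0:
        print("EOF marker is at the end of the file")
    else:
        print(f"EOF marker found, but {trailing} bytes follow it")
```

A nonzero result would correspond to what I see in my file, where readable text follows the marker.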