  • Premature EOF using Picard

    Hi,

    We recently upgraded our sequencer and now receive sequencing data as BAM files that are ~5.5x larger than what we normally deal with. Because of this, I have had to run Picard commands such as MarkDuplicates and AddOrReplaceReadGroups by submitting them as jobs from my terminal. Since doing this, I have been getting premature EOF errors. Following a suggestion I found a while ago, I used the command $ tail problem.bam | hexdump -C to view the EOF marker, and found that the marker is present but not at the end of the file, which is where it has always been for earlier data sets. I have pasted what I see here:

    000005f0 f8 f6 25 bc 3a 51 28 f1 64 6a 98 38 25 f9 1f 3c |..%.:Q(.dj.8%..<|
    00000600 29 71 3b 11 61 00 00 1f 8b 08 04 00 00 00 00 00 |)q;.a...........|
    00000610 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 |...BC...........|
    00000620 00 00 00 5b 57 65 64 20 4e 6f 76 20 30 37 20 31 |...[Wed Nov 07 1|
    00000630 36 3a 31 31 3a 32 38 20 45 53 54 20 32 30 31 32 |6:11:28 EST 2012|
    00000640 5d 20 6e 65 74 2e 73 66 2e 70 69 63 61 72 64 2e |] net.sf.picard.|
    00000650 73 61 6d 2e 41 64 64 4f 72 52 65 70 6c 61 63 65 |sam.AddOrReplace|
    00000660 52 65 61 64 47 72 6f 75 70 73 20 64 6f 6e 65 2e |ReadGroups done.|
    00000670 20 45 6c 61 70 73 65 64 20 74 69 6d 65 3a 20 32 | Elapsed time: 2|
    00000680 36 2e 30 36 20 6d 69 6e 75 74 65 73 2e 0a 52 75 |6.06 minutes..Ru|
    00000690 6e 74 69 6d 65 2e 74 6f 74 61 6c 4d 65 6d 6f 72 |ntime.totalMemor|
    000006a0 79 28 29 3d 35 35 39 32 31 38 36 38 38 0a |y()=559218688.|
    000006ae

    The 28-byte EOF marker is the run of bytes from 1f 8b 08 04 (offset 00000607) through the three 00 bytes at the start of the 00000620 row. It should be at the end of the file, but it is not. I am wondering whether the text that follows it is the summary Picard prints when a tool finishes, and whether it got written into the file because I ran this command as a job. Does anyone have an idea of what is going on? Thank you so much! Here is the command I am using:

    /opt/sharcnet/sq-tm/2.4/bin/sqsub -o sorted-lane1.marked.bam --memperproc=20G -r 7d \
        java -jar ./MarkDuplicates.jar INPUT=sorted-lane2.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT
    Last edited by biochemMScstudent; 11-09-2012, 07:06 AM. Reason: typo
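The check described in the question (is the 28-byte BGZF EOF marker the last thing in the file?) can be scripted so it runs on each output before the next Picard step. Below is a minimal POSIX sh sketch, assuming only tail, od and awk are available; the helper name check_bgzf_eof and the 64 KiB search window are illustrative choices, not part of Picard or samtools. It counts bytes with tail -c instead of relying on tail's default line counting, which is arbitrary for binary files.

```shell
#!/bin/sh
# Sketch: does a BAM (BGZF) file end with the 28-byte BGZF EOF marker?
# Needs only tail, od and awk. check_bgzf_eof is a hypothetical helper.

# The empty BGZF block that BAM writers append as the end-of-file marker,
# written as the same hex bytes that hexdump -C shows.
BGZF_EOF="1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"

# check_bgzf_eof FILE
#   Prints "ok" (exit 0) if the marker is the last 28 bytes of FILE,
#   "trailing N" (exit 1) if the last copy of the marker in the final
#   64 KiB is followed by N extra bytes, or "missing" (exit 1) otherwise.
check_bgzf_eof() {
    # tail -c counts bytes; od -v prints every byte (no "*" for repeats).
    tail -c 65536 "$1" | od -An -v -tx1 | awk -v m="$BGZF_EOF" '
        # tolower() also turns each value into a plain string, so the
        # comparison below is a string comparison.
        { for (i = 1; i <= NF; i++) b[n++] = tolower($i) }
        END {
            k = split(m, want, " ")              # k = 28 marker bytes
            for (s = n - k; s >= 0; s--) {       # scan backwards: last copy
                j = 1
                while (j <= k && b[s + j - 1] == want[j]) j++
                if (j > k) {
                    t = n - (s + k)              # bytes after this copy
                    if (t == 0) { print "ok"; exit 0 }
                    print "trailing " t; exit 1
                }
            }
            print "missing"; exit 1
        }'
}
```

After sourcing the script (for example . ./check_bgzf_eof.sh, a hypothetical file name), check_bgzf_eof problem.bam prints ok for an intact BAM. For the file dumped above, where 139 bytes of Picard log text follow the marker, it should print trailing 139.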

  • #2
    Somehow some debug information has ended up at the end of your BAM file. One possible cause is that the debugging output was written to stdout (which was piped into the BAM file) instead of stderr. Or it could just be a Picard bug where the debugging output has wrongly been written to the output file. But I am just guessing here.

    I'd ask the Picard developers about this if I were you...

    • #3
      OK, thanks very much. I was able to run the command without submitting a job, so I did get the data I need. In the future, though, if my files are even larger, I may have no choice but to submit Picard commands as jobs rather than run them directly at the command line, so I will look into notifying the Picard developers about this. Thanks again!

      • #4
        Hmm. Perhaps qsub did something unexpected with the stdout/stderr streams. You could try a simple wrapper script: qsub the shell script, and have the shell script call Picard. That may solve it.
        Last edited by maubp; 11-12-2012, 07:02 AM. Reason: markup
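The wrapper-script idea above can be sketched as follows. This is a hypothetical script, not something posted in the thread: the name markdup_job.sh, the file-naming scheme and the JAVA override are assumptions. It also assumes that sqsub's -o option names the file that receives the job's own stdout/stderr (check your site's sqsub documentation). If so, the command in the question, which gives -o a .bam name, would route Picard's "... done. Elapsed time" summary into a file with a BAM name, which would be consistent with the text seen after the EOF marker. The sketch keeps three things apart: the BAM (written by Picard itself via OUTPUT=), Picard's console log, and the scheduler's log.

```shell
#!/bin/sh
# markdup_job.sh: hypothetical wrapper script for a batch job. The script
# name, the file-naming scheme and the JAVA override are assumptions.

# markdup_job IN.bam OUT.bam
#   Runs Picard MarkDuplicates with an explicit OUTPUT= so Picard writes the
#   BAM itself, and sends Picard's stdout and stderr (progress messages and
#   the final "done. Elapsed time" summary) to BASE.picard.log, where BASE
#   is OUT.bam without its .bam suffix.
markdup_job() {
    in_bam=$1
    out_bam=$2
    base=${out_bam%.bam}
    # JAVA may name a specific java binary; it defaults to "java" on PATH.
    "${JAVA:-java}" -jar ./MarkDuplicates.jar \
        INPUT="$in_bam" OUTPUT="$out_bam" METRICS_FILE="$base.metrics" \
        CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT \
        >"$base.picard.log" 2>&1
}

# When the scheduler runs the script with IN and OUT, do the work; the
# script then exits with Picard's exit status.
if [ "$#" -eq 2 ]; then
    markdup_job "$1" "$2"
fi
```

Made executable (chmod +x markdup_job.sh) and submitted with the same resource options as the question, for example

    /opt/sharcnet/sq-tm/2.4/bin/sqsub -o markdup-lane2.job.log --memperproc=20G -r 7d ./markdup_job.sh sorted-lane2.bam sorted-lane2.marked.bam

the BAM, Picard's log and the scheduler's log each end up in their own file (the .job.log name is again an assumption), and $ tail sorted-lane2.marked.bam | hexdump -C should then show the EOF marker as the last 28 bytes.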
