
  • How to correct BAM file EOF marker absent error

    Hi,
    I am trying to run the bamUtils splitBam program to split my BAM file by read group, but I get the error: BGZF EOF marker is missing.
    Samtools gives a similar error: "EOF marker absent". I understand that my file is not corrupt but was just written in an older BAM/SAM version. Is there any way to fix this so that bamUtils can be run on it?

  • #2
    Try using 'samtools view' BAM to BAM, or if you can program just append the 28 byte EOF marker. It might be possible to do that with Unix cat, if you can work out how to escape the binary values...
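
For reference, the 28-byte marker is the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. Before changing anything, it may help to confirm that the marker really is missing. A minimal Python sketch, assuming the file is a local BGZF-compressed BAM (the function name `has_bgzf_eof` is made up for illustration and is not part of samtools or bamUtils):

```python
import os

# The 28-byte empty BGZF block used as the BAM end-of-file marker
# (byte values as given in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 42430200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Only the last 28 bytes are read, so this is fast even on huge files.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read(len(BGZF_EOF)) == BGZF_EOF
```

This only inspects the tail of the file. It says nothing about whether the rest of the file is intact.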


    • #3
      Does the "no EOF" error actually stop the SAMtools command from functioning properly? If not, I'd say just leave it be.


      • #4
        Originally posted by swbarnes2:
        "Does the "no EOF" error actually stop the SAMtools command from functioning properly? If not, I'd say just leave it be."
        The warning from samtools is harmless (but scary - and if you get in the habit of ignoring it you may miss a truly truncated file).

        I don't know if bamUtils is less tolerant and simply aborts.


        • #5
          Thanks maubp for the replies.
          bamUtils simply aborts, which is somewhat annoying. I have large whole-genome BAM files, and converting them to the newer version with samtools takes hours.
          Both Samtools and Picard can easily process "EOF absent" files.


          • #6
            You should probably ask the bamUtils authors to treat this as a warning, not an error.

            If you are going to use the 'samtools view' trick to 'fix' the BAM files, can you pipe this directly into bamUtils? That should be faster (less disk IO), and if you select BAM with no compression even faster still.

            The trick of appending the missing 28-byte EOF marker is probably the most efficient solution, as long as you don't mind the risk of editing the BAM file in situ. You could try this little Python script I just wrote - but please back up your data first and let me know if it works nicely - it has had only minimal testing: https://github.com/peterjc/picobio/b...gzf_add_eof.py
            Last edited by maubp; 04-16-2012, 01:15 PM. Reason: Link to script
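
The piping idea above can be sketched in Python with `subprocess`, so that no intermediate BAM file is written to disk. This is only a sketch: `pipe` is a made-up helper name, and whether bamUtils accepts BAM on standard input (and with which arguments) is an assumption that needs checking against its documentation.

```python
import subprocess


def pipe(producer_cmd, consumer_cmd):
    """Run `producer_cmd | consumer_cmd` and return both exit codes.

    Both arguments are argument lists as accepted by subprocess.Popen.
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


# Intended use (not run here):
#   producer_cmd = ["samtools", "view", "-u", "old.bam"]  # -u: uncompressed BAM
#   consumer_cmd = [...]  # placeholder: a bamUtils command reading from stdin
```

Writing uncompressed BAM with `samtools view -u` skips the compression step, which is the "even faster still" part of the suggestion.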


            • #7
              I am trying to get in touch with the bamUtils authors. Thanks for the trick about piping samtools output to bamUtils and for the script as well.


              • #8
                Hi maubp,
                I used your script and it seems to run without any error, but it takes several hours to finish. Could that be because my BAM files are around 100 GB in size?


                • #9
                  Several hours? Wow. I haven't tried it on any files that large but that does surprise me. I can think of a few variations that might help (e.g. opening in append mode to add the block). I may have time to look at this tomorrow...


                  • #10
                    It's all right, no rush. I've got a reply from bamUtil's developers and I think the problem will be fixed. Thanks a lot for the help.


                    • #11
                      OK, yes, I could reproduce the slowness with bigger files (and found it didn't seem to work as expected on Mac OS 10.6 which I'd not tried before).

                      A slight change to use append mode explicitly worked wonders - it should now be sub-second as I originally intended. Could you retry (again, back up the data first - this does edit in place and is new code):


                      Note this is now v0.0.1 (the original you tried I have dubbed v0.0.0).
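
To illustrate why append mode is fast, here is a minimal Python sketch of the idea. It is not the linked script, and the function name `append_bgzf_eof` is made up for illustration. It reads only the last 28 bytes and, if the marker is absent, writes 28 bytes at the end, so the run time should not depend on the file size. It edits the file in place, so the same advice about backing up applies.

```python
import os

# The 28-byte empty BGZF block used as the BAM end-of-file marker
# (byte values as given in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 42430200 1b00 0300 00000000 00000000"
)


def append_bgzf_eof(path):
    """Append the BGZF EOF marker to `path` if it is not already there.

    Returns True if the file was modified, False if it already ended
    with the marker. The rest of the file is never read or rewritten.
    """
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() >= len(BGZF_EOF):
            handle.seek(-len(BGZF_EOF), os.SEEK_END)
            if handle.read(len(BGZF_EOF)) == BGZF_EOF:
                return False
    # Append mode: writes always go to the end of the existing file.
    with open(path, "ab") as handle:
        handle.write(BGZF_EOF)
    return True
```

Note that this sketch does not check whether the file is otherwise complete. On a truly truncated BAM it would add the marker anyway and hide the problem, so it only makes sense for files known to be intact.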


                      • #12
                        I'll give it a shot and let you know. Actually, I forgot to mention that I was using Mac OS X as well.

                        Thanks.
