  • hanshart
    Member
    • Nov 2011
    • 27

    samtools sort

    Hi,

    I just installed the latest samtools (0.1.19-44428cd) and now I have an issue with my SAM->BAM->BAM_Sorted Pipeline using the Linux pipe. In samtools version 0.1.18 (r982:295) the following always worked well:
    Code:
    samtools view -bS -1 temp.sam | samtools sort - temp_sorted
    But with the new version I always get the following error:
    Code:
    [bam_header_read] EOF marker is absent. The input is probably truncated
    I also ran the pipeline with version 0.1.18 to check whether the resulting sorted BAM files are the same (regardless of the error message). According to the Linux diff command, they are not. So my first question: Is the error message problematic?

    After some testing I realized that there is even a difference (also for version 0.1.18) between a sorted BAM that was built with the pipe (as in the command above) and one that was built without the pipe via:
    Code:
    samtools sort temp.bam temp_sorted
    So my second question is whether anyone knows the difference and if this can be problematic too?

    Sorry, this part was wrong, I made a stupid mistake. Sorting through the pipe and sorting directly give the same result!

    As the error is not reported in the non-pipeline version, and the resulting file is the same as that of the pipeline version, the error message in version 0.1.19 is negligible. The only remaining question is the difference between the 0.1.18-sorted file and the 0.1.19-sorted file:
    Code:
    diff <(samtools view temp_sorted_pipe.bam) <(samtools view temp_sorted_old_pipe.bam) | head -20
    466a467
    > DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581     16      gi|555853|gb|U13369.1|HSU13369  3657    255     19M     *       0       0       TACCTGGTTGATCCTGCCA     HHIIIIHHHHHFFFFFCCC   XA:i:0  MD:Z:19 NM:i:0
    474,475d474
    < DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581     16      gi|555853|gb|U13369.1|HSU13369  3657    255     19M     *       0       0       TACCTGGTTGATCCTGCCA     HHIIIIHHHHHFFFFFCCC   XA:i:0  MD:Z:19 NM:i:0
    < DJG6PNM1:223:D1GB7ACXX:2:1101:14750:15107     0       gi|555853|gb|U13369.1|HSU13369  3660    255     16M     *       0       0       CTGGTTGATCCTGCCA        BCCFDFFFHHHHGIII      XA:i:0  MD:Z:16 NM:i:0
    477c476
    < DJG6PNM1:223:D1GB7ACXX:2:1101:15030:64473     0       gi|555853|gb|U13369.1|HSU13369  3661    255     26M     *       0       0       TGGTTGATCCTGCCAGTAGCATATGC      4114=?BDHHHGHIIIIIIIIIEIHI    XA:i:0  MD:Z:26 NM:i:0
    ---
    > DJG6PNM1:223:D1GB7ACXX:2:1101:14750:15107     0       gi|555853|gb|U13369.1|HSU13369  3660    255     16M     *       0       0       CTGGTTGATCCTGCCA        BCCFDFFFHHHHGIII      XA:i:0  MD:Z:16 NM:i:0
    478a478
    > DJG6PNM1:223:D1GB7ACXX:2:1101:15030:64473     0       gi|555853|gb|U13369.1|HSU13369  3661    255     26M     *       0       0       TGGTTGATCCTGCCAGTAGCATATGC      4114=?BDHHHGHIIIIIIIIIEIHI    XA:i:0  MD:Z:26 NM:i:0
    492d491
    < DJG6PNM1:223:D1GB7ACXX:2:1101:5749:82660      0       gi|555853|gb|U13369.1|HSU13369  3669    255     29M     *       0       0       CCTGCCAGTAGCATATGCTTGTCTCAAAG   CCCFFFFFHHHHHIIIIIIIIIIIIIIII XA:i:0  MD:Z:29 NM:i:0
    495,497c494
    < DJG6PNM1:223:D1GB7ACXX:2:1101:17420:15616     0       gi|555853|gb|U13369.1|HSU13369  3670    255     23M     *       0       0       CTGCCAGTAGCATATGCTTGTCT CCCFFFFFHHHHHIIIIIIIIII       XA:i:0  MD:Z:23 NM:i:0
    < DJG6PNM1:223:D1GB7ACXX:2:1101:6026:70596      0       gi|555853|gb|U13369.1|HSU13369  3670    255     23M     *       0       0       CTGCCAGTAGCATATGCTTGTCT CCCFFFFFHHHHHIIIIIIIIII       XA:i:0  MD:Z:23 NM:i:0
    < DJG6PNM1:223:D1GB7ACXX:2:1102:15933:7414      0       gi|555853|gb|U13369.1|HSU13369  3670    255     22M     *       0       0       CTGCCAGTAGCATATGCTTGTC  BCCFFFFFHHHHHIIIIIIIII        XA:i:0  MD:Z:22 NM:i:0
    ---
    > DJG6PNM1:223:D1GB7ACXX:2:1101:5749:82660      0       gi|555853|gb|U13369.1|HSU13369  3669    255     29M     *       0       0       CCTGCCAGTAGCATATGCTTGTCTCAAAG   CCCFFFFFHHHHHIIIIIIIIIIIIIIII XA:i:0  MD:Z:29 NM:i:0
    498a496
    temp_sorted_old_pipe.bam was built using the old samtools version (0.1.18).
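
A diff like the one above mixes two possible causes: records that are missing from one file, and records that are merely in a different order. A minimal sketch for separating the two, assuming both BAM files have first been dumped to SAM text with `samtools view` (the function name `same_records` and the file names in the usage note are hypothetical):

```shell
#!/usr/bin/env bash
# same_records A B
# Succeeds (exit 0) if the two text files contain exactly the same lines,
# ignoring the order in which they appear.
same_records() {
    local a_sorted b_sorted status
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # Byte-wise sort so the result does not depend on the locale.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    if cmp -s "$a_sorted" "$b_sorted"; then
        status=0
    else
        status=1
    fi
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}
```

Possible usage: run `samtools view temp_sorted_pipe.bam > new.sam` and `samtools view temp_sorted_old_pipe.bam > old.sam`, then `same_records new.sam old.sam && echo "same records, different order"`. If this succeeds, the two versions differ only in how they order records.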
    Thank you very much
    Last edited by hanshart; 06-29-2013, 12:57 PM.
  • Heisman
    Senior Member
    • Dec 2010
    • 534

    #2
    I've always assumed this is not a real issue but maybe it is. You said you noticed a difference between the two methods of sorting. What difference did you notice? Meaning, were the output files different or was the only difference whether or not you got that error message?

    • hanshart
      Member
      • Nov 2011
      • 27

      #3
      Originally posted by Heisman
      I've always assumed this is not a real issue but maybe it is. You said you noticed a difference between the two methods of sorting. What difference did you notice? Meaning, were the output files different or was the only difference whether or not you got that error message?
      Thank you for your answer, Heisman.
      Actually, I was wrong.
      There is no difference between sorting with the pipe and sorting without it. Sorry for the confusion, I edited my first post.

      The difference between the two versions is real, however. I attached the first part of the Linux "diff" output, but I'm not sure whether it is really helpful. So in what way has the sorting changed? Could the change cause any problems?
      Thanks again

      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        Originally posted by hanshart
        Hi,

        I just installed the latest samtools (0.1.19-44428cd) and now I have an issue with my SAM->BAM->BAM_Sorted Pipeline using the Linux pipe. In samtools version 0.1.18 (r982:295) the following always worked well:
        Code:
        samtools view -bS -1 temp.sam | samtools sort - temp_sorted
        But with the new version I always get the following error:
        Code:
        [bam_header_read] EOF marker is absent. The input is probably truncated
        I also ran the pipeline with version 0.1.18 to check whether the resulting sorted bam files are the same (regardless of the error message). Linux diff command said no. So my first question: Is the error message problematic?
        This is a known bug in samtools 0.1.19,
        Original title: "samtools sort from stdin shouldn't check BAM EOF" Consider this simplified example where I want to sort a BAM file supplied on stdin, $ cat test.bam | samtools sort - test_sorted [...


        In this case the warning is probably harmless, but in general it can be a sign of a truncated file.
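
To tell the harmless case apart from a BAM file on disk that really is truncated, one option is to check for the end-of-file block directly. The sketch below assumes the 28-byte empty BGZF block that the SAM/BAM specification defines as the EOF marker; the function name `has_bgzf_eof` is hypothetical, and the check only works on regular files, not on a pipe:

```shell
#!/usr/bin/env bash
# The 28-byte BGZF end-of-file block from the SAM/BAM specification, as hex.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# has_bgzf_eof FILE
# Succeeds (exit 0) if the last 28 bytes of FILE are the BGZF EOF block.
has_bgzf_eof() {
    local tail_hex
    # Dump the last 28 bytes as hex and strip od's spacing and line breaks.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

Possible usage: `has_bgzf_eof temp_sorted.bam || echo "EOF marker missing"`. A missing marker on a finished file is worth investigating, whereas the same message for input read from stdin is the case discussed in this thread.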

        • hanshart
          Member
          • Nov 2011
          • 27

          #5
          Originally posted by maubp
          This is a known bug in samtools 0.1.19,
          Original title: "samtools sort from stdin shouldn't check BAM EOF" Consider this simplified example where I want to sort a BAM file supplied on stdin, $ cat test.bam | samtools sort - test_sorted [...


          The warning is in this case probably harmless - but in general can be a sign of a truncated file related problem.
          Thank you maubp.
          About the difference in sorting between version 0.1.19 and version 0.1.18:
          I'm quite sure that in version 0.1.19 reads beginning at the same position are now also sorted by strand (first forward, then reverse strand), whereas in version 0.1.18 they were not sorted by strand:

          Code:
           diff <(samtools view temp_sorted_pipe.bam | cut -f2,4) <(samtools view temp_sorted_old_pipe.bam | cut -f2,4) -y | less -S
          ...
          0       3709                   0       3709
          0       3709                 <
          16      3709                   16      3709
          16      3709                   16      3709
          16      3709                   16      3709
          16      3709                   16      3709
                                       > 0       3709
                                       > 16      3710
          0       3710                   0       3710
          0       3710                   0       3710
          0       3710                   0       3710
          16      3710                 | 16      3711
          0       3711                   0       3711
          0       3711                   0       3711
          16      3711                   16      3711
          16      3711                 <
          0       3712                   0       3712
          0       3713                   0       3713
                                       > 16      3713
                                       > 16      3713
          0       3713                   0       3713
          0       3713                   0       3713
          0       3713                   0       3713
          16      3713                 <
          16      3713                 <
          0       3714                 <
          0       3714                   0       3714
          0       3714                   0       3714
          ...
          On the left (version 0.1.19) the reads are sorted by position and strand, whereas on the right (version 0.1.18) they are sorted only by position.
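
Rather than reading the side-by-side diff by eye, the strand ordering can be checked mechanically. This is only a sketch: it assumes input in the same two-column form as above (FLAG and POS from `cut -f2,4`), it ignores the reference name (so it suits a single reference as in this example), and the function name `strand_order_violations` is hypothetical:

```shell
#!/usr/bin/env bash
# strand_order_violations
# Reads "FLAG POS" lines on stdin and prints how many forward-strand records
# come directly after a reverse-strand record at the same position.
strand_order_violations() {
    awk '
        {
            # Bit 0x10 of FLAG is set when the read maps to the reverse strand.
            reverse = int($1 / 16) % 2
            if ($2 == prev_pos && prev_reverse == 1 && reverse == 0) {
                violations++
            }
            prev_pos = $2
            prev_reverse = reverse
        }
        END { print violations + 0 }
    '
}
```

Possible usage: `samtools view temp_sorted_pipe.bam | cut -f2,4 | strand_order_violations`. A count of 0 is consistent with forward reads being placed before reverse reads at each position; a non-zero count means the file is not ordered that way.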

          Am I right?
          Thanks
