  • tophat & samtools: accepted_hits0_sorted.0000.bam

    I am encountering an error when running tophat v2.0.8b with samtools 0.1.19+. At the end of a run, during tophat_reports, I get the following error:
    Code:
    Error: [Errno 2] No such file or directory: 'EGFR-tophat/tmp/accepted_hits0_sorted.bam'
    The directory 'EGFR-tophat/tmp' has a bunch of files that look like this:

    accepted_hits0_sorted.0000.bam
    accepted_hits0_sorted.0001.bam
    accepted_hits0_sorted.0002.bam
    ..
    accepted_hits7_sorted.0004.bam
    accepted_hits7_sorted.0005.bam

    On checking 'run.log', it appears that these files are being created with samtools, invoked like this:

    Code:
    /usr/local/bin/samtools sort EGFR-tophat/tmp/accepted_hits0.bam EGFR-tophat/tmp/accepted_hits0_sorted
    And the error is being thrown by samtools, invoked like this:
    Code:
    /usr/local/bin/samtools merge -f -h EGFR-tophat/tmp/Homo_sapiens_GRCh37_71_genome.bwt.samheader.sam EGFR-tophat/accepted_hits.bam EGFR-tophat/tmp/accepted_hits0_sorted.bam EGFR-tophat/tmp/accepted_hits1_sorted.bam EGFR-tophat/tmp/accepted_hits2_sorted.bam EGFR-tophat/tmp/accepted_hits3_sorted.bam EGFR-tophat/tmp/accepted_hits4_sorted.bam EGFR-tophat/tmp/accepted_hits5_sorted.bam EGFR-tophat/tmp/accepted_hits6_sorted.bam EGFR-tophat/tmp/accepted_hits7_sorted.bam

    Checking the samtools manual, I see:
    Code:
    This command may also create temporary files <out.prefix>.%d.bam when the whole alignment cannot be fitted into memory (controlled by option -m).
    Apparently tophat is not aware that samtools may segment the alignments, and fails to properly merge them. Is there a way to direct tophat to do this? Should I just use --no-sort-bam with tophat? What are the implications of an unsorted bam?

    Meanwhile, can I merge these segmented alignments myself and resume tophat? Something like:
    Code:
    samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    and then resume tophat with -R? Since memory was a limit during the sort, will I run out of memory during the merge? Is there a reason samtools doesn't automatically merge the segmented alignments?

  • #2
    Originally posted by esiefker View Post

    Meanwhile, can I merge these segmented alignments myself and resume tophat? Something like:
    Code:
    samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    and then resume tophat with -R?
    I tried this, and it complained about truncated files:

    Code:
    [ebs15242@soresearch tmp]$ samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Segmentation fault
    [ebs15242@soresearch tmp]$ du -hsc !$
    du -hsc accepted_hits0_sorted.*.bam
    228M accepted_hits0_sorted.0000.bam
    226M accepted_hits0_sorted.0001.bam
    177M accepted_hits0_sorted.0002.bam
    236K accepted_hits0_sorted.0003.bam
    16K accepted_hits0_sorted.0004.bam
    0 accepted_hits0_sorted.0005.bam
    0 accepted_hits0_sorted.0006.bam
    630M total
    Why is samtools outputting truncated files? Should I pass it the -m flag? How do I instruct tophat to do that?
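The "EOF marker is absent" message refers to the 28-byte empty BGZF block that a properly closed BAM file ends with. As a rough sketch (not something samtools or tophat provides), the chunk files can be screened for that marker using only standard tools before attempting a merge. The function names `has_bgzf_eof` and `report_chunks` are made up for this example, and the marker bytes are taken from the SAM/BAM format specification. A present marker only shows the file was closed cleanly; it does not prove the contents are complete.

```shell
#!/bin/sh
# Expected BGZF end-of-file block (28 bytes), written as hex in 4-byte groups.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# has_bgzf_eof FILE (hypothetical helper)
# Succeeds only if FILE is non-empty and its last 28 bytes are the EOF block.
has_bgzf_eof() {
    [ -s "$1" ] || return 1   # missing or zero-byte file
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \t\n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# report_chunks FILE... (hypothetical helper)
# Prints one status line per file so truncated chunks stand out.
report_chunks() {
    for f in "$@"; do
        if has_bgzf_eof "$f"; then
            echo "ok        $f"
        else
            echo "truncated $f"
        fi
    done
}
```

Run from the tmp directory as `report_chunks accepted_hits0_sorted.*.bam`. Zero-byte chunks like the 0005 and 0006 files listed above would be reported as truncated, which is a hint that the sort itself was interrupted rather than that the merge command is wrong.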

    • #3
      Oh, never mind. I was running out of disk space. I was not aware, because tophat removed some temporary files after samtools failed. Making space has gotten me past this error. Hope this experience can help someone else...
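Since the root cause here was a full disk that went unnoticed, one option is to check free space on the output filesystem before starting a long run. The following is only a sketch using POSIX `df`; the helper names `free_kb` and `require_free_kb` are invented for illustration, and the right threshold depends entirely on your data. It assumes the filesystem name printed by `df` contains no spaces.

```shell
#!/bin/sh
# free_kb DIR (hypothetical helper)
# Prints the space available on DIR's filesystem, in 1024-byte blocks.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_kb DIR MIN_KB (hypothetical helper)
# Fails with a message on stderr if less than MIN_KB is available.
require_free_kb() {
    avail=$(free_kb "$1")
    if [ "$avail" -lt "$2" ]; then
        echo "only ${avail} KB free in $1, need at least $2 KB" >&2
        return 1
    fi
}
```

For example, `require_free_kb EGFR-tophat 52428800` would refuse to continue with less than 50 GB free in the output directory; that figure is an arbitrary placeholder, not a tophat recommendation.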

      • #4
        Dear esiefker,

        I've just encountered the same error, and your post was indeed a big help for me. Thanks!
