  • tophat & samtools: accepted_hits0_sorted.0000.bam

    I am encountering an error when running tophat v2.0.8b with samtools 0.1.19+. At the end of a run, during tophat_reports, I get the following error:
    Code:
    Error: [Errno 2] No such file or directory: 'EGFR-tophat/tmp/accepted_hits0_sorted.bam'
    The directory 'EGFR-tophat/tmp' has a bunch of files that look like this:

    accepted_hits0_sorted.0000.bam
    accepted_hits0_sorted.0001.bam
    accepted_hits0_sorted.0002.bam
    ..
    accepted_hits7_sorted.0004.bam
    accepted_hits7_sorted.0005.bam

    On checking 'run.log', it appears that these files are being created with samtools, invoked like this:

    Code:
    /usr/local/bin/samtools sort EGFR-tophat/tmp/accepted_hits0.bam EGFR-tophat/tmp/accepted_hits0_sorted
    And the error is being thrown by samtools, invoked like this:
    Code:
    /usr/local/bin/samtools merge -f -h EGFR-tophat/tmp/Homo_sapiens_GRCh37_71_genome.bwt.samheader.sam EGFR-tophat/accepted_hits.bam EGFR-tophat/tmp/accepted_hits0_sorted.bam EGFR-tophat/tmp/accepted_hits1_sorted.bam EGFR-tophat/tmp/accepted_hits2_sorted.bam EGFR-tophat/tmp/accepted_hits3_sorted.bam EGFR-tophat/tmp/accepted_hits4_sorted.bam EGFR-tophat/tmp/accepted_hits5_sorted.bam EGFR-tophat/tmp/accepted_hits6_sorted.bam EGFR-tophat/tmp/accepted_hits7_sorted.bam

    Checking the samtools manual, I see:
    Code:
    This command may also create temporary files <out.prefix>.%d.bam when the whole alignment cannot be fitted into memory (controlled by option -m).
    Apparently tophat is not aware that samtools may segment the alignments, and fails to properly merge them. Is there a way to direct tophat to do this? Should I just use --no-sort-bam with tophat? What are the implications of an unsorted bam?

    Meanwhile, can I merge these segmented alignments myself and resume tophat? Something like:
    Code:
    samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    and then resume tophat with -R? Since memory was a limit during the sort, will I run out of memory during the merge? Is there a reason samtools doesn't automatically merge the segmented alignments?

  • #2
    Originally posted by esiefker

    Meanwhile, can I merge these segmented alignments myself and resume tophat? Something like:
    Code:
    samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    and then resume tophat with -R?
    I tried this, but samtools complained about truncated files.

    [ebs15242@soresearch tmp]$ samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Segmentation fault
    [ebs15242@soresearch tmp]$ du -hsc !$
    du -hsc accepted_hits0_sorted.*.bam
    228M accepted_hits0_sorted.0000.bam
    226M accepted_hits0_sorted.0001.bam
    177M accepted_hits0_sorted.0002.bam
    236K accepted_hits0_sorted.0003.bam
    16K accepted_hits0_sorted.0004.bam
    0 accepted_hits0_sorted.0005.bam
    0 accepted_hits0_sorted.0006.bam
    630M total
    Why is samtools outputting truncated files? Should I pass it the -m flag? How do I instruct tophat to do that?
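
The "EOF marker is absent" message refers to the fixed 28-byte empty BGZF block that ends every complete BAM file. A minimal sketch for checking each segment before attempting a merge follows; the function name `has_bgzf_eof` is hypothetical (it is not part of samtools), and the glob simply mirrors the file names listed above.

```shell
# Sketch: report BAM segments that are empty or lack the 28-byte BGZF EOF block.
# has_bgzf_eof is a hypothetical helper, not a samtools command.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

has_bgzf_eof() {
    # Succeeds only if "$1" is non-empty and its last 28 bytes match the EOF block.
    [ -s "$1" ] || return 1
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

for f in accepted_hits0_sorted.*.bam; do
    [ -e "$f" ] || continue   # glob matched nothing
    if has_bgzf_eof "$f"; then
        echo "ok        $f"
    else
        echo "truncated $f"
    fi
done
```

A segment reported as truncated was probably not written completely, so merging it is unlikely to succeed until the cause of the incomplete write is fixed.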

    • #3
      Oh, never mind. I was running out of disk space. I was not aware, because tophat removed some temporary files after samtools failed. Making space has gotten me past this error. Hope this experience can help someone else...
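
Since the root cause here was a full disk, a quick check of free space and of zero-length segments can save some debugging. This is only a sketch: the helper names `free_kb` and `empty_bams`, the `OUT_DIR` default, and the 20 GB threshold are all assumptions for illustration, not tophat or samtools settings.

```shell
# Sketch: check free space and list zero-length BAM files in a tophat output dir.
# OUT_DIR and MIN_FREE_KB are illustrative assumptions; adjust for your run.
free_kb() {
    # Print available space (in KB) on the filesystem holding "$1".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

empty_bams() {
    # Print zero-length .bam files found under "$1".
    find "$1" -type f -name '*.bam' -size 0
}

OUT_DIR=${OUT_DIR:-EGFR-tophat}
MIN_FREE_KB=$((20 * 1024 * 1024))

if [ -d "$OUT_DIR" ]; then
    avail=$(free_kb "$OUT_DIR")
    if [ "$avail" -lt "$MIN_FREE_KB" ]; then
        echo "warning: only ${avail} KB free for $OUT_DIR" >&2
    fi
    empty_bams "$OUT_DIR"
fi
```

Zero-length files such as the 0005 and 0006 segments in the listing above would show up in the output of `empty_bams`.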

      • #4
        Dear esiefker,

        I've just encountered the same error, and your post was indeed a big help for me. Thanks!
