  • tophat & samtools: accepted_hits0_sorted.0000.bam

    I am encountering an error when running tophat v2.0.8b with samtools 0.1.19+. At the end of a run, during tophat_reports, I get the following error:
    Code:
    Error: [Errno 2] No such file or directory: 'EGFR-tophat/tmp/accepted_hits0_sorted.bam'
    The directory 'EGFR-tophat/tmp' has a bunch of files that look like this:

    accepted_hits0_sorted.0000.bam
    accepted_hits0_sorted.0001.bam
    accepted_hits0_sorted.0002.bam
    ..
    accepted_hits7_sorted.0004.bam
    accepted_hits7_sorted.0005.bam

    On checking 'run.log', it appears that these files are being created with samtools, invoked like this:

    Code:
    /usr/local/bin/samtools sort EGFR-tophat/tmp/accepted_hits0.bam EGFR-tophat/tmp/accepted_hits0_sorted
    And the error is being thrown by samtools, invoked like this:
    Code:
    /usr/local/bin/samtools merge -f -h EGFR-tophat/tmp/Homo_sapiens_GRCh37_71_genome.bwt.samheader.sam EGFR-tophat/accepted_hits.bam EGFR-tophat/tmp/accepted_hits0_sorted.bam EGFR-tophat/tmp/accepted_hits1_sorted.bam EGFR-tophat/tmp/accepted_hits2_sorted.bam EGFR-tophat/tmp/accepted_hits3_sorted.bam EGFR-tophat/tmp/accepted_hits4_sorted.bam EGFR-tophat/tmp/accepted_hits5_sorted.bam EGFR-tophat/tmp/accepted_hits6_sorted.bam EGFR-tophat/tmp/accepted_hits7_sorted.bam

    Checking the samtools manual, I see:
    Code:
    This command may also create temporary files <out.prefix>.%d.bam when  the  whole  alignment  cannot  be fitted into memory (controlled by option -m).
    Apparently tophat is not aware that samtools may segment the alignments, and fails to properly merge them. Is there a way to direct tophat to do this? Should I just use --no-sort-bam with tophat? What are the implications of an unsorted bam?

    Meanwhile, can I merge these segmented alignments myself and resume tophat? Something like:
    Code:
    samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    and then resume tophat with -R? Since memory was a limit during the sort, will I run out of memory during the merge? Is there a reason samtools doesn't automatically merge the segmented alignments?

  • #2
    Originally posted by esiefker View Post

    Meanwhile, can I merge these segmented alignments myself and resume tophat? Something like:
    Code:
    samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    and then resume tophat with -R?
    I tried this, and it complained about truncated files.

    [ebs15242@soresearch tmp]$ samtools merge accepted_hits0_sorted.bam accepted_hits0_sorted.*.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Segmentation fault
    [ebs15242@soresearch tmp]$ du -hsc !$
    du -hsc accepted_hits0_sorted.*.bam
    228M accepted_hits0_sorted.0000.bam
    226M accepted_hits0_sorted.0001.bam
    177M accepted_hits0_sorted.0002.bam
    236K accepted_hits0_sorted.0003.bam
    16K accepted_hits0_sorted.0004.bam
    0 accepted_hits0_sorted.0005.bam
    0 accepted_hits0_sorted.0006.bam
    630M total
    Why is samtools outputting truncated files? Should I pass it the -m flag? How do I instruct tophat to do that?
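
The "EOF marker is absent" message refers to the 28-byte empty BGZF block that a completely written BAM file ends with. As a rough sketch (not part of tophat or samtools; the function names `has_bgzf_eof` and `find_truncated` are made up for illustration), the segment files can be screened for that marker before attempting a merge. It only checks the last 28 bytes, so it flags truncated or empty files but does not validate the BAM content otherwise.

```python
import os
import sys

# The 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False  # empty or too short to hold the marker
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


def find_truncated(paths):
    """Return the paths that do not end with the EOF marker, in input order."""
    return [p for p in paths if not has_bgzf_eof(p)]


if __name__ == "__main__":
    # Hypothetical usage: python check_bam_eof.py accepted_hits0_sorted.*.bam
    for path in find_truncated(sys.argv[1:]):
        print(f"{path}: missing EOF marker ({os.path.getsize(path)} bytes)")
```

Run against the listing above, the two zero-byte files would be reported at the very least; files flagged this way are incomplete and should not be passed to `samtools merge`.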

    • #3
      Oh, never mind. I was running out of disk space. I was not aware, because tophat removed some temporary files after samtools failed. Making space has gotten me past this error. Hope this experience can help someone else...
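
Since the failure here came down to a full disk, a pre-flight check on the output directory can catch it before a long run. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the helper names and the 50 GiB threshold in the usage lines are arbitrary assumptions, not a figure from tophat's documentation, and the real requirement depends on the size of the input.

```python
import shutil


def free_gib(path):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def enough_space(path, required_gib):
    """Return True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # Hypothetical threshold; adjust to the expected size of the temporary files.
    if not enough_space(".", 50):
        print(f"Only {free_gib('.'):.1f} GiB free; sorting may fail.")
```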

      • #4
        Dear esiefker,

        I've just encountered the same error, and your post was indeed a big help for me. Thanks!
