  • TopHat error: disk full

    Hi All
    I am analyzing one PE lane with read files 's_3_1_sequence.txt' and 's_3_2_sequence.txt';
    here are the first lines of the read files:

    s_3_1_sequence.txt
    @GAII:3:1:2:321#0/1
    GGGGCCTGGGACTCTNGGTCCCCTACTGNAGACA
    +GAII:3:1:2:321#0/1
    `[`aaX`_aV`aaaZDTKT\X__^XGZZDVV``a
    @GAII:3:1:2:314#0/1
    CCACCAGGCGCCCGTNGTGGCGCAGGAANGGGTG
    +GAII:3:1:2:314#0/1
    _``aa_\\_\_aa_PDZVYZ\ZZPZ\TVDHZT\Z
    @GAII:3:1:2:508#0/1
    GTTCAGCAGGAATGCNGAGATCGGAAGANGGGTT

    s_3_2_sequence.txt
    @GAII:3:1:2:321#0/2
    TCCCNCCTGCCCNNNGCTTCNNNGTTTTNNNTCA
    +GAII:3:1:2:321#0/2
    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    @GAII:3:1:2:314#0/2
    CAGTNCCAGCGCNNNAGCGTNNNGACCTNNNACC
    +GAII:3:1:2:314#0/2
    `_JJDZ_aBBBBBBBBBBBBBBBBBBBBBBBBBB
    @GAII:3:1:2:508#0/2
    TCATNCCTGCTTANNCTATANNNTAAGAGNNTCT
    M1-80330:reads jdhahbi$

    the command-line I used:
    tophat -r 200 /mydir/bowtie-0.9.9.3/indexes/h_sapiens_asm s_3_1_sequence.txt s_3_2_sequence.txt

    the output with the error is below; I checked the disk space and there are more than 100 GB available:

    [Thu May 21 16:53:57 2009] Beginning TopHat run (v1.0.7)
    -----------------------------------------------
    [Thu May 21 16:53:57 2009] Preparing output location ./tophat_out/
    [Thu May 21 16:53:57 2009] Checking for Bowtie index files
    [Thu May 21 16:53:57 2009] Checking for reference FASTA file
    [Thu May 21 16:53:57 2009] Checking for Bowtie
    Bowtie version: 0.9.9.3
    [Thu May 21 16:53:58 2009] Checking reads
    seed length: 34bp
    format: fastq
    quality scale: phred
    Splitting reads into 1 segments
    [Thu May 21 17:00:49 2009] Mapping reads against h_sapiens_asm with Bowtie
    Splitting reads into 1 segments
    [Thu May 21 18:03:09 2009] Mapping reads against h_sapiens_asm with Bowtie
    [Thu May 21 18:51:52 2009] Searching for junctions via coverage islands
    [Thu May 21 18:59:12 2009] Searching for junctions via mate-pair closures
    [Fri May 22 05:40:00 2009] Retrieving sequences for splices
    [Fri May 22 05:48:53 2009] Indexing splices
    Index is corrupt: File size for ./tophat_out/tmp/segment_juncs.1.ebwt should have been 3799224901 but is actually -495742395.
    Please check if there is a problem with the disk or if disk is full.
    [FAILED]
    Error: Splice sequence indexing failed

    Any suggestions are appreciated,
    Thanks,

    joseph

  • #2
    Hi Joseph,

    This is most likely the same bug that a few other users have reported: with short, paired reads, the splice index can become unreasonably large, which may be tripping Bowtie's index integrity checks. I have fixed this in my source tree, and the new version should be released next week. I am just tying up a few loose ends with the latest build.

    Sorry for the inconvenience. If you'd like to test out a snapshot of the code to see if it resolves the problem for you, please email me directly.
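
The two sizes in the error message above are consistent with a 32-bit overflow: 3799224901 does not fit in a signed 32-bit integer, and reinterpreting its low 32 bits as a signed value gives exactly -495742395. That would point to an oversized splice index rather than a full disk. The sketch below only checks the arithmetic; the function name `wrap_to_int32` is made up for illustration, and nothing here is a statement about how Bowtie stores or checks the file size internally.

```python
def wrap_to_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set are negative in two's complement.
    return n - 0x100000000 if n >= 0x80000000 else n


# Numbers copied from the "Index is corrupt" message in the log.
expected_size = 3799224901
reported_size = -495742395

wrapped = wrap_to_int32(expected_size)
print(wrapped)                   # -495742395
print(wrapped == reported_size)  # True
```

If the reported size were unrelated to the expected one (for example, a truncated file on a genuinely full disk), the two numbers would not be expected to match after wrapping like this.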


    • #3
      Hi Cole,

      I had a disk quota problem too. TopHat produced more than 1.5 TB of output!

      I use v1.1.4 with the default settings to map ~1M SE total RNA reads, which are 20-37 nt long. The huge file is produced by long_spanning_reads after the junction mapping step.

      Any ideas?
      Thanks,

      Biter


      • #4
        This is a bug caused by variable read length. We have fixed it, and the next version will include the fix.
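
To check whether a read file has variable read lengths in the first place, a quick tally of sequence lengths is enough. The sketch below is not part of TopHat; the function name `read_length_counts` is hypothetical, and it assumes plain four-line FASTQ records (header, sequence, '+' line, qualities) with no wrapped sequence lines.

```python
from collections import Counter
from typing import Iterable


def read_length_counts(lines: Iterable[str]) -> Counter:
    """Count how many reads there are of each length.

    Assumes strict four-line FASTQ records, so the sequence is always
    the second line of each group of four.
    """
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Tiny in-memory example with two made-up reads of different lengths.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGTAC", "+", "IIIIII",
]
for length, n in sorted(read_length_counts(example).items()):
    print(f"{length} nt: {n} read(s)")
```

For a real file, pass an open file handle instead of the list, for example `read_length_counts(open("reads.fastq"))`, where `reads.fastq` stands in for your own file name. More than one length in the output means the reads are variable-length.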
