  • upadhyayanm
    Junior Member
    • Oct 2011
    • 3

    TopHat error: segment-based junction search failed with err=1

    Hi

    Lately, I have had a problem running TopHat 1.3.1 on 100 bp paired-end Illumina HiSeq RNA reads. After cleaning (quality trimming, duplicate removal, adapter removal) I split the files (taking care not to split the sequence and quality lines of the last record) and fed them to TopHat. Please note that I have more left kept reads here because I have an extra file of leftover unpaired reads. I have also noticed in previous successful runs that, even though the paired FASTQ files fed in have the same number of sequences, the left and right read counts shown in the log are slightly different.
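One way to check the split files before feeding them to TopHat is to count the records in each file of a pair. This is a minimal Python sketch, assuming plain four-line FASTQ records with no blank lines; the function names and the .gz handling are illustrative and not part of TopHat:

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming strict 4-line records."""
    # Assumption: files ending in .gz are gzip-compressed, anything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4; "
            "the file may have been split mid-record"
        )
    return n_lines // 4


def compare_pairs(left_path, right_path):
    """Return (left_count, right_count, counts_match) for a pair of files."""
    left = count_fastq_records(left_path)
    right = count_fastq_records(right_path)
    return left, right, left == right
```

The counts TopHat prints in its log are presumably for the reads it keeps after its own preparation step (the files are named left_kept_reads and right_kept_reads), so they need not match the counts from a check like this.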

    Here is the log:

    [Thu Oct 27 18:33:40 2011] Beginning TopHat run (v1.3.1)
    -----------------------------------------------
    [Thu Oct 27 18:33:40 2011] Preparing output location ./tophat_out/
    [Thu Oct 27 18:33:40 2011] Checking for Bowtie index files
    [Thu Oct 27 18:33:40 2011] Checking for reference FASTA file
    [Thu Oct 27 18:33:40 2011] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Thu Oct 27 18:33:40 2011] Checking for Samtools
    Samtools Version: 0.1.12a
    [Thu Oct 27 18:33:40 2011] Generating SAM header for ../PG210SC5
    [Thu Oct 27 18:33:40 2011] Preparing reads
    format: fastq
    quality scale: phred33 (default)
    Left reads: min. length=50, count=134790672
    Right reads: min. length=50, count=118121205
    [Thu Oct 27 20:34:22 2011] Mapping left_kept_reads against PG210SC5 with Bowtie
    [Thu Oct 27 21:42:15 2011] Processing bowtie hits
    [Thu Oct 27 23:08:30 2011] Mapping left_kept_reads_seg1 against PG210SC5 with Bowtie (1/4)
    [Fri Oct 28 00:27:19 2011] Mapping left_kept_reads_seg2 against PG210SC5 with Bowtie (2/4)
    [Fri Oct 28 01:47:04 2011] Mapping left_kept_reads_seg3 against PG210SC5 with Bowtie (3/4)
    [Fri Oct 28 02:57:47 2011] Mapping left_kept_reads_seg4 against PG210SC5 with Bowtie (4/4)
    [Fri Oct 28 04:25:49 2011] Mapping right_kept_reads against PG210SC5 with Bowtie
    [Fri Oct 28 05:26:52 2011] Processing bowtie hits
    [Fri Oct 28 06:48:08 2011] Mapping right_kept_reads_seg1 against PG210SC5 with Bowtie (1/4)
    [Fri Oct 28 08:00:12 2011] Mapping right_kept_reads_seg2 against PG210SC5 with Bowtie (2/4)
    [Fri Oct 28 09:11:43 2011] Mapping right_kept_reads_seg3 against PG210SC5 with Bowtie (3/4)
    [Fri Oct 28 10:21:22 2011] Mapping right_kept_reads_seg4 against PG210SC5 with Bowtie (4/4)
    [Fri Oct 28 11:56:21 2011] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
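For long runs like this one it can help to see how much wall-clock time each stage took. The following is a rough Python sketch that reads the bracketed timestamps in a log of the format shown above; the function name and the regular expression are my own, and it assumes an English locale for the day and month names:

```python
import re
from datetime import datetime

# Matches lines such as "[Thu Oct 27 18:33:40 2011] Preparing reads".
STAMP = re.compile(r"^\s*\[(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\]\s*(.*)$")


def stage_durations(log_text):
    """Return [(stage message, seconds until the next stamped line)]."""
    stamped = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            stamped.append((when, match.group(2)))
    # Pair each stamped line with the one that follows it.
    return [
        (msg, (nxt - when).total_seconds())
        for (when, msg), (nxt, _) in zip(stamped, stamped[1:])
    ]
```

The last stamped line has no successor, so it gets no duration. In the log above, for instance, "Preparing reads" runs from 18:33:40 to 20:34:22, which is about two hours.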

    ____________________________________________________________________________________________

    In the segment_juncs.log the last entry reads:

    FZStream::rewind() popen(gzip -cd './tophat_out/tmp/left_kept_reads_seg1_missing.fq.z') failed


    I have previously used such a mixture of paired and unpaired reads successfully (I think!) with another set of reads. However, those were smaller read sets. Even with the data above, when I use only one pair out of the four split files it works fine.

    I would appreciate it if anyone could help me resolve this problem.
  • canbruce
    Junior Member
    • Dec 2011
    • 1

    #2
    I have the same problem. Were you able to figure out the reason for this error?

    -canbruce


    • upadhyayanm
      Junior Member
      • Oct 2011
      • 3

      #3
      Not yet. I suspect TopHat is running out of memory. Although I am running it on a Linux machine (Ubuntu) with 48 GB of RAM, I think that is still not enough to handle such large inputs.
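A memory suspicion like this can be checked by recording the peak resident memory of the run. This is a minimal Python sketch using the standard resource module on a Unix-like system; the function name is illustrative, and the value reported is the peak of the largest single child process that was waited for, not the sum over all helper processes:

```python
import resource
import subprocess


def run_and_report_peak_rss(cmd):
    """Run cmd and return (exit code, peak RSS of the largest waited-for child)."""
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    return completed.returncode, usage.ru_maxrss
```

The command is passed as a list of arguments, for example the full TopHat command line split into its parts. A peak close to the machine's physical RAM would support the out-of-memory explanation; a much lower peak would point elsewhere.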


      • townway
        Member
        • May 2009
        • 41

        #4
        I had the same problem today. I hope someone can step in and point out how to fix it.

        My data are direct output from the Illumina pipeline, as two FASTQ files.

        • Xi Wang
          Senior Member
          • Oct 2009
          • 317

          #5
          Hi townway, I am just wondering how much data you have. How many reads did you feed to TopHat?


          Xi Wang


          • townway
            Member
            • May 2009
            • 41

            #6
            My data are around 200M reads from one HiSeq lane, and I used 16 GB of memory to run TopHat 1.3.3 with the coverage, microexon and butterfly search options.
            By the way, it worked well with an older version of TopHat.


            • biznatch
              Senior Member
              • Nov 2010
              • 124

              #7
              The butterfly search option uses a lot of memory. I'm pretty sure you'll need a lot more than 16GB memory to align 200M reads using that option. I have 16GB and ran out of memory trying to align ~30M 100bp PE reads with the butterfly option.


              • townway
                Member
                • May 2009
                • 41

                #8
                Yes, that is true. Without those options it works well now.


                • Xi Wang
                  Senior Member
                  • Oct 2009
                  • 317

                  #9
                  TopHat has been updated to version 1.4.0 (BETA). Has anyone tried this new version yet? A big change in this version is that TopHat first maps reads to a transcriptome supplied by the user, and I think that strategy should be much more stable.
                  Xi Wang


                  • kesner
                    Member
                    • Apr 2012
                    • 10

                    #10
                    ReadStream::getRead() called with out-of-order id#!

                    I get a similar error, but with a different message (see title).
                    After looking at the code, I think the error has to do with threading on multiple cores and read IDs. In the section of the code I looked at, read IDs are handled differently for threaded and non-threaded code (I think). I am running the latest version (2.0.3) and am trying again without threading.
                    barry
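The error message suggests the reader expects read IDs in ascending order. A quick way to test that on an intermediate file is sketched below in Python. This assumes the reads carry integer IDs in headers of the form '@123' or '@123/1', which is a guess about the intermediate files and not something confirmed from the TopHat source; both function names are made up for illustration:

```python
def first_out_of_order(read_ids):
    """Return (index, previous_id, current_id) at the first decrease, or None."""
    previous = None
    for index, current in enumerate(read_ids):
        if previous is not None and current < previous:
            return index, previous, current
        previous = current
    return None


def ids_from_fastq_headers(lines):
    """Yield integer ids from FASTQ header lines.

    Assumption: strict 4-line records with headers like '@123' or '@123/1'.
    """
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:
            yield int(line[1:].split()[0].split("/")[0])
```

Passing an open file handle to ids_from_fastq_headers and feeding the result to first_out_of_order reports the first place where an ID goes backwards, if there is one.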


                    • Auction
                      Member
                      • Jul 2009
                      • 24

                      #11
                      Kesner, have you solved the problem by not using threading? I have the same problem with segment_juncs:
                      Processed 4000000 root segment groupssi
                      Error: ReadStream::getRead() called with out-of-order id#!

                      I'm using TopHat 1.4.1 (I get the same error with 2.0.3, but there it comes from tophat_reports). It should not be a memory problem because I have 96 GB of RAM, so it may be something related to threading.

                      • kesner
                        Member
                        • Apr 2012
                        • 10

                        #12
                        re: problem fixed?

                        I think I got past the problem by using a single thread. Since there are many processes running on the machine I am using, it is possible that some other resource failure was to blame.

                        Now my problem is that the run is taking forever to complete. The alignments are finished, but the code takes about a day per chromosome to process junctions. On the other hand, I'm not sure throwing multiple cores at this step does anything. I know my reads are contaminated with a lot of background, and I figure that this is why I am having problems with the whole process in general.
                        barry


                        • Auction
                          Member
                          • Jul 2009
                          • 24

                          #13
                          I agree that there is probably something wrong with resource allocation. I re-ran some samples (also multi-threaded): sometimes I got the same error message, and sometimes the run finished successfully. So the problem is not reproducible, and it may depend heavily on the state of the machine at run time.




                          • kesner
                            Member
                            • Apr 2012
                            • 10

                            #14
                            Does the latest TopHat version solve the problem?

                            I was wondering if you still see the problem with the latest build of TopHat2?
                            barry


                            • ians
                              Member
                              • Aug 2011
                              • 53

                              #15
                              I am getting the same error with tophat 2.0.0.

                              tophat.log:
                              Code:
                              ....
                              [2012-06-30 11:50:15] Mapping right_kept_reads.m2g_um_seg4 against mm9.fa with Bowtie2 (4/4)
                              /usr/local/bin/tophat-2.0.0/fix_map_ordering: /lib64/libz.so.1: no version information available (required by /usr/local/bin/tophat-2.0.0/fix_map_ordering)
                              [2012-07-01 00:20:11] Searching for junctions via segment mapping
                                      [FAILED]
                              Error: segment-based junction search failed with err =1
                              Error: ReadStream::getRead() called with out-of-order id#!
                              segment_juncs.log:
                              Code:
                              ...
                                      Loading chrUn_random...done
                                      Loading chrX_random...done
                                      Loading chrY_random...done
                                      Loading ...done
                              >> Performing segment-search:
                              Loading left segment hits...
                              Error: ReadStream::getRead() called with out-of-order id#!
                              Has anyone uncovered anything recently? At U Texas, they report that single threading allowed proper execution. Does anyone know how to "continue" the TopHat procedure and restart from the segment-based junction search? I'm going to try hacking the Python script, but I hope someone has done it before. I have a dozen or so samples that have been aligning for about a week. I don't want to redo the alignments, especially with only one core (OUCH!!)
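With a dozen samples, it may be useful to first see which runs actually hit the error before deciding what to rerun. This small Python sketch collects the lines starting with "Error:" from every log of a given name under a directory. It does not restart anything, and the directory layout and function names are assumptions for illustration:

```python
from pathlib import Path


def find_error_lines(log_text):
    """Return the stripped lines of log_text that start with 'Error:'."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.strip().startswith("Error:")
    ]


def summarize_runs(root, log_name="segment_juncs.log"):
    """Map each log file found under root to its 'Error:' lines."""
    # Assumption: each sample's output directory sits somewhere below root.
    return {
        str(path): find_error_lines(path.read_text(errors="replace"))
        for path in sorted(Path(root).rglob(log_name))
    }
```

Samples whose list comes back empty did not log an "Error:" line in that file, which narrows down which alignments would need attention.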
