  • Tophat error: could not get read# 3868130 from stream!

    Hi all,

    I am using Tophat to align colorspace RNA-seq to genome.fa.

    tophat -p 8 --color -o tophat_G1 --quals Pdom-preliminary-genome.index filtered_G1_U_F3.csfasta filtered_G1_U_F3_QV.qual

    It ran into an error and prompted the following information.

    Thanks in advance.

    Ruolin


    [2012-06-29 10:26:22] Beginning TopHat run (v2.0.1)
    -----------------------------------------------
    [2012-06-29 10:26:22] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [2012-06-29 10:26:22] Checking for Samtools
    Samtools version: 0.1.18.0
    [2012-06-29 10:26:22] Checking for Bowtie index files
    [2012-06-29 10:26:22] Checking for reference FASTA file
    [2012-06-29 10:26:22] Generating SAM header for Pdom-preliminary-genome.index
    format: fasta
    [2012-06-29 10:26:25] Preparing reads
    left reads: min. length=75, max. length=75, 7135633 kept reads (423313 discarded)
    [2012-06-29 10:32:45] Mapping left_kept_reads to genome Pdom-preliminary-genome.index with Bowtie
    [2012-06-29 10:47:31] Mapping left_kept_reads_seg1 to genome Pdom-preliminary-genome.index with Bowtie (1/3)
    [2012-06-29 10:54:16] Mapping left_kept_reads_seg2 to genome Pdom-preliminary-genome.index with Bowtie (2/3)
    [2012-06-29 11:00:45] Mapping left_kept_reads_seg3 to genome Pdom-preliminary-genome.index with Bowtie (3/3)
    [2012-06-29 11:07:19] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
    Error: could not get read# 3868130 from stream!

  • #2
    same problem

    Hi,

    I know this isn't very helpful, but I am having the same problem. I am using colorspace reads and v2.0.4.

    [2012-07-13 19:31:25] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
    Error: could not get read# 57335084 from stream!

    Can anyone help?



    • #3
      Threading seems to be the problem

      Hi,
      I had/have had the exact same problem with tophat2 for quite some time. After an intensive search, I have come to the conclusion that it is probably a threading problem. You can 'solve' it by setting the number of threads to 1 (the -p option). Removing the -p option completely should also work, since 1 is the default. Naturally this dramatically increases the run time, so hopefully there will soon be a new version (current version 2.0.4) where this threading problem is fixed...

      (Btw I did not encounter this threading problem with tophat1.4)
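      For concreteness, a sketch of this suggestion, reusing the command from the first post in this thread (the index and file names are that poster's, not mine):

      ```shell
      # Same TopHat invocation, but single-threaded (-p 1); omitting -p
      # entirely has the same effect, since 1 is the default.
      tophat -p 1 --color -o tophat_G1 --quals Pdom-preliminary-genome.index \
          filtered_G1_U_F3.csfasta filtered_G1_U_F3_QV.qual
      ```
      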



      • #4
        I am having the same problem with colorspace reads and TopHat v2.0.4. I have tried removing the -p option and letting the job run on one processor, but I am still getting the same error. Any other ideas?



        • #5
          Other possible causes

          Hi scor,
          There are other threads about this issue that point towards memory and permission problems, so it could be useful to check your computer/cluster for memory usage and available hard-drive space during the TopHat run.

          Reg.



          • #6
            Hi all,

            This problem with colorspace also exists in tophat 2.0.6 and has taken up a lot of my time... Unfortunately, switching to single-core processing is not a solution for large datasets... In my case, after playing around a lot to see which parameters were causing the problem, the only way to keep multi-threading was to disable coverage search (the --no-coverage-search option), since coverage search itself also takes ages to complete with large datasets...

            Panos
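            In command form, this workaround might look as follows (the index and read files are borrowed from the first post for illustration):

            ```shell
            # Keep multi-threading (-p 8) but disable coverage search, which
            # is both implicated in the crash and very slow on large datasets.
            tophat -p 8 --color --no-coverage-search -o tophat_G1 --quals \
                Pdom-preliminary-genome.index \
                filtered_G1_U_F3.csfasta filtered_G1_U_F3_QV.qual
            ```
            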



            • #7
              And I've just verified it in tophat 2.0.7... Does anyone have a possible solution?



              • #8
                Hello,
                Got the same issue (with paired-end reads: F5 35 bp, F3 75 bp). I realized that the issue was related to F3 (the long fragment), as tophat was able to map the F5 reads. I finally managed to make tophat work by trimming the F3 reads down to 69. Perhaps an internal magic number...

                Hope it will help....
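                One way to do this trimming on the command line, as a sketch. The demo read here is synthetic, and the 70 in the csfasta case is an assumption: the primer base plus 69 color calls; real files come from the sequencer.

                ```shell
                # Demo input: one 75-color F3 read (primer base 'T' + 75 '0'
                # calls) and a matching 75-value qual record.
                printf '>1_1_1_F3\nT%s\n' "$(head -c 75 /dev/zero | tr '\0' '0')" > F3_demo.csfasta
                printf '>1_1_1_F3\n%s\n' "$(seq -s ' ' 75 | sed 's/[0-9][0-9]*/20/g')" > F3_demo_QV.qual

                # Trim to 69 colors: keep the primer base plus 69 color calls
                # (70 characters); header lines pass through untouched.
                awk '/^>/ {print; next} {print substr($0, 1, 70)}' F3_demo.csfasta > F3_trim69.csfasta

                # Keep only the first 69 quality values per read (assigning to
                # NF truncates the record in awk).
                awk '/^>/ {print; next} {NF = 69; print $0}' F3_demo_QV.qual > F3_trim69_QV.qual
                ```
                
                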



                • #9
                  A 'maybe' Solution

                  I've solved this problem for now.

                  Well, I have the same problem. I am dealing with ~170M colorspace reads.

                  My Tophat Version:
                  TopHat v2.0.8b

                  My ERROR:
                  [2013-05-21 10:55:08] Searching for junctions via segment mapping
                  [FAILED]
                  Error: segment-based junction search failed with err =1
                  Error: could not get read# 123430531 from stream!

                  My Solution:
                  NOTE: This solution works without any error only if you have a pre-built transcriptomic index. See this link on how to build your transcriptomic index <link>

                  The apparent reason I've read online for this problem seems to be the number of threads being greater than 1 [example: -p 20 in the tophat execution options]. Starting tophat again with -p 1 would be a tedious and time-consuming process, so the ideal solution is to 'resume' the process from the last successful checkpoint. Fortunately, Tophat provides an option for this <resume description>.


                  I've used this description to tweak the run.log file by replacing -p #number with -p 1, i.e. changing your initial number of threads to 1. This tweak resumes from the last successful checkpoint with 1 thread.
                  Note 1: Upon the first resume, you will have the file run.resume0.log. If your resume ends without success, edit both the run.log and run.resume0.log files with -p 1 (editing just the first command is sufficient; the subsequent commands are built from that initial command).

                  Note 2: It is always good to back up the run.log and run.resume0.log files before you do such tweaks; otherwise you might end up screwing up the whole thing. This tweak is working fairly well for me, with no problems.

                  Although this tweak resumes from the last successful checkpoint with 1 thread, the subsequent processes down the pipeline will also run with 1 thread. To get around this, follow these steps:
                  This is the error-prone step:
                  [2013-05-21 15:35:22] Searching for junctions via segment mapping

                  So use the above tweak; this process will complete and you will enter this step:
                  [2013-05-21 11:42:50] Retrieving sequences for splices

                  Now break the process with Ctrl+C,
                  re-edit the run.log file with -p 20 (I chose 20 because I have 24 cores in my CPU),
                  and resume the process again with tophat -R output_Dir.


                  You can give this tweak a shot to see if it's working for you.
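                  In script form, the first half of the tweak might look like this. The mock run.log stands in for <output_dir>/logs/run.log, where TopHat 2 keeps the original command line that tophat -R replays; the directory name and "-p 20" are illustrative.

                  ```shell
                  # Mock run.log standing in for <output_dir>/logs/run.log.
                  mkdir -p tophat_out/logs
                  echo 'tophat -p 20 -o tophat_out genome.index reads.csfasta reads.qual' \
                      > tophat_out/logs/run.log

                  # The tweak: back up the log, then drop the thread count to 1.
                  cp tophat_out/logs/run.log tophat_out/logs/run.log.bak
                  sed -i 's/-p 20/-p 1/g' tophat_out/logs/run.log
                  # ...then resume with:  tophat -R tophat_out
                  # Once "Retrieving sequences for splices" appears: Ctrl+C,
                  # restore "-p 20" the same way, and resume again.
                  ```
                  
                  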

                  P.S.: It's a silly point, but such things should be fixed in the tophat code itself.
                  Last edited by mallela; 05-22-2013, 07:52 AM. Reason: updating the information



                  • #10
                    As you can see in my previous post, I solved this problem just by "unsetting" the -p parameter.



                    • #11
                      Very nice, Mattia. My post above is also about "unsetting" the -p parameter.

                      Your solution unsets the -p parameter for all the steps of TopHat right from the beginning, which gives up multi-threading entirely (the default -p is 1).

                      In contrast, my tweak selectively unsets the -p parameter at the error location (i.e. at the [2013-05-21 15:35:22] Searching for junctions via segment mapping step) while preserving multi-threading for the other steps, thereby giving faster runs.

                      After the above step is successfully done, i.e. at
                      [2013-05-21 11:42:50] Retrieving sequences for splices
                      I break the process with Ctrl+C, re-edit the run.log file with -p 20, and resume the process again. (In short, injecting back the multi-threading that was removed for the error-prone step.)



                      • #12
                        It's a threading problem; remove the -p option.

