
  • Tophat Reporting output tracks - [FAILED]

    Hi everyone,

    I am having a problem with Tophat that has been posted here before.

    Basically, Tophat works fine, but in the last step when it writes all of the output, the operation fails, and we instead get the message:

    [2012-10-16 07:44:11] Reporting output tracks
    Error running tophat-2.0.5/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir ./tophat_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p11 --inner-dist-mean 50 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --sam-header ./tophat_out/tmp/genome_ref_genome.bwt.samheader.sam --report-secondary-alignments --report-discordant-pair-alignments --report-mixed-alignments --samtools=/apps/group/bioinformatics/apps/samtools-0.1.18/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 ./tophat_out/tmp/genome_ref.fa ./tophat_out/junctions.bed ./tophat_out/insertions.bed ./tophat_out/deletions.bed ./tophat_out/fusions.out ./tophat_out/tmp/accepted_hits ./tophat_out/tmp/left_kept_reads.mapped.bam,./tophat_out/tmp/left_kept_reads.candidates ./tophat_out/tmp/left_kept_reads.bam ./tophat_out/tmp/right_kept_reads.mapped.bam,./tophat_out/tmp/right_kept_reads.candidates ./tophat_out/tmp/right_kept_reads.bam
    Loaded 246756 junctions

    I originally received this message when running Tophat 2.0.5. The previous post I referenced had many people with the exact same problem, and they reported that switching to Tophat 2.0.4 often solved the error for some reason.

    However, I've also tried using Tophat 2.0.4, with the exact same results.

    Does anyone have any insight as to why this is happening? I appreciate your advice.

  • #2
    I just got the same thing, on Tophat 2.0.5:

    [2012-10-17 07:55:21] Reporting output tracks
    Error running /usr/local/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir analysis-multi/CaS1D/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 6 --read-realign-edit-dist 7 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations Reference/AAA/melper.gtf --gtf-juncs analysis-multi/CaS1D/tmp/melper.juncs --no-closure-search --no-microexon-search --rg-id HS3 --sam-header analysis-multi/CaS1D/tmp/melper_genome.bwt.samheader.sam --report-secondary-alignments --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/local/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 Reference/AAA/melper.fa analysis-multi/CaS1D/junctions.bed analysis-multi/CaS1D/insertions.bed analysis-multi/CaS1D/deletions.bed analysis-multi/CaS1D/fusions.out analysis-multi/CaS1D/tmp/accepted_hits analysis-multi/CaS1D/tmp/left_kept_reads.m2g.bam,analysis-multi/CaS1D/tmp/left_kept_reads.m2g_um.mapped.bam,analysis-multi/CaS1D/tmp/left_kept_reads.m2g_um.candidates analysis-multi/CaS1D/tmp/left_kept_reads.bam analysis-multi/CaS1D/tmp/right_kept_reads.m2g.bam,analysis-multi/CaS1D/tmp/right_kept_reads.m2g_um.mapped.bam,analysis-multi/CaS1D/tmp/right_kept_reads.m2g_um.candidates analysis-multi/CaS1D/tmp/right_kept_reads.bam

    One thing that seemed to be coming up with greater-than-chance frequency in the last round of issues was that the reads were paired end... Or the bugfix on 2.0.4 says "for large datasets". Mine is about 30M paired end reads, so I'm not sure if that hits their threshold or not...

    I'm going to try building from source, as one person suggested, and see if that fixes anything. Will report back if there's success.


    • #3
      It could be something to do with paired-end data, but I doubt the size of the data set is relevant. To test this, I used only the first 50,000 reads of a single sample (from both the forward and reverse files, since the data are paired-end) and ran the same tophat commands.

      Tophat v2.0.4 and v2.0.5 both resulted in the same error message.

      As a side note, I am running tophat with the following parameters:
      --max-multihits 20 --report-secondary-alignments

      Perhaps everyone receiving this error is also passing these parameters??
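
The small-subset test described in this post (the first 50,000 reads from each mate file) can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the FASTQ files are uncompressed, every record is exactly four lines, and both mate files list reads in the same order. The function names `head_fastq` and `head_paired_fastq` are made up for this example.

```python
import itertools


def head_fastq(src_path, dst_path, n_reads):
    """Copy the first n_reads records (4 lines each) from src_path to dst_path.

    Returns the number of complete records actually written.
    """
    lines_written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops early if the file holds fewer than n_reads records.
        for line in itertools.islice(src, 4 * n_reads):
            dst.write(line)
            lines_written += 1
    return lines_written // 4


def head_paired_fastq(left, right, out_left, out_right, n_reads=50000):
    """Take the same number of leading records from both mate files."""
    n_left = head_fastq(left, out_left, n_reads)
    n_right = head_fastq(right, out_right, n_reads)
    if n_left != n_right:
        # Unequal counts would leave the mates out of sync.
        raise ValueError(f"mate files differ: {n_left} vs {n_right} records")
    return n_left
```

Taking the same leading slice from both files keeps the pairs in sync, which matters because the suspected bug involves paired-end handling.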


      • #4
        So, I found out something interesting.

        No matter the version, if I remove the optional parameters telling tophat to report all secondary alignments, the program no longer crashes.

        Therefore, it is likely that these optional parameters are causing the problem. Has anyone been able to pass these parameters and complete a tophat run without errors?


        • #5
          Originally posted by all_your_base
          As a side note, I am running tophat with the following parameters:
          --max-multihits 20 --report-secondary-alignments

          Perhaps everyone receiving this error is also passing these parameters??

          Good catch. I'm running with (for 2.0.4):
          -p8 --no-novel-juncs --read-mismatches 6 --report-secondary-alignments

          That doesn't seem to be a problem on single-end libraries, so it's possible there's some interaction between reporting secondary alignments and the paired ends.
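
To check whether a failed run matches the pattern suspected here (secondary-alignment reporting together with paired-end input), the "Error running ..." line from the log can be split into its flags. This is a rough sketch, not a diagnostic tool: it assumes a paired-end run can be recognised by `right_kept_reads` appearing in the file arguments, as it does in the logs quoted in this thread. The function names are invented for illustration.

```python
import shlex


def parse_reports_command(log_line):
    """Split a logged 'Error running ...' line into (executable, flags, args)."""
    prefix = "Error running "
    text = log_line.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    tokens = shlex.split(text)
    if not tokens:
        raise ValueError("no command found in log line")
    # Keep only long options; '--samtools=/path' is reduced to '--samtools'.
    flags = {t.split("=", 1)[0] for t in tokens[1:] if t.startswith("--")}
    return tokens[0], flags, tokens[1:]


def looks_like_suspect_combination(log_line):
    """True if the command reports secondary alignments and has right-mate files.

    Assumption: 'right_kept_reads' in an argument marks a paired-end run.
    """
    _, flags, args = parse_reports_command(log_line)
    paired = any("right_kept_reads" in a for a in args)
    return "--report-secondary-alignments" in flags and paired
```

Pasting the logged command from a failing run into `looks_like_suspect_combination` gives a quick yes/no on whether the run fits the pattern discussed in this thread.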


          • #6
            Not sure if this is helpful for debugging, but when I try to use the --resume feature in 2.0.5, it errors out:

            [2012-10-17 12:36:36] Resuming TopHat run in directory 'analysis-multi/CaS1D/' stage 'tophat_reports'
            [2012-10-17 12:36:37] Checking for Bowtie
            Bowtie version:
            [2012-10-17 12:36:37] Checking for Samtools
            Samtools version:
            [2012-10-17 12:36:37] Checking for reference FASTA file
            format: fastq
            quality scale: phred33 (default)
            [2012-10-17 12:36:40] Reading known junctions from GTF file
            [2012-10-17 12:36:42] Prepared reads:
            left reads: min. length=50, max. length=50, 36717882 kept reads (972 discarded)
            right reads: min. length=50, max. length=50, 36687792 kept reads (31062 discarded)
            [2012-10-17 12:36:42] Using pre-built transcriptome index..
            Traceback (most recent call last):
              File "/usr/local/bin/tophat", line 4035, in <module>
              File "/usr/local/bin/tophat", line 4002, in main
              File "/usr/local/bin/tophat", line 3406, in spliced_alignment
                map2gtf(params, sam_header_filename, ref_fasta, left_reads, right_reads)
              File "/usr/local/bin/tophat", line 3245, in map2gtf
                transcriptome_header_filename = get_index_sam_header(params, m2g_bwt_idx)
              File "/usr/local/bin/tophat", line 1391, in get_index_sam_header
                bowtie_sam_header_filename = tmp_dir + idx_prefix.split('/')[-1]
            AttributeError: 'NoneType' object has no attribute 'split'
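
The last frame of the traceback calls `.split('/')` on a value that turns out to be `None`, which suggests the index prefix was never set on the resume path. The snippet below only illustrates that failure mode and one way to guard against it. It is not TopHat's code apart from the single expression quoted in the traceback, and the function name `basename_in_tmp` is hypothetical.

```python
def basename_in_tmp(tmp_dir, idx_prefix):
    """Join tmp_dir with the last path component of idx_prefix.

    Raises a descriptive error instead of the bare AttributeError
    when no prefix was supplied.
    """
    if idx_prefix is None:
        raise ValueError("index prefix is not set; cannot build a file name")
    # Same expression as the final line of the traceback above.
    return tmp_dir + idx_prefix.split('/')[-1]


# Reproduce the original error message with an unset prefix.
missing_prefix = None
try:
    missing_prefix.split('/')
except AttributeError as exc:
    print(exc)
```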


            • #7

              I think you may be right about the PE reads + secondary alignment reporting... I just reran a series of tests using all defaults except the following parameter:


              Instead of the two parameters I used last time:

              --max-multihits 20 --report-secondary-alignments

              So, with no special parameters, tophat runs great with my PE data, but when I ask it to report the secondary alignments, the job fails. I wonder if anyone else has experienced this same problem.


              • #8
                I've gotten in touch with the Tophat maintainers and sent them a minimal set of data that reproduces the error, so hopefully we'll get a bugfix soon. I'm also going to try running it on a Mac to see if it's a Linux-specific error...


                • #9
                  Lo and behold, it does seem to crash on the Mac as well. Fortunately, thanks to a failure to clean out old versions, I discovered it does work on tophat 2.0.0. Time to start walking backwards through the versions until I find something not broken...
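
Stepping back through installed versions, as described in this post, can be scripted. This is a minimal sketch that assumes each version is installed at its own path and that a failed run exits with a non-zero status. The paths in the usage comment are placeholders, not real install locations.

```python
import subprocess


def first_working(candidates, args):
    """Try each candidate command in order and stop at the first that exits 0.

    candidates: list of (label, command_prefix) pairs, where command_prefix
                is a list such as ["/path/to/tophat"].
    args:       arguments appended to every command_prefix.
    Returns (label_or_None, {label: exit_code}).
    """
    exit_codes = {}
    for label, prefix in candidates:
        proc = subprocess.run(
            [*prefix, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        exit_codes[label] = proc.returncode
        if proc.returncode == 0:
            return label, exit_codes
    return None, exit_codes


# Hypothetical usage, newest version first (placeholder paths):
# versions = [("2.0.5", ["/opt/tophat-2.0.5/tophat"]),
#             ("2.0.4", ["/opt/tophat-2.0.4/tophat"]),
#             ("2.0.3", ["/opt/tophat-2.0.3/tophat"])]
# first_working(versions, ["--report-secondary-alignments", "-o", "out",
#                          "index_prefix", "left.fq", "right.fq"])
```

Running this against a small test data set rather than the full library keeps each attempt short.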


                  • #10
                    That was quick! 2.0.3 works as well... until I hear back about a bug fix, I'll use that.


                    • #11
                      Interesting, on my data set I tried 2.0.0, 2.0.4, and 2.0.5... all of which failed! (But missed the magical 2.0.3) I wonder why your 2.0.0 worked and mine did not...

                      Anyway, let's see what the devs say.


                      • #12
                        The devs got back to me with a link to the unofficial version of 2.0.6 that also seems to work. It's up at , although I'm sure the usual caveats apply about this being software in development, may not function as expected, may ransom your firstborn to Somali pirates, etc.


                        • #13
                          That's great, thanks for the link. I reran my analysis in a different manner to circumvent this tophat bug, but if I need to run the same pipeline in the future I'll be sure to use 2.0.6. Hopefully some of the other people who posted about this problem in the earlier thread will see your post too.


                          • #14
                            Hello, this is my first post.
                            I am using Tophat 2.0.6 with Bowtie 0.12 and I am trying to run this command on my cluster:
                            tophat -o /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2 -i 10 -I 11000 --min-coverage-intron 10 --max-coverage-intron 11000 --min-segment-intron 10 --max-segment-intron 11000 -p 22 -G /storage16/projects/asfaw_degu/02.genome/Vitis_vinifera.IGGP_12x.15.gtf -M /storage16/projects/asfaw_degu/02.genome/Vitis_vinifera /storage16/projects/asfaw_degu/01.fastq/00.Project_Aaron_Fait/Sample_CS_21/R1/CS_21_TAGCTT_L006_R1.fastq /storage16/projects/asfaw_degu/01.fastq/00.Project_Aaron_Fait/Sample_CS_21/R2/CS_21_TAGCTT_L006_R2.fastq
                            and I get an error at "Reporting output tracks". This is the tail of the log:

                            [2012-11-28 23:25:28] Reporting output tracks
                            Error running /storage16/app/bioinfo/tophat-2.0.6.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 10 --max-report-intron 11000 --min-isoform-fraction 0.15 --output-dir /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 10 --max-coverage-intron 11000 --min-segment-intron 10 --max-segment-intron 11000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 --bowtie1 -z gzip -p22 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations /storage16/projects/asfaw_degu/02.genome/Vitis_vinifera.IGGP_12x.15.gtf --gtf-juncs /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/Vitis_vinifera.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/Vitis_vinifera_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /storage16/projects/asfaw_degu/02.genome/Vitis_vinifera.fa /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/junctions.bed /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/insertions.bed /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/deletions.bed /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/fusions.out /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/accepted_hits /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/left_kept_reads.m2g.bam,/fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/left_kept_reads.m2g_um.mapped,/fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/left_kept_reads.m2g_um.candidates 
                            /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/left_kept_reads.bam /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/right_kept_reads.m2g.bam,/fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/right_kept_reads.m2g_um.mapped,/fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/right_kept_reads.m2g_um.candidates /fastspace/bioinfo_projects/asfaw_degu/th2_CS21_2/tmp/right_kept_reads.bam
                            open: Too many open files
                            I assume this is related to one of the flags, as others said in the posts above, but my error is different from theirs.

                            Any advice would be appreciated!


                            • #15
                              Judging from a search for the error at the very bottom ("open: Too many open files"), yours might be a different issue. Do you get the same error if you try to run it on a much smaller test set (say, 10,000 reads)? Is there anyone else on the same Linux box doing file-intensive stuff?
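
"Too many open files" is the standard message for the EMFILE error, which a process gets when it reaches its limit on open file descriptors (the value `ulimit -n` reports in a shell). As a sketch, the limit can be inspected from Python on Linux or macOS with the standard `resource` module. Whether raising it resolves this particular run is not established in this thread.

```python
import errno
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe_limit(value):
    """Render a limit value, mapping the 'no limit' sentinel to a word."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = open_file_limits()
print("soft limit:", describe_limit(soft))
print("hard limit:", describe_limit(hard))
# System text for EMFILE, for comparison with the last line of the log.
print("EMFILE message:", os.strerror(errno.EMFILE))
```

The soft limit is the one that applies to the running process. It can be raised up to the hard limit without administrator rights.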

