  • tophat2 error

    I ran tophat2 with bowtie1 because I am dealing with color-space reads. The command line I used was:

    Code:
    tophat --bowtie1 --keep-tmp -o T34_tophat2 -p 8 --color --quals --library-type=fr-secondstrand --transcriptome-index=transcriptome/hg19_Ensemble.GRCh37_65 /home/xwang/data/hg19/bowtie_index/hg19.color T34.csfasta T34.qual
    But tophat ended up with an error:

    Code:
    [2012-04-27 10:36:10] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2012-04-27 10:36:10] Checking for Bowtie
    		  Bowtie version:	 0.12.7.0
    [2012-04-27 10:36:11] Checking for Samtools
    		Samtools version:	 0.1.17.0
    [2012-04-27 10:36:11] Checking for Bowtie index files
    [2012-04-27 10:36:11] Checking for Bowtie index files
    [2012-04-27 10:36:11] Checking for reference FASTA file
    [2012-04-27 10:36:11] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
    	format:		 fasta
    [2012-04-27 10:38:10] Reading known junctions from GTF file
    [2012-04-27 10:38:48] Preparing reads
    	 left reads: min. length=50, count=64422218
    [2012-04-27 11:43:11] Using pre-built transcriptome index..
    [2012-04-27 11:43:49] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
    [2012-04-27 12:11:41] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
    [2012-04-27 12:14:57] Resuming TopHat pipeline with unmapped reads
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "T34_tophat2/tmp/left_kept_reads.m2g_um.fq".
    [2012-04-27 12:14:57] Reporting output tracks
    -----------------------------------------------
    [2012-04-27 13:08:39] Run complete: 02:32:28 elapsed
    Judging from the timestamps, the step "Resuming TopHat pipeline with unmapped reads" was never actually executed, apparently because the file "left_kept_reads.m2g_um.fq" could not be found. But the file is in fact there.

    Any hints? Thanks.
    Xi Wang
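
The timestamp argument in the post above can be checked mechanically. What follows is a minimal Python sketch, not part of TopHat, that reads the `[YYYY-MM-DD HH:MM:SS]` lines of a log like the one shown and reports how long each step ran. The function name `step_durations` is made up for illustration, and the sketch assumes every step line carries a timestamp in that format.

```python
import re
from datetime import datetime

# Matches lines such as "[2012-04-27 12:14:57] Reporting output tracks".
# Lines like "[bam_header_read] ..." do not match and are skipped.
STAMP = re.compile(r"^\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*(.*)$")


def step_durations(log_text):
    """Return (step message, seconds until the next timestamped line) pairs."""
    steps = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            steps.append((when, match.group(2).strip()))
    return [
        (message, int((next_when - when).total_seconds()))
        for (when, message), (next_when, _) in zip(steps, steps[1:])
    ]


example_log = """\
[2012-04-27 12:11:41] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
[2012-04-27 12:14:57] Resuming TopHat pipeline with unmapped reads
[bam_header_read] EOF marker is absent. The input is probably truncated.
[2012-04-27 12:14:57] Reporting output tracks
-----------------------------------------------
[2012-04-27 13:08:39] Run complete: 02:32:28 elapsed
"""

for message, seconds in step_durations(example_log):
    print(f"{seconds:6d} s  {message}")
```

On the excerpt above, the "Resuming TopHat pipeline" step comes out at 0 seconds, which matches the observation that the genome mapping never really ran.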

  • #2
    Same issue

    Hi,

    I have a very similar issue. I used samtools to check that every BAM file could be opened without error. Were you able to solve your problem?

    Thanks,
    Saad



    • #3
      Originally posted by saad0105050
      Hi,

      I have a very similar issue. I used samtools to check that every BAM file could be opened without error. Were you able to solve your problem?

      Thanks,
      Saad

      Sorry to hear that you have the same issue. I've reported this bug to the developers and hope they can find a solution soon. If anyone gets this solved, please share it with us.
      Xi Wang



      • #4
        I have the same issue with tophat2, using bowtie2. Some reads have qualities, while others just have "*" in the quality field. Here are two examples, the first without qualities and the second with:

        Code:
        HWI-ST201:229:C07HGACXX:2:1306:5066:164732:1:N:0:ATCACG	321	1	10015	0	91M	X	155260312	0	ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:91	YT:Z:UU	NH:i:20	CC:Z:5	CP:i:10285	HI:i:18
        HWI-ST201:229:C07HGACXX:2:1203:20609:127413:1:N:0:ATCACG	83	1	10129	3	51M1I6M1I6M1I25M	=	10335	298	CCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCT	?ABA<3?<?<<3DCB@B<DDCAA9,?CBA=5DDBB>;EDEB;7HHHED=JIGFA<GIIIHF?IJIIGHFJIHCFCJIGFHFIHFFFDJHFD	AS:i:-24	XN:i:0	XM:i:0	XO:i:3	XG:i:3	NM:i:3	MD:Z:88	YT:Z:UU	NH:i:2	CC:Z:=	CP:i:10129	HI:i:0
        To quantify this, I checked chromosome 1: 2,224,578 reads had quality "*" and 24,160,938 reads had regular Phred scores.

        I have also sent a report to the tophat email address, but wanted to let you know that you're not alone!
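
A count like the one above can be reproduced with a few lines of Python. This is only a sketch under assumptions: it expects uncompressed SAM text (for example piped from `samtools view`), treats column 11 as the quality string as the SAM format defines it, and the function name `count_missing_quals` is hypothetical. The two records in the example are shortened, synthetic stand-ins, not the real reads.

```python
def count_missing_quals(sam_lines):
    """Count alignment records whose QUAL field is "*" versus a real string.

    Returns a (missing, present) tuple. Header lines (starting with "@")
    and lines with fewer than 11 tab-separated columns are ignored.
    """
    missing = present = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        if fields[10] == "*":  # column 11 of a SAM record is QUAL
            missing += 1
        else:
            present += 1
    return missing, present


# Synthetic, shortened records for illustration only.
example = [
    "@SQ\tSN:1\tLN:249250621",
    "\t".join(["readA", "321", "1", "10015", "0", "4M", "X", "155260312", "0", "ACCC", "*", "NH:i:20"]),
    "\t".join(["readB", "83", "1", "10129", "3", "4M", "=", "10335", "298", "CCCT", "?ABA", "NH:i:2"]),
]
print(count_missing_quals(example))
```

To run it on a real file, the lines could be read from standard input, for example by iterating over `sys.stdin` while `samtools view` writes to the pipe.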



        • #5
          I did another run, mapping directly to the reference genome rather than to a transcriptome. Tophat2 ended with a similar error:

          Code:
          fail to read the header from "T34_tophat2_genome/tmp/left_kept_reads_unmapped.fq".
          Looking into the tmp files, I found this issue might be related to the read IDs. Here is the head of "T34_tophat2_genome/tmp/left_kept_reads_unmapped.fq":

          Code:
          @39387
          T13323133231032130303001010113000104313423130441340
          +3_19_590_F3
          AAA=A2.%(='81-5%&;%%51(.1)&',')-'5!3**!*,'+)!!)=!,
          @39398
          T31202110130210003323123122331321034123433032442343
          +3_19_1526_F3
          (A=/5/A@>(.B=9)&BA@/=B>)>3B?)'*@??!)B-!&/8:2!!A<!'
          @39402
          T30231202003222033021022010303030024203413010441343
          +3_20_156_F3
          @8(&2,9(-3731%:3*''783&6)8.1'-+)0-!408!(3%%+!!+(!3
          @39403
          T31130311333002111122221010023203034033432000441040
          +3_20_203_F3
          A7B>5A:?>BB;@4'3:A=+;6<3?51@>'<,A>!=53!,/.-/!!0=!2
          In each record, the lines beginning with "@" and "+" carry different read IDs.
          Xi Wang
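
For anyone who wants to check their own tmp files for the same pattern, here is a small Python sketch that lists FASTQ records whose "+" line names a different ID than the "@" line. It assumes strict four-line records, which matters because a quality line may itself begin with "@", as in the third record above. The function name `mismatched_ids` is made up, and whether this mismatch actually causes the TopHat error is not established in this thread.

```python
from itertools import islice


def mismatched_ids(fastq_lines):
    """Return (header_id, plus_id) pairs for records where the two differ.

    An empty "+" line is legal FASTQ and is not reported.
    """
    lines = iter(fastq_lines)
    mismatches = []
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header, _sequence, plus, _quality = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"not a 4-line FASTQ record: {header!r}")
        header_id, plus_id = header[1:], plus[1:]
        if plus_id and plus_id != header_id:
            mismatches.append((header_id, plus_id))
    return mismatches


# The third record from the post above; its quality line starts with "@".
example = [
    "@39402",
    "T30231202003222033021022010303030024203413010441343",
    "+3_20_156_F3",
    "@8(&2,9(-3731%:3*''783&6)8.1'-+)0-!408!(3%%+!!+(!3",
]
print(mismatched_ids(example))
```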



          • #6
            Tophat tmp.samheader.sam is broken

            I checked all bam/sam files in the tmp directory with samtools. It turns out that the file tmp.samheader.sam (and other sam files) cannot be opened with samtools, and it gives those exact error messages ([bam_header_read]... bad EOF etc.) that we see on screen.

            I ran bowtie with the exact commands issued by Tophat (from the run.log file). Bowtie runs fine (with both sam and plain-text output), and the output is valid. But when this output is piped to fix_map_order (an internal utility of Tophat), Tophat tries to read this temp.samheader.sam file and breaks. Note: this file is created very early when you run Tophat.

            Getting frustrated, I am not using Tophat for now. I have created my own splice junction library (through RSEQtools library) and intend to use bowtie (or bfast or bwa) to align my reads with both the reference genome and this splice junction library.
            Last edited by saad0105050; 04-30-2012, 11:13 AM. Reason: Typo in the tool name `RSEQtools'



            • #7
              Error:

              +1 to all of you:
              I ran this command:
              Code:
              $TOPHAT -o $DEST -C -Q --bowtie1 -p 60 -r 200 --mate-std-dev 30 --report-secondary-alignments --report-discordant-pair-alignments --coverage-search --microexon-search --library-type fr-secondstrand --keep-tmp -z0 $BOWTiEIndex/human_g1k_v37_decoy "$SAMPLE"_F3.csfasta "$SAMPLE"_F5.csfasta "$SAMPLE"_F3_QV.qual "$SAMPLE"_F5_QV.qual
              and got these messages:
              Code:
              [2012-04-30 11:48:52] Beginning TopHat run (v2.0.0)
              -----------------------------------------------
              [2012-04-30 11:48:52] Checking for Bowtie
                                Bowtie version:        0.12.7.0
              [2012-04-30 11:48:52] Checking for Samtools
                              Samtools version:        0.1.18.0
              [2012-04-30 11:48:52] Checking for Bowtie index files
              [2012-04-30 11:48:52] Checking for reference FASTA file
              [2012-04-30 11:48:52] Generating SAM header for /home/biouml/galaxy/galaxy-tools-data/genomes/Hsapiens/hg19/bowtie_color//human_g1k_v37_decoy
                      format:          fasta
              [2012-04-30 11:49:32] Preparing reads
                       left reads: min. length=50, count=28234582
                      right reads: min. length=35, count=28088955
              [2012-04-30 12:19:10] Mapping left_kept_reads against human_g1k_v37_decoy with Bowtie 
              [bam_header_read] EOF marker is absent. The input is probably truncated.
              [bam_header_read] invalid BAM binary header (this is not a BAM file).
              [main_samview] fail to read the header from "tophat_out2/tmp/left_kept_reads_unmapped.fq".
              [2012-04-30 12:32:57] Mapping right_kept_reads against human_g1k_v37_decoy with Bowtie 
              [bam_header_read] EOF marker is absent. The input is probably truncated.
              [bam_header_read] invalid BAM binary header (this is not a BAM file).
              [main_samview] fail to read the header from "tophat_out2/tmp/right_kept_reads_unmapped.fq".
              Warning: junction database is empty!
              [2012-04-30 12:45:26] Processing bowtie hits
              [2012-04-30 13:06:25] Processing bowtie hits
              [2012-04-30 13:23:50] Reporting output tracks
              -----------------------------------------------
              [2012-04-30 13:48:50] Run complete: 01:59:58 elapsed
              Does anyone know a solution?
              P.S. I sent all the logs to the developers; I hope they will answer.



              • #8
                Originally posted by saad0105050
                I checked all bam/sam files in the tmp directory with samtools. It turns out that the file tmp.samheader.sam (and other sam files) cannot be opened with samtools, and it gives those exact error messages ([bam_header_read]... bad EOF etc.) that we see on screen.

                I ran bowtie with the exact commands issued by Tophat (from the run.log file). Bowtie runs fine (with both sam and plain-text output), and the output is valid. But when this output is piped to fix_map_order (an internal utility of Tophat), Tophat tries to read this temp.samheader.sam file and breaks. Note: this file is created very early when you run Tophat.

                Getting frustrated, I am not using Tophat for now. I have created my own splice junction library (through RSEQtools library) and intend to use bowtie (or bfast or bwa) to align my reads with both the reference genome and this splice junction library.
                Thanks for sharing. However, I found that "tmp.samheader.sam" is a SAM file and can be opened with `samtools view -S`. In my runs, `fix_map_order` worked properly, so "temp.samheader.sam" may not be the cause. Could you please show us your "run.log" or the error message?

                My runs stopped at "left_kept_reads.m2g_um.fq", which is a FASTQ file, and I cannot understand why samtools would try to open a FASTQ file at all! It's ridiculous!
                Xi Wang
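
One way to confirm which tmp files are really BAM and which are FASTQ or SAM text is to look at their first bytes. The Python sketch below is an illustration, not something TopHat does: it relies on BAM being gzip-compatible (BGZF) data whose decompressed content starts with the four bytes `BAM\x01`, on SAM header lines starting with a tag such as `@HD` or `@SQ` followed by a tab, and on FASTQ starting with "@". The function name `sniff_format` is hypothetical, and a SAM file without any header would be reported as "unknown".

```python
import gzip

SAM_HEADER_TAGS = (b"@HD", b"@SQ", b"@RG", b"@PG", b"@CO")


def sniff_format(path):
    """Guess the file type from its first bytes.

    Returns one of "bam", "gzip", "sam", "fastq", "fasta" or "unknown".
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head[:2] == b"\x1f\x8b":  # gzip magic number, also used by BGZF/BAM
        try:
            with gzip.open(path, "rb") as compressed:
                magic = compressed.read(4)
        except (OSError, EOFError):
            return "gzip"  # compressed, but unreadable or truncated
        return "bam" if magic == b"BAM\x01" else "gzip"
    if head[:3] in SAM_HEADER_TAGS and head[3:4] == b"\t":
        return "sam"
    if head[:1] == b"@":
        return "fastq"
    if head[:1] == b">":
        return "fasta"
    return "unknown"
```

Calling `sniff_format` on a file such as "left_kept_reads.m2g_um.fq" should return "fastq" if the file is what its extension says, which would support the idea that samtools is being pointed at the wrong kind of file.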



                • #9
                  Originally posted by mikhmv
                  +1 to all of you:
                  I ran this command:
                  Code:
                  $TOPHAT -o $DEST -C -Q --bowtie1 -p 60 -r 200 --mate-std-dev 30 --report-secondary-alignments --report-discordant-pair-alignments --coverage-search --microexon-search --library-type fr-secondstrand --keep-tmp -z0 $BOWTiEIndex/human_g1k_v37_decoy "$SAMPLE"_F3.csfasta "$SAMPLE"_F5.csfasta "$SAMPLE"_F3_QV.qual "$SAMPLE"_F5_QV.qual
                  and got these messages:
                  Code:
                  [2012-04-30 11:48:52] Beginning TopHat run (v2.0.0)
                  -----------------------------------------------
                  [2012-04-30 11:48:52] Checking for Bowtie
                                    Bowtie version:        0.12.7.0
                  [2012-04-30 11:48:52] Checking for Samtools
                                  Samtools version:        0.1.18.0
                  [2012-04-30 11:48:52] Checking for Bowtie index files
                  [2012-04-30 11:48:52] Checking for reference FASTA file
                  [2012-04-30 11:48:52] Generating SAM header for /home/biouml/galaxy/galaxy-tools-data/genomes/Hsapiens/hg19/bowtie_color//human_g1k_v37_decoy
                          format:          fasta
                  [2012-04-30 11:49:32] Preparing reads
                           left reads: min. length=50, count=28234582
                          right reads: min. length=35, count=28088955
                  [2012-04-30 12:19:10] Mapping left_kept_reads against human_g1k_v37_decoy with Bowtie 
                  [bam_header_read] EOF marker is absent. The input is probably truncated.
                  [bam_header_read] invalid BAM binary header (this is not a BAM file).
                  [main_samview] fail to read the header from "tophat_out2/tmp/left_kept_reads_unmapped.fq".
                  [2012-04-30 12:32:57] Mapping right_kept_reads against human_g1k_v37_decoy with Bowtie 
                  [bam_header_read] EOF marker is absent. The input is probably truncated.
                  [bam_header_read] invalid BAM binary header (this is not a BAM file).
                  [main_samview] fail to read the header from "tophat_out2/tmp/right_kept_reads_unmapped.fq".
                  Warning: junction database is empty!
                  [2012-04-30 12:45:26] Processing bowtie hits
                  [2012-04-30 13:06:25] Processing bowtie hits
                  [2012-04-30 13:23:50] Reporting output tracks
                  -----------------------------------------------
                  [2012-04-30 13:48:50] Run complete: 01:59:58 elapsed
                  Does anyone know a solution?
                  P.S. I sent all the logs to the developers; I hope they will answer.

                  Thanks! That's exactly what I got when I didn't map reads to a virtual transcriptome first. Something seems to be wrong with the BAM/SAM file checking; more precisely, the check is being run on the wrong files.
                  Xi Wang



                  • #10
                    This issue seems to be solved by modifying the Tophat python script

                    Originally posted by Xi Wang
                    Thanks! That's exactly what I got when I didn't map reads to a virtual transcriptome first. Something seems to be wrong with the BAM/SAM file checking; more precisely, the check is being run on the wrong files.

                    Following this finding, I edited the Tophat python script to disable the BAM/SAM file checking, and finally got Tophat working. However, one of the subsequent steps, "segment_juncs", takes a long time and is still running. See the Tophat messages below.

                    Code:
                    [2012-05-01 11:58:20] Beginning TopHat run (v2.0.0)
                    -----------------------------------------------
                    [2012-05-01 11:58:20] Checking for Bowtie
                    		  Bowtie version:	 0.12.7.0
                    [2012-05-01 11:58:20] Checking for Samtools
                    		Samtools version:	 0.1.17.0
                    [2012-05-01 11:58:20] Checking for Bowtie index files
                    [2012-05-01 11:58:20] Checking for Bowtie index files
                    [2012-05-01 11:58:20] Checking for reference FASTA file
                    [2012-05-01 11:58:20] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
                    	format:		 fasta
                    [2012-05-01 11:59:25] Reading known junctions from GTF file
                    [2012-05-01 12:00:03] Preparing reads
                    	 left reads: min. length=50, count=64422218
                    [2012-05-01 13:02:39] Using pre-built transcriptome index..
                    [2012-05-01 13:03:03] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
                    [2012-05-01 13:30:59] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
                    [2012-05-01 13:34:20] Resuming TopHat pipeline with unmapped reads
                    [2012-05-01 13:34:20] Mapping left_kept_reads.m2g_um against hg19.color with Bowtie 
                    [2012-05-01 16:52:01] Mapping left_kept_reads.m2g_um_seg1 against hg19.color with Bowtie (1/2)
                    [2012-05-01 20:09:31] Mapping left_kept_reads.m2g_um_seg2 against hg19.color with Bowtie (2/2)
                    [2012-05-01 23:25:42] Searching for junctions via segment mapping
                    I checked the CPU usage, and it seems that "segment_juncs" is not parallelised. If the developers could parallelise this subroutine, it would save a lot of time.
                    Xi Wang



                    • #11
                      Yes, I have the same problem. It has been running for two days (48 hours), and no files in the tmp folder have been updated for the last 10 hours, so it seems to have stopped.

                      Did you fix this problem?

                      Thanks

                      Originally posted by Xi Wang
                      Following this finding, I edited the Tophat python script to disable the BAM/SAM file checking, and finally got Tophat working. However, one of the subsequent steps, "segment_juncs", takes a long time and is still running. See the Tophat messages below.

                      Code:
                      [2012-05-01 11:58:20] Beginning TopHat run (v2.0.0)
                      -----------------------------------------------
                      [2012-05-01 11:58:20] Checking for Bowtie
                      		  Bowtie version:	 0.12.7.0
                      [2012-05-01 11:58:20] Checking for Samtools
                      		Samtools version:	 0.1.17.0
                      [2012-05-01 11:58:20] Checking for Bowtie index files
                      [2012-05-01 11:58:20] Checking for Bowtie index files
                      [2012-05-01 11:58:20] Checking for reference FASTA file
                      [2012-05-01 11:58:20] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
                      	format:		 fasta
                      [2012-05-01 11:59:25] Reading known junctions from GTF file
                      [2012-05-01 12:00:03] Preparing reads
                      	 left reads: min. length=50, count=64422218
                      [2012-05-01 13:02:39] Using pre-built transcriptome index..
                      [2012-05-01 13:03:03] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
                      [2012-05-01 13:30:59] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
                      [2012-05-01 13:34:20] Resuming TopHat pipeline with unmapped reads
                      [2012-05-01 13:34:20] Mapping left_kept_reads.m2g_um against hg19.color with Bowtie 
                      [2012-05-01 16:52:01] Mapping left_kept_reads.m2g_um_seg1 against hg19.color with Bowtie (1/2)
                      [2012-05-01 20:09:31] Mapping left_kept_reads.m2g_um_seg2 against hg19.color with Bowtie (2/2)
                      [2012-05-01 23:25:42] Searching for junctions via segment mapping
                      I checked the CPU usage, and it seems that "segment_juncs" is not parallelised. If the developers could parallelise this subroutine, it would save a lot of time.



                      • #12
                        Originally posted by townway
                        Yes, I have the same problem. It has been running for two days (48 hours), and no files in the tmp folder have been updated for the last 10 hours, so it seems to have stopped.

                        Did you fix this problem?

                        Thanks

                        Yes, "segment_juncs" can be very slow on a large data set. You may want to look at the logs folder, where up-to-date progress is recorded. I haven't looked into this issue, but the developers should probably try to sort it out: fix the bug (if it is one) or provide a new facility.
                        Xi Wang
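
To tell a slow "segment_juncs" from a stalled one, it can help to check how recently anything under the output directory was written. This is a rough Python sketch under the assumption that a live run keeps touching files in its tmp or logs folders, as the two posts above suggest; the function name `seconds_since_last_update` is made up for illustration.

```python
import os
import time


def seconds_since_last_update(directory, now=None):
    """Return seconds since the newest file under `directory` was modified.

    Returns None if the directory tree contains no files. `now` can be
    supplied as a Unix timestamp; it defaults to the current time.
    """
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file vanished between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    if newest is None:
        return None
    if now is None:
        now = time.time()
    return now - newest
```

For example, `seconds_since_last_update("T34_tophat2/logs")` could be compared against a threshold of a few hours. A long silence is only a hint, since a single-threaded step may simply be working in memory without writing anything.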



                        • #13
                          Hey guys,

                          I'm having the same problem here. I think it has to do with the colorspace-formatted reads, since I can run TopHat on normal Illumina fastq files without errors, but not on this kind of colorspace read. For some reason bowtie1/TopHat seem to be trying to read a fastq file as if it were a bam file, and everything fails from there.

                          My temporary workaround will be to manually convert the colorspace reads to normal .fastq reads and map them with bowtie2 against a normal index, since that should work.

                          Here's hoping the TopHat guys will fix this at some point.
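
For reference, the manual conversion mentioned above boils down to walking the color string from the leading primer base. The Python sketch below shows only the sequence decoding, not the quality values; the function name `decode_colorspace` is hypothetical. It uses the SOLiD two-base encoding, where color 0 keeps the base, 1 swaps A/C and G/T, 2 swaps A/G and C/T, and 3 swaps A/T and C/G. Treating any other symbol (such as "." or the "4" seen in the tmp file earlier in the thread) as an unknown color is an assumption.

```python
BASES = "ACGT"  # index doubles as a 2-bit code: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Translate a csfasta read like "T0123" into base space.

    The leading primer base is not part of the output. With the 2-bit
    codes above, the next base is simply (previous base XOR color).
    Once an unknown color is met, the rest of the read is returned as N,
    because every later base depends on the one before it.
    """
    if not read or read[0] not in BASES:
        raise ValueError("read must start with a primer base (A, C, G or T)")
    current = BASES.index(read[0])
    decoded = []
    for position, color in enumerate(read[1:]):
        if color not in "0123":
            decoded.append("N" * (len(read) - 1 - position))
            break
        current ^= int(color)
        decoded.append(BASES[current])
    return "".join(decoded)


print(decode_colorspace("T0123"))
```

Note the caveat built into the encoding: a single miscalled color changes every base after it, which is the usual reason color-space reads are aligned in color space rather than converted first.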



                          • #14
                            In case anyone is still struggling with this issue, I was able to get rid of this error by using a newer version of tophat (2.0.6). This is the call that I used (for single-end 50bp reads):

                            Code:
                            tophat --library-type fr-secondstrand --segment-length 25 --no-coverage-search --no-novel-juncs -G gencode.v14.annotation.gtf -o my_output_dir --color --bowtie1 --quals --transcriptome-index my_transcriptome_index hg19 "/unprotected/projects/lasvchal/moss/raw_data/my.csfasta" "/unprotected/projects/lasvchal/moss/raw_data/my_QV.qual"
