  • nachocab
    replied
    In case anyone is still struggling with this issue, I was able to get rid of this error by using a newer version of tophat (2.0.6). This is the call that I used (for single-end 50bp reads):

    Code:
    tophat --library-type fr-secondstrand --segment-length 25 --no-coverage-search --no-novel-juncs -G gencode.v14.annotation.gtf -o my_output_dir --color --bowtie1 --quals --transcriptome-index my_transcriptome_index hg19 "/unprotected/projects/lasvchal/moss/raw_data/my.csfasta" "/unprotected/projects/lasvchal/moss/raw_data/my_QV.qual"


  • Ender985
    replied
    Hey guys,

    I'm having the same problem here. I think it has to do with the colorspace-formatted reads, since I can run TopHat on normal Illumina FASTQ files without errors but not on this kind of colorspace read. For some reason bowtie1/TopHat seem to be trying to read a FASTQ file as if it were a BAM file, and everything fails from there.

    My temporary workaround will be to manually convert the colorspace reads to normal .fastq reads and map them with bowtie2 against a normal index, since that should work.

    Here is hoping the TopHat guys will fix this downstream at some point.
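
For readers who want to try the conversion workaround described in this post, here is a minimal sketch of colorspace decoding in Python. It is not taken from TopHat or bowtie, and `decode_colorspace` is a hypothetical helper name. It assumes the usual SOLiD convention that each read starts with a primer base and that each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). It ignores quality values, and a single miscalled color changes every base after it, so treat the output with care.

```python
# Colour code = XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3).
BASES = "ACGT"


def decode_colorspace(read: str) -> str:
    """Translate a SOLiD read such as 'T0123' into base space.

    The first character is the primer base and is not part of the output.
    Any colour outside 0-3 (for example '.' or '4') leaves the rest of the
    read undetermined, so the remaining positions are emitted as 'N'.
    """
    prev = BASES.index(read[0].upper())
    out = []
    for pos, color in enumerate(read[1:]):
        if color not in "0123":
            out.append("N" * (len(read) - 1 - pos))
            break
        prev ^= int(color)  # apply the colour transition to the previous base
        out.append(BASES[prev])
    return "".join(out)
```

For example, `decode_colorspace("T0123")` walks T -> T -> G -> A -> T and returns `TGAT`.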


  • Xi Wang
    replied
    Originally posted by townway
    Yes, I have the same problem. It has been running for two days (48 hours), with no file updates in the tmp folder for the last 10 hours, so it seems to have stopped.

    Did you fix this problem?

    Thanks
    Yes, "segment_juncs" can be very slow on a large data set. You may want to look at the logs folder, where up-to-date progress is recorded. I haven't looked into this issue, but the developers should probably address it: fix the bug (if it is one) or provide a new facility.


  • townway
    replied
    Yes, I have the same problem. It has been running for two days (48 hours), with no file updates in the tmp folder for the last 10 hours, so it seems to have stopped.

    Did you fix this problem?

    Thanks

    Originally posted by Xi Wang
    Following this finding, I edited the TopHat Python script to disable the BAM/SAM file check, and finally got TopHat working. However, one of the subsequent steps, "segment_juncs", is taking a long time and is still running. See the TopHat messages below.

    Code:
    [2012-05-01 11:58:20] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2012-05-01 11:58:20] Checking for Bowtie
    		  Bowtie version:	 0.12.7.0
    [2012-05-01 11:58:20] Checking for Samtools
    		Samtools version:	 0.1.17.0
    [2012-05-01 11:58:20] Checking for Bowtie index files
    [2012-05-01 11:58:20] Checking for Bowtie index files
    [2012-05-01 11:58:20] Checking for reference FASTA file
    [2012-05-01 11:58:20] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
    	format:		 fasta
    [2012-05-01 11:59:25] Reading known junctions from GTF file
    [2012-05-01 12:00:03] Preparing reads
    	 left reads: min. length=50, count=64422218
    [2012-05-01 13:02:39] Using pre-built transcriptome index..
    [2012-05-01 13:03:03] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
    [2012-05-01 13:30:59] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
    [2012-05-01 13:34:20] Resuming TopHat pipeline with unmapped reads
    [2012-05-01 13:34:20] Mapping left_kept_reads.m2g_um against hg19.color with Bowtie 
    [2012-05-01 16:52:01] Mapping left_kept_reads.m2g_um_seg1 against hg19.color with Bowtie (1/2)
    [2012-05-01 20:09:31] Mapping left_kept_reads.m2g_um_seg2 against hg19.color with Bowtie (2/2)
    [2012-05-01 23:25:42] Searching for junctions via segment mapping
    I checked the CPU usage, and it seems that "segment_juncs" wasn't parallelised. If the developers could parallelise this subroutine, it would save a lot of time.


  • Xi Wang
    replied
    This issue seems to be solved by modifying the TopHat Python script

    Originally posted by Xi Wang
    Thanks! That's exactly what I got when I didn't map reads to a virtual transcriptome first. Something seems to be wrong with the BAM/SAM file check; more precisely, it is checking the wrong files.
    Following this finding, I edited the TopHat Python script to disable the BAM/SAM file check, and finally got TopHat working. However, one of the subsequent steps, "segment_juncs", is taking a long time and is still running. See the TopHat messages below.

    Code:
    [2012-05-01 11:58:20] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2012-05-01 11:58:20] Checking for Bowtie
    		  Bowtie version:	 0.12.7.0
    [2012-05-01 11:58:20] Checking for Samtools
    		Samtools version:	 0.1.17.0
    [2012-05-01 11:58:20] Checking for Bowtie index files
    [2012-05-01 11:58:20] Checking for Bowtie index files
    [2012-05-01 11:58:20] Checking for reference FASTA file
    [2012-05-01 11:58:20] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
    	format:		 fasta
    [2012-05-01 11:59:25] Reading known junctions from GTF file
    [2012-05-01 12:00:03] Preparing reads
    	 left reads: min. length=50, count=64422218
    [2012-05-01 13:02:39] Using pre-built transcriptome index..
    [2012-05-01 13:03:03] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
    [2012-05-01 13:30:59] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
    [2012-05-01 13:34:20] Resuming TopHat pipeline with unmapped reads
    [2012-05-01 13:34:20] Mapping left_kept_reads.m2g_um against hg19.color with Bowtie 
    [2012-05-01 16:52:01] Mapping left_kept_reads.m2g_um_seg1 against hg19.color with Bowtie (1/2)
    [2012-05-01 20:09:31] Mapping left_kept_reads.m2g_um_seg2 against hg19.color with Bowtie (2/2)
    [2012-05-01 23:25:42] Searching for junctions via segment mapping
    I checked the CPU usage, and it seems that "segment_juncs" wasn't parallelised. If the developers could parallelise this subroutine, it would save a lot of time.


  • Xi Wang
    replied
    Originally posted by mikhmv
    +1 to all of you:
    I ran this command:
    Code:
    $TOPHAT -o $DEST -C -Q --bowtie1 -p 60 -r 200 --mate-std-dev 30 --report-secondary-alignments --report-discordant-pair-alignments --coverage-search --microexon-search --library-type fr-secondstrand --keep-tmp -z0 $BOWTiEIndex/human_g1k_v37_decoy "$SAMPLE"_F3.csfasta "$SAMPLE"_F5.csfasta "$SAMPLE"_F3_QV.qual "$SAMPLE"_F5_QV.qual
    and got this message:
    Code:
    [2012-04-30 11:48:52] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2012-04-30 11:48:52] Checking for Bowtie
                      Bowtie version:        0.12.7.0
    [2012-04-30 11:48:52] Checking for Samtools
                    Samtools version:        0.1.18.0
    [2012-04-30 11:48:52] Checking for Bowtie index files
    [2012-04-30 11:48:52] Checking for reference FASTA file
    [2012-04-30 11:48:52] Generating SAM header for /home/biouml/galaxy/galaxy-tools-data/genomes/Hsapiens/hg19/bowtie_color//human_g1k_v37_decoy
            format:          fasta
    [2012-04-30 11:49:32] Preparing reads
             left reads: min. length=50, count=28234582
            right reads: min. length=35, count=28088955
    [2012-04-30 12:19:10] Mapping left_kept_reads against human_g1k_v37_decoy with Bowtie 
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "tophat_out2/tmp/left_kept_reads_unmapped.fq".
    [2012-04-30 12:32:57] Mapping right_kept_reads against human_g1k_v37_decoy with Bowtie 
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "tophat_out2/tmp/right_kept_reads_unmapped.fq".
    Warning: junction database is empty!
    [2012-04-30 12:45:26] Processing bowtie hits
    [2012-04-30 13:06:25] Processing bowtie hits
    [2012-04-30 13:23:50] Reporting output tracks
    -----------------------------------------------
    [2012-04-30 13:48:50] Run complete: 01:59:58 elapsed
    Does anyone know a solution?
    P.S. I sent all the logs to the developers; I hope they will answer.
    Thanks! That's exactly what I got when I didn't map reads to a virtual transcriptome first. Something seems to be wrong with the BAM/SAM file check; more precisely, it is checking the wrong files.


  • Xi Wang
    replied
    Originally posted by saad0105050
    I checked all bam/sam files in the tmp directory with samtools. It turns out that the file tmp.samheader.sam (and other sam files) cannot be opened with samtools, and it gives those exact error messages ([bam_header_read]... bad EOF etc.) that we see on screen.

    I ran bowtie with the exact commands issued by Tophat (from the run.log file). Bowtie runs fine (with both sam and plain-text output), and the output is valid. But when this output is piped to fix_map_order (an internal utility of Tophat), Tophat tries to read this tmp.samheader.sam file and breaks. Note: this file is created very early when you run Tophat.

    Getting frustrated, I am not using Tophat for now. I have created my own splice junction library (through RSEQtools library) and intend to use bowtie (or bfast or bwa) to align my reads with both the reference genome and this splice junction library.
    Thanks for sharing. However, I found that "tmp.samheader.sam" is a valid SAM file and can be opened with `samtools view -S`. In my runs, `fix_map_order` worked properly, so "tmp.samheader.sam" may not be the cause. Could you please show us your "run.log" or the error message?

    My runs ended up with "left_kept_reads.m2g_um.fq", which was a FASTQ file, and I cannot understand at all why samtools tried to open a FASTQ file! It's ridiculous!
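
One way to confirm what a temporary file actually contains, independent of its extension, is to look at its first bytes. The sketch below is a rough heuristic and not how TopHat or samtools detect formats; `sniff_format` is a hypothetical helper name. It relies on BAM being a gzip-compatible (BGZF) stream whose decompressed data begins with the magic bytes `BAM\1`, on FASTQ records having a `+` separator on their third line, and on SAM records having at least 11 tab-separated columns.

```python
import gzip


def sniff_format(path):
    """Guess whether `path` holds BAM, SAM or FASTQ data from its first bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip/BGZF container
        with gzip.open(path, "rb") as fh:
            return "BAM" if fh.read(4) == b"BAM\x01" else "gzip (not BAM)"
    with open(path, "r", errors="replace") as fh:
        lines = [fh.readline() for _ in range(3)]
    first, third = lines[0], lines[2]
    # FASTQ: '@name' on line 1 and a '+' separator on line 3.
    if first.startswith("@") and third.startswith("+"):
        return "FASTQ"
    # SAM: a known header tag, or an alignment line with >= 11 columns.
    sam_tags = ("@HD\t", "@SQ\t", "@RG\t", "@PG\t", "@CO\t")
    if first.startswith(sam_tags) or first.count("\t") >= 10:
        return "SAM"
    return "unknown"
```

Run on a file such as "left_kept_reads.m2g_um.fq", this should report FASTQ, which is consistent with samtools rejecting it as "not a BAM file".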


  • mikhmv
    replied
    Error:

    +1 to all of you:
    I ran this command:
    Code:
    $TOPHAT -o $DEST -C -Q --bowtie1 -p 60 -r 200 --mate-std-dev 30 --report-secondary-alignments --report-discordant-pair-alignments --coverage-search --microexon-search --library-type fr-secondstrand --keep-tmp -z0 $BOWTiEIndex/human_g1k_v37_decoy "$SAMPLE"_F3.csfasta "$SAMPLE"_F5.csfasta "$SAMPLE"_F3_QV.qual "$SAMPLE"_F5_QV.qual
    and got this message:
    Code:
    [2012-04-30 11:48:52] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2012-04-30 11:48:52] Checking for Bowtie
                      Bowtie version:        0.12.7.0
    [2012-04-30 11:48:52] Checking for Samtools
                    Samtools version:        0.1.18.0
    [2012-04-30 11:48:52] Checking for Bowtie index files
    [2012-04-30 11:48:52] Checking for reference FASTA file
    [2012-04-30 11:48:52] Generating SAM header for /home/biouml/galaxy/galaxy-tools-data/genomes/Hsapiens/hg19/bowtie_color//human_g1k_v37_decoy
            format:          fasta
    [2012-04-30 11:49:32] Preparing reads
             left reads: min. length=50, count=28234582
            right reads: min. length=35, count=28088955
    [2012-04-30 12:19:10] Mapping left_kept_reads against human_g1k_v37_decoy with Bowtie 
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "tophat_out2/tmp/left_kept_reads_unmapped.fq".
    [2012-04-30 12:32:57] Mapping right_kept_reads against human_g1k_v37_decoy with Bowtie 
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "tophat_out2/tmp/right_kept_reads_unmapped.fq".
    Warning: junction database is empty!
    [2012-04-30 12:45:26] Processing bowtie hits
    [2012-04-30 13:06:25] Processing bowtie hits
    [2012-04-30 13:23:50] Reporting output tracks
    -----------------------------------------------
    [2012-04-30 13:48:50] Run complete: 01:59:58 elapsed
    Does anyone know a solution?
    P.S. I sent all the logs to the developers; I hope they will answer.


  • saad0105050
    replied
    Tophat tmp.samheader.sam is broken

    I checked all bam/sam files in the tmp directory with samtools. It turns out that the file tmp.samheader.sam (and other sam files) cannot be opened with samtools, and it gives those exact error messages ([bam_header_read]... bad EOF etc.) that we see on screen.

    I ran bowtie with the exact commands issued by Tophat (from the run.log file). Bowtie runs fine (with both sam and plain-text output), and the output is valid. But when this output is piped to fix_map_order (an internal utility of Tophat), Tophat tries to read this tmp.samheader.sam file and breaks. Note: this file is created very early when you run Tophat.

    Getting frustrated, I am not using Tophat for now. I have created my own splice junction library (through RSEQtools library) and intend to use bowtie (or bfast or bwa) to align my reads with both the reference genome and this splice junction library.
    Last edited by saad0105050; 04-30-2012, 11:13 AM. Reason: Typo in the tool name `RSEQtools'


  • Xi Wang
    replied
    I did another run, mapping directly to the reference genome rather than to a transcriptome first. Tophat2 ended up with a similar error:

    Code:
    fail to read the header from "T34_tophat2_genome/tmp/left_kept_reads_unmapped.fq".
    Looking into the tmp files, I found that this issue might be related to the read IDs. Here is the head of "T34_tophat2_genome/tmp/left_kept_reads_unmapped.fq":

    Code:
    @39387
    T13323133231032130303001010113000104313423130441340
    +3_19_590_F3
    AAA=A2.%(='81-5%&;%%51(.1)&',')-'5!3**!*,'+)!!)=!,
    @39398
    T31202110130210003323123122331321034123433032442343
    +3_19_1526_F3
    (A=/5/A@>(.B=9)&BA@/=B>)>3B?)'*@??!)B-!&/8:2!!A<!'
    @39402
    T30231202003222033021022010303030024203413010441343
    +3_20_156_F3
    @8(&2,9(-3731%:3*''783&6)8.1'-+)0-!408!(3%%+!!+(!3
    @39403
    T31130311333002111122221010023203034033432000441040
    +3_20_203_F3
    A7B>5A:?>BB;@4'3:A=+;6<3?51@>'<,A>!=53!,/.-/!!0=!2
    The lines beginning with "@" and "+" have different read IDs.
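
To see how widespread the mismatch is across a whole file, something like the following Python sketch could be used. `mismatched_fastq_ids` is a hypothetical helper, and it assumes strict 4-line FASTQ records with no wrapped sequence lines. Whether the mismatch actually causes the samtools error is not established in this thread; the sketch only reports it.

```python
from itertools import islice


def mismatched_fastq_ids(lines):
    """Yield (header_id, plus_id) for records whose '+' line names a different read.

    `lines` is any iterable of text lines laid out as 4-line FASTQ records.
    A bare '+' is allowed by the format and is not reported.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header_id = record[0][1:]  # drop the leading '@'
        plus_id = record[2][1:]    # drop the leading '+'
        if plus_id and plus_id != header_id:
            yield header_id, plus_id
```

Passing an open file handle works, for example `with open(path) as fh: print(sum(1 for _ in mismatched_fastq_ids(fh)))`, where `path` points at the unmapped-reads FASTQ in the tmp directory.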


  • caddymob
    replied
    I have the same issue with tophat2, using bowtie2. Some reads have qualities, some just have "*" in the quality field. Here are 2 examples, 1st with no quality, 2nd with quality:

    Code:
    HWI-ST201:229:C07HGACXX:2:1306:5066:164732:1:N:0:ATCACG	321	1	10015	0	91M	X	155260312	0	ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:91	YT:Z:UU	NH:i:20	CC:Z:5	CP:i:10285	HI:i:18
    HWI-ST201:229:C07HGACXX:2:1203:20609:127413:1:N:0:ATCACG	83	1	10129	3	51M1I6M1I6M1I25M	=	10335	298	CCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCT	?ABA<3?<?<<3DCB@B<DDCAA9,?CBA=5DDBB>;EDEB;7HHHED=JIGFA<GIIIHF?IJIIGHFJIHCFCJIGFHFIHFFFDJHFD	AS:i:-24	XN:i:0	XM:i:0	XO:i:3	XG:i:3	NM:i:3	MD:Z:88	YT:Z:UU	NH:i:2	CC:Z:=	CP:i:10129	HI:i:0
    To quantify the issue, I checked chromosome 1: 2,224,578 reads had quality=* and 24,160,938 reads had regular Phred scores.
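
A count like the one above can be reproduced with a few lines of Python over SAM text. This is a sketch, not the poster's actual method, and `count_missing_qualities` is a hypothetical helper name. It relies on QUAL being the 11th tab-separated column of a SAM alignment record, with `*` meaning that no qualities were stored.

```python
def count_missing_qualities(sam_lines):
    """Return (missing, present) counts of alignment records by QUAL field.

    Header lines (starting with '@') are skipped. QUAL is the 11th
    tab-separated column; '*' means no qualities were stored.
    """
    missing = present = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if fields[10] == "*":
            missing += 1
        else:
            present += 1
    return missing, present
```

The input can be any iterable of SAM lines, for instance `sys.stdin` when the script is fed from `samtools view` restricted to one chromosome.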

    I have also sent a report to the TopHat email address, but wanted to share that you're not alone!


  • Xi Wang
    replied
    Originally posted by saad0105050
    Hi,

    I have a very similar issue. I have used samtools and checked that every bam file could be opened without error. Were you able to solve your problem?

    Thanks,
    Saad
    Sorry to hear that you have the same issue. I've reported this bug to the developers and hope they can find a solution soon. If anyone gets this solved, please share it with us.


  • saad0105050
    replied
    Same issue

    Hi,

    I have a very similar issue. I have used samtools and checked that every bam file could be opened without error. Were you able to solve your problem?

    Thanks,
    Saad


  • Xi Wang
    started a topic tophat2 error

    tophat2 error

    I ran tophat2 with bowtie1, since I am dealing with colorspace reads. The command line I used was:

    Code:
    tophat --bowtie1 --keep-tmp -o T34_tophat2 -p 8 --color --quals --library-type=fr-secondstrand --transcriptome-index=transcriptome/hg19_Ensemble.GRCh37_65 /home/xwang/data/hg19/bowtie_index/hg19.color T34.csfasta T34.qual
    But tophat ended up with an error:

    Code:
    [2012-04-27 10:36:10] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2012-04-27 10:36:10] Checking for Bowtie
    		  Bowtie version:	 0.12.7.0
    [2012-04-27 10:36:11] Checking for Samtools
    		Samtools version:	 0.1.17.0
    [2012-04-27 10:36:11] Checking for Bowtie index files
    [2012-04-27 10:36:11] Checking for Bowtie index files
    [2012-04-27 10:36:11] Checking for reference FASTA file
    [2012-04-27 10:36:11] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
    	format:		 fasta
    [2012-04-27 10:38:10] Reading known junctions from GTF file
    [2012-04-27 10:38:48] Preparing reads
    	 left reads: min. length=50, count=64422218
    [2012-04-27 11:43:11] Using pre-built transcriptome index..
    [2012-04-27 11:43:49] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
    [2012-04-27 12:11:41] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
    [2012-04-27 12:14:57] Resuming TopHat pipeline with unmapped reads
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "T34_tophat2/tmp/left_kept_reads.m2g_um.fq".
    [2012-04-27 12:14:57] Reporting output tracks
    -----------------------------------------------
    [2012-04-27 13:08:39] Run complete: 02:32:28 elapsed
    Judging from the recorded timestamps, the "Resuming TopHat pipeline with unmapped reads" step wasn't actually executed, seemingly because the file "left_kept_reads.m2g_um.fq" was not found. But the file is in fact there.

    Any hints? Thanks.
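
The timestamp reasoning in this post can be made explicit by computing how long each logged step lasted. The Python sketch below is only an illustration. `step_durations` is a hypothetical helper, and it assumes every step line starts with a `[YYYY-MM-DD HH:MM:SS]` stamp, as in the log quoted above.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")


def step_durations(log_lines):
    """Return [(message, seconds until the next timestamped line), ...].

    Lines without a leading '[YYYY-MM-DD HH:MM:SS]' stamp are ignored,
    and the last stamped line is omitted because it has no successor.
    """
    stamped = []
    for line in log_lines:
        match = STAMP.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamped.append((when, match.group(2).strip()))
    return [
        (msg, int((nxt - when).total_seconds()))
        for (when, msg), (nxt, _) in zip(stamped, stamped[1:])
    ]
```

Applied to the log in this post, the "Resuming TopHat pipeline with unmapped reads" step comes out at 0 seconds, because "Reporting output tracks" carries the same 12:14:57 timestamp.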
