Hi all,
To prepare for analyzing some RNAseq data we are expecting to get soon, I'm trying to use TopHat to analyze a similar published dataset from the SRA (http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE21323). We are also using SOLiD and are expecting 36-bp reads, though our data will be paired-end.
After downloading the .sra file and converting it (with fastq-dump and a Perl script from another thread on the forum), I have a prmt5.csfasta file that looks like:
and a prmt5.qual file that looks like:
My tophat command is:
TopHat seems to run fine:
but junctions.bed, deletions.bed and insertions.bed are empty (aside from the header). accepted_hits.bam is 539MB, so I guess it's OK. I had a look through the logs and found the following in long_spanning_reads.log:
I don't know what "malformed closure" means and didn't find anything when I searched the forum or Google. There are no other obvious errors in the log files, though this is all new to me, so I might be missing something. segment_juncs.log says "Found 19801 potential split-segment junctions", so I'm confused that junctions.bed is empty... or am I misunderstanding something?
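For anyone wanting to double-check that a BED file really holds nothing beyond its header, counting the non-header lines is one option. This is only a minimal sketch: `count_bed_records` is a name made up here, and it assumes header lines are blank or begin with `track`, `browser` or `#`.

```python
def count_bed_records(lines):
    """Count feature lines, skipping blanks and track/browser/# header lines."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Assumption: anything starting with these prefixes is a header line.
        if not stripped or stripped.startswith(("track", "browser", "#")):
            continue
        count += 1
    return count
```

It could be called as `count_bed_records(open("junctions.bed"))`, using whatever path the output directory actually has. A reference sequence whose name happens to start with "track" would be miscounted, so the prefix test is a simplification.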
Can anyone provide some pointers about what's going wrong? Am I making a simple, silly mistake or is there something fundamental I'm misunderstanding?
I thought it would be fine to do these test runs on my desktop (Core2 Duo 3GHz with 4GB RAM); could that be causing trouble (e.g., not enough memory)?
Thanks for any help, guidance or suggestions!
prmt5.csfasta:
>prmt5.1
T12333110000213211212111200120201030
>prmt5.2
T22222230320112002111202002111030220
>prmt5.3
T00112120103200010122310030111101112
>prmt5.4
T33113031103310232122213233232311110
>prmt5.5
T32211100022011031031033221131332201
[...]
prmt5.qual:
>prmt5.1
4 5 5 2 2 4 4 3 7 5 3 5 4 2 4 3 5 5 7 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 2
>prmt5.2
5 5 7 9 5 5 4 6 3 5 4 5 5 5 5 5 5 12 10 5 5 8 5 6 3 5 11 5 9 5 7 8 5 5 4
>prmt5.3
3 3 2 4 4 3 3 4 5 5 2 5 4 5 4 5 5 4 3 3 5 5 5 5 5 5 5 4 5 5 3 4 4 4 5
>prmt5.4
2 10 5 4 5 2 4 2 5 5 2 2 2 2 3 2 5 2 2 4 2 4 4 5 5 5 4 4 4 5 3 4 2 4 4
>prmt5.5
3 5 8 14 4 2 5 7 5 5 2 4 10 11 5 2 7 11 11 10 2 2 3 10 4 5 5 13 14 3 2 5 5 10 2
[...]
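The two files are meant to describe the same reads in the same order, so a sanity check along the following lines may be worth running after the conversion step. This is a minimal sketch, not part of TopHat or of the conversion script: `read_records` and `check_pairing` are hypothetical names, and it assumes one sequence line per record, a single leading primer base, and space-separated quality values.

```python
def read_records(lines):
    """Yield (name, payload) pairs from FASTA-like lines, skipping '#' comments."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def check_pairing(csfasta_lines, qual_lines):
    """Return a list of problems found when comparing the files record by record."""
    problems = []
    cs = list(read_records(csfasta_lines))
    qv = list(read_records(qual_lines))
    if len(cs) != len(qv):
        problems.append(f"record count differs: {len(cs)} vs {len(qv)}")
    for (cs_name, colors), (qv_name, quals) in zip(cs, qv):
        if cs_name != qv_name:
            problems.append(f"name mismatch: {cs_name} vs {qv_name}")
            continue
        n_colors = len(colors) - 1  # assumption: first character is the primer base
        n_quals = len(quals.split())
        if n_colors != n_quals:
            problems.append(f"{cs_name}: {n_colors} colors but {n_quals} quality values")
    return problems
```

An empty list from `check_pairing(open("prmt5.csfasta"), open("prmt5.qual"))` would suggest the two files at least agree on read names and lengths. Both files are loaded into memory here, so for tens of millions of reads a streaming comparison would be kinder to a 4GB machine.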
tophat command:
tophat -o prmt5_tophat-out --segment-length 17 --segment-mismatches 1 -CQ -a 4 a_thaliana-color prmt5.csfasta prmt5.qual 1> tophat.prmt5.out 2> tophat.prmt5.err
TopHat output:
[Fri Jul 1 10:58:01 2011] Beginning TopHat run (v1.3.0)
-----------------------------------------------
[Fri Jul 1 10:58:01 2011] Preparing output location Col_tophat-out/
[Fri Jul 1 10:58:01 2011] Checking for Bowtie index files
[Fri Jul 1 10:58:01 2011] Checking for reference FASTA file
[Fri Jul 1 10:58:01 2011] Checking for Bowtie
Bowtie version: 0.12.7.0
[Fri Jul 1 10:58:01 2011] Checking for Samtools
Samtools Version: 0.1.16
[Fri Jul 1 10:58:01 2011] Generating SAM header for a_thaliana-color
[Fri Jul 1 10:58:05 2011] Preparing reads
format: fasta
Left reads: min. length=36, count=41286371
[Fri Jul 1 11:15:33 2011] Mapping left_kept_reads against a_thaliana-color with Bowtie
[Fri Jul 1 11:39:45 2011] Processing bowtie hits
[Fri Jul 1 12:36:35 2011] Mapping left_kept_reads_seg1 against a_thaliana-color with Bowtie (1/2)
[Fri Jul 1 13:50:13 2011] Mapping left_kept_reads_seg2 against a_thaliana-color with Bowtie (2/2)
[Fri Jul 1 14:35:15 2011] Searching for junctions via segment mapping
[Fri Jul 1 16:12:24 2011] Retrieving sequences for splices
[Fri Jul 1 16:12:37 2011] Indexing splices
[Fri Jul 1 16:12:38 2011] Mapping left_kept_reads_seg1 against segment_juncs with Bowtie (1/2)
[Fri Jul 1 16:19:32 2011] Mapping left_kept_reads_seg2 against segment_juncs with Bowtie (2/2)
[Fri Jul 1 16:25:30 2011] Joining segment hits
[Fri Jul 1 16:37:26 2011] Reporting output tracks
-----------------------------------------------
Run complete [05:49:39 elapsed]
-----------------------------------------------
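To see where the run spends its time (relevant to whether a desktop machine is adequate), the timestamps in the output above can be turned into per-stage durations. A rough sketch follows; `stage_durations` is a hypothetical helper, and it assumes every stage line has the `[Day Mon D HH:MM:SS YYYY] message` shape shown above and that month and weekday names are in English.

```python
import re
from datetime import datetime

# Matches lines such as "[Fri Jul 1 10:58:01 2011] Preparing reads".
STAMP = re.compile(r"^\[(.+?)\]\s+(.*)$")


def stage_durations(lines):
    """Return (stage message, time until the next stamped line) pairs."""
    stamped = []
    for line in lines:
        match = STAMP.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            stamped.append((when, match.group(2)))
    # The last stamped line has no successor, so it gets no duration.
    return [(msg, nxt - when) for (when, msg), (nxt, _) in zip(stamped, stamped[1:])]
```

Lines without a timestamp (version banners, read counts, separators) are simply ignored, so the saved output can be passed in unchanged.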
long_spanning_reads.log:
long_spanning_reads v1.3.0 (2398:2399)
--------------------------------------------
Opening Col_tophat-out/tmp/left_kept_reads_seg1.bwtout.z for reading
Opening Col_tophat-out/tmp/left_kept_reads_seg2.bwtout.z for reading
Opening Col_tophat-out/tmp/left_kept_reads_seg1.to_spliced.bwtout.z for reading
Opening Col_tophat-out/tmp/left_kept_reads_seg2.to_spliced.bwtout.z for reading
Loading reference sequences...
reference sequences loaded.
Loading spliced hits...done
Loading junctions...done
Loading deletions...done
Warning: malformed closure
Warning: malformed closure
Warning: malformed closure
Warning: malformed closure
[...]
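Because the log is elided here, it may help to know how many of these warnings there are relative to the number of reads. A small sketch for tallying them, with `tally_warnings` being a made-up name and the only assumption being that each warning sits on its own line starting with `Warning:`:

```python
from collections import Counter


def tally_warnings(lines):
    """Count each distinct 'Warning: ...' message in a log."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("Warning:"):
            # Keep only the message text after the "Warning:" prefix.
            counts[line[len("Warning:"):].strip()] += 1
    return counts
```

Calling `tally_warnings(open("long_spanning_reads.log"))`, with the path adjusted to wherever the logs directory is, would give a count per message.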