**GiladZil** · 07-08-2011, 12:14 AM

Same problem - empty .bed files

Hey,
I have exactly the same problem and data type as thurisaz, except that I got no warnings or error messages whatsoever.
According to 'bowtie.left_kept_reads.fixmap.log', only 10% of the reads were aligned!
When I ran TopHat 1.0.12 on a sample dataset I did get meaningful .bed files. On the full dataset, however, I hit a different problem: the temporary file 'left_kept_reads.fq.candidate_hits.sam' became huge (>360GB), and the server terminated the program before it finished.
The Bowtie version was the same for both runs (0.12.3).
I couldn't find any significant changes between v1.0.12 and v1.3.0 in how short reads are handled. Were there any?

Please help us solve this, it's quite depressing coming home to an empty .bed...

**kathi** · 07-12-2011, 03:28 AM

Tophat 1.3.0 - Warning: malformed closure - hardly any junctions found

Hi all,

I encounter a similar problem when running Tophat 1.3.0 on Illumina sequencing data (75nt PE).

The dataset has ~30 million reads, and Tophat 1.1.4 (--num-threads 8 --mate-inner-dist 200 --solexa-quals --min-isoform-fraction 0 --coverage-search --segment-mismatches 1) predicts almost 9 million junction-spanning reads among them.

With the same input file and the same parameters, Tophat 1.3.0 finds only ~1000 junction-spanning reads. Changing the parameters makes the number drop to 0.

As thurisaz reported, Tophat seems to run fine, but long_spanning_reads.log contains a great many lines reading "Warning: malformed closure".

I would be very grateful for any explanation of why the junction reads are lost, or for suggestions on how to solve the problem.

Thanks a lot,
Kathi

**thurisaz** · 07-14-2011, 08:01 PM

Following kathi's post, I decided to try running the analysis with an old version of TopHat and found the same thing. I ran TopHat 1.3.1 and 1.1.4 on paired-end SOLiD data (50+25bp, about 36m reads) with a GTF file and exactly the same options (--coverage-search -r 131 --library-type fr-secondstrand -F 0.01 -p 16 -i 5 -I 6000 --segment-length 17 --segment-mismatches 1 -m 1 -CQ -a 4).

TopHat 1.3.1:
- empty junctions.bed
- "Found 0 junctions from happy spliced reads" in reports.log
- Reports 41% of F3 reads "with more than one alignment" in bowtie.left_kept_reads.fixmap.log, and then 4%, 4% and 19% in the _seg* logs
- F5 reads are 40% aligned, followed by 7% and 1.6% for the segments

TopHat 1.1.4:
- junctions.bed is not empty
- reports.log says "Found 26222 junctions from happy spliced reads"
- F3 reads map 61% initially and F5 reads 40%. I'm not sure what's going on with the segments, because the filenames look random and there seem to be two log files per segment with different percentages. My impression, though, is that the values look better (getting as high as 80-90%).

These runs were done back-to-back on the same machine using the same data and GTF file; the only thing that changed is the version of Tophat. I'm a bit surprised to see such drastic differences, particularly since I thought v.1.3 is supposed to be better at handling SOLiD. Is there some good reason for this which I'm missing? Should I be doing something different with v. 1.3.1?

I'm going to try 1.2.0 now and see how it behaves; hopefully I'll report back about that later today. In the meantime, any suggestions or insight would be greatly appreciated.
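
For anyone repeating this kind of version comparison, here is a minimal Python sketch for counting the junction records in each run's junctions.bed. It assumes the file may start with a `track` header line that should not be counted; the function names and the output directory names in the comments are hypothetical.

```python
def count_junctions(lines):
    """Count BED records, skipping blank lines and header/comment lines."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Assumption: header lines start with 'track' or 'browser',
        # and comment lines start with '#'.
        if not stripped or stripped.startswith(("track", "browser", "#")):
            continue
        count += 1
    return count


def count_junctions_in_file(path):
    """Open a junctions.bed file and count its records."""
    with open(path) as handle:
        return count_junctions(handle)


# Hypothetical usage, one output directory per TopHat version:
# for run in ("tophat_1.1.4_out", "tophat_1.3.1_out"):
#     print(run, count_junctions_in_file(run + "/junctions.bed"))
```

A count of 0 from a file that exists corresponds to the "empty junctions.bed" case described above.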

**thurisaz** · 07-15-2011, 10:38 PM

1.2.0 also didn't work well for me; it only found 20 junctions when run with precisely the same options. Based on the .bam file, it also seems to have aligned far fewer reads overall (9m vs. 15m & 19m for 1.3.1 & 1.1.4, respectively), which I find a bit strange. I guess it's possible that something odd happened during the run, but I'm not planning to troubleshoot it; I think I'll just use 1.1.4 for now.

**thurisaz** · 07-15-2011, 10:43 PM

One more thing -- in order to get tophat 1.1.4 to run without errors on my colorspace data, I had to replace line 1511 of the tophat script:

Code:

decode_dic = { 'A0':'A', 'A1':'C', 'A2':'G', 'A3':'T', 'A4':'N', 'A.':'N',
               'C0':'C', 'C1':'A', 'C2':'T', 'C3':'G', 'C4':'N', 'C.':'N',
               'G0':'G', 'G1':'T', 'G2':'A', 'G3':'C', 'G4':'N', 'G.':'N',
               'T0':'T', 'T1':'G', 'T2':'C', 'T3':'A', 'T4':'N', 'T.':'N',
               'N0':'N', 'N1':'N', 'N2':'N', 'N3':'N', 'N4':'N', 'N.':'N' }

with the corresponding code from v. 1.3.1:

Code:

decode_dic = { 'A0':'A', 'A1':'C', 'A2':'G', 'A3':'T', 'A4':'N', 'A.':'N', 'AN':'N',
               'C0':'C', 'C1':'A', 'C2':'T', 'C3':'G', 'C4':'N', 'C.':'N', 'CN':'N',
               'G0':'G', 'G1':'T', 'G2':'A', 'G3':'C', 'G4':'N', 'G.':'N', 'GN':'N',
               'T0':'T', 'T1':'G', 'T2':'C', 'T3':'A', 'T4':'N', 'T.':'N', 'TN':'N',
               'N0':'N', 'N1':'N', 'N2':'N', 'N3':'N', 'N4':'N', 'N.':'N', 'NN':'N',
               '.0':'N', '.1':'N', '.2':'N', '.3':'N', '.4':'N', '..':'N', '.N':'N' }
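
To illustrate why the extra keys matter, here is a rough sketch of how a table like this can be used to decode a colorspace read. This is not TopHat's actual code: `build_table` and `decode` are hypothetical helpers, and I'm assuming the first character of the read is the primer base and each following color is looked up together with the previously decoded base.

```python
BASES = "ACGT"


def build_table(prev_symbols, color_symbols):
    """Build a (previous base + color) -> next base lookup table."""
    table = {}
    for prev in prev_symbols:
        for color in color_symbols:
            if prev in BASES and color in "0123":
                # Color c applied to base b gives BASES[index(b) XOR c],
                # which reproduces the 'A0'..'T3' entries quoted above.
                table[prev + color] = BASES[BASES.index(prev) ^ int(color)]
            else:
                # Anything involving an unknown base or color decodes to N.
                table[prev + color] = "N"
    return table


OLD_TABLE = build_table("ACGTN", "01234.")    # same keys as the v1.1.4 table
NEW_TABLE = build_table("ACGTN.", "01234.N")  # same keys as the v1.3.1 table


def decode(read, table):
    """Decode a colorspace read; read[0] is treated as the primer base."""
    prev = read[0]
    decoded = []
    for color in read[1:]:
        prev = table[prev + color]
        decoded.append(prev)
    return "".join(decoded)


print(decode("T0123", NEW_TABLE))  # TGAT
print(decode(".012", NEW_TABLE))   # NNN
# decode(".012", OLD_TABLE) raises KeyError: '.0'
```

In this sketch, a read that starts with '.' or contains an 'N' color hits a missing key in the smaller table, which is consistent with the errors I was seeing.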

**biznatch** · 12-01-2011, 12:13 PM

I am also getting lots of "Warning: malformed closure" warnings. Does anyone know what this means?

Trouble getting TopHat to work -- empty junctions.bed