I'm currently comparing TopHat with other methods for finding splicing patterns in RNA-Seq data. I have about 250 million 32-nt reads from Arabidopsis. For the genes I am interested in, other methods let me align reads across many known and novel splice junctions at 100% identity. These alignments are unique to those locations (i.e., no multi-hits) and have at least 8 nt on either side of each junction (anchor regions). However, I cannot get TopHat to find these junctions.
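To make the anchor criterion concrete, here is a minimal sketch of the check described above: a 32-nt read counts as spanning a junction only if at least 8 nt fall on each side. The function names are hypothetical and are not part of TopHat or any of the other methods.

```python
def anchors_ok(read_len, junction_offset, min_anchor=8):
    """Return True if a read split at junction_offset has enough anchor on both sides.

    junction_offset is the number of read bases that lie upstream of the junction
    (hypothetical convention chosen for this sketch).
    """
    left_anchor = junction_offset
    right_anchor = read_len - junction_offset
    return left_anchor >= min_anchor and right_anchor >= min_anchor


def valid_offsets(read_len=32, min_anchor=8):
    """List every split position in the read that satisfies the anchor rule."""
    return [k for k in range(1, read_len) if anchors_ok(read_len, k, min_anchor)]


if __name__ == "__main__":
    offsets = valid_offsets()
    print(offsets[0], offsets[-1], len(offsets))
```

For 32-nt reads and an 8-nt anchor, the junction can sit anywhere from 8 to 24 bases into the read.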
These are the TopHat parameters I'm using:
-g 1 # report only unique hits
-F 0.01 # report even poorly-represented junctions
--segment-mismatches=0 # enforce 100% ID for all reads
--splice-mismatches=0 # enforce 100% ID across junctions
--min-coverage-intron=10 # minimum allowed intron size for Arabidopsis
--max-coverage-intron=11000 # maximum intron size found in Arabidopsis
-i 10 # (as above)
-I 11000 # (as above)
--min-segment-intron=10 # (as above)
--max-segment-intron=11000 # (as above)
-j TAIR9_GFF3_genes.juncs # pre-processed splice junctions from gene model
-a 8 # minimum overlap/anchor of 8nt
-p 4 # allowing up to 4 threads on 8-processor machines
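As a sanity check on the intron-size bounds above, this rough sketch filters the junctions supplied with -j to those whose intron length lies between 10 and 11000. It assumes the raw junction file is tab-delimited as chrom, left, right, strand, with zero-based coordinates where left is the last base of the upstream exon and right is the first base of the downstream exon. That layout is my understanding of the format and should be verified against the TopHat manual; the function names and example records are made up.

```python
def intron_length(left, right):
    # Assumption: left = last exon base, right = first base of next exon (zero-based),
    # so the intron occupies the positions strictly between them.
    return right - left - 1


def filter_juncs(lines, min_intron=10, max_intron=11000):
    """Keep junction records whose intron length is within [min_intron, max_intron]."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, left, right, strand = line.split("\t")[:4]
        left, right = int(left), int(right)
        if min_intron <= intron_length(left, right) <= max_intron:
            kept.append((chrom, left, right, strand))
    return kept


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = ["Chr1\t99\t200\t+", "Chr1\t99\t105\t-", "Chr2\t10\t20000\t+"]
    for record in filter_juncs(example):
        print(record)
```

Any known junction that falls outside these bounds could not be reported under the -i/-I settings, so it is worth excluding such cases before comparing methods.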
I can relax the constraint on splice mismatches to see if it helps, but ultimately I would like TopHat to find the junctions at 100% ID that I've already seen using other methods. I would like to make my comparison as fair as possible.
Any ideas? Am I misinterpreting/misusing any of these parameters?