I am a TopHat user, currently running TopHat v1.3.2 on 2x75 bp human RNA-seq data.
I find that when I use the --GTF option with the Ensembl gene set (release 64), the number of spliced hits (alignments with N in the CIGAR field) is lower than when I run without --GTF. The numbers of spliced hits are as follows.
Sample   75bp_no_gtf   75bp_gtf
R1       3736037       3375768
R2       4322737       3921965
R3       4142597       3873519
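A count like this can be reproduced roughly as follows. This is a minimal sketch, not necessarily the exact procedure behind the table: it assumes the alignments are available as SAM text lines, and the function name count_spliced is hypothetical. An alignment is treated as spliced when its CIGAR string (column 6) contains the N operator; unmapped records are skipped.

```python
def count_spliced(sam_lines):
    """Return (spliced, unspliced) alignment counts from an iterable of SAM lines."""
    spliced = 0
    unspliced = 0
    for line in sam_lines:
        # Skip SAM header lines.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid alignment record has at least 11 mandatory columns;
        # only the first 6 are needed here.
        if len(fields) < 6:
            continue
        flag = int(fields[1])
        cigar = fields[5]
        # Skip unmapped reads (flag bit 0x4) and records without a CIGAR.
        if flag & 0x4 or cigar == "*":
            continue
        # 'N' in the CIGAR marks a skipped reference region, i.e. a splice junction.
        if "N" in cigar:
            spliced += 1
        else:
            unspliced += 1
    return spliced, unspliced
```

The function can be fed the text output of `samtools view accepted_hits.bam`, for example by passing `sys.stdin` as `sam_lines`. Note that it counts alignment records, not distinct reads, so a multi-mapped read contributes once per reported alignment.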
My command is listed below.
tophat --mate-inner-dist 30 --bowtie-n --GTF ensemble_64.gtf --library-type fr-unstranded -o Sample_BG-R-1_after_filtering bowtie_index/hg19/hg19 r-1_R1.fq r1_R2.fq
My questions are:
1. Why does the --GTF option decrease the number of spliced hits?
2. Why does the number of non-spliced reads also change when the --GTF option is used?
Thanks very much. I look forward to your response.