Hi folks,
I am doing RNA-seq analysis with the TopHat protocol. This type of analysis usually needs a genome reference in FASTA format and an annotation reference in GTF format.
There are several sources for the genome reference in FASTA. For the GTF, however, I have only found two: UCSC (where you create one with the Table Browser) and Ensembl (ready to download). I have two questions (points of confusion):
1. The GTF files from UCSC are much smaller than the ones from Ensembl (for all the species I looked at: mm, hs, rn).
2. For the same RNA-seq dataset, I get different results depending on which GTF I use as the reference (UCSC or Ensembl). The accepted_hits.bam files produced with the Ensembl GTF are much smaller than the ones produced with the UCSC GTF. So yes, with a much bigger annotation reference I get fewer results, judging by the size of the accepted_hits.bam file.
By the way: I noticed that the GTF file from Ensembl is not compatible with the genome reference from UCSC, so I used the Ensembl GTF together with the Ensembl genome reference. The tophat command finished without errors.
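In case it helps anyone reproduce this, here is a minimal Python sketch of how the two points above could be checked: which sequence names a GTF and a FASTA file share, and how many records of each feature type a GTF contains. I am assuming the incompatibility is the usual chromosome naming difference (UCSC uses names like chr1, Ensembl uses names like 1). The function names are my own, and the toy records are made up for illustration only.

```python
import io
from collections import Counter


def fasta_seqnames(handle):
    """Collect sequence names: the first word of every '>' header line."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_summary(handle):
    """Return (sequence names, per-feature-type record counts) for a GTF."""
    seqnames = set()
    features = Counter()
    for line in handle:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column GTF record
        seqnames.add(cols[0])      # column 1: sequence name
        features[cols[2]] += 1     # column 3: feature type (exon, CDS, ...)
    return seqnames, features


# Toy data (hypothetical): a UCSC-style FASTA and an Ensembl-style GTF.
toy_fasta = io.StringIO(">chr1 toy sequence\nACGTACGT\n>chr2\nGGCCGGCC\n")
toy_gtf = io.StringIO(
    "#!toy header line\n"
    '1\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    '1\ttoy\tCDS\t1\t3\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n'
)

fasta_names = fasta_seqnames(toy_fasta)
gtf_names, feature_counts = gtf_summary(toy_gtf)

print("shared sequence names:", sorted(fasta_names & gtf_names))
print("GTF-only sequence names:", sorted(gtf_names - fasta_names))
print("records per feature type:", dict(feature_counts))
```

With real files, the same functions can be called on handles from open("genes.gtf") and open("genome.fa") (placeholder file names). An empty set of shared names would point to a naming mismatch, and the per-feature counts give a more direct comparison of two GTF files than file size alone.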
Does anybody have an idea what is going on here?