Hello Everyone,
I am trying to use TopHat's GFF feature so that gene expression is normalized as RPKM, but I cannot get that part of the program working. If I include the -G/GFF option, TopHat does not write junctions.bed, islands.gff, or islands.bed. I am using test data to make sure everything works properly before I invest the time in processing real Solexa runs. If anyone has suggestions on how to get the GFF function in TopHat working, it would be greatly appreciated.
Here are the sample reads:
@SRR013411.1874 :1:1:838:688 length=50
aaaaagtcacagtctgtctgtctgtctctctcctctaatcttttttatcc
+SRR013411.1874 :1:1:838:688 length=50
I>AI8I6II1BI,EIB4+0;51-.7'*,#(+&((*#'&&#"++&"!!*%$
@SRR013411.1875 :1:1:83:374 length=50
gctagggttttgaagcaaggtttctcgcgattttctcgatatctctcgcc
+SRR013411.1875 :1:1:83:374 length=50
IIIGDGIIIII@DIIII3I/ICI:412B;=:5/+2/&503-39&2#1D,(
@SRR013411.1876 :1:1:44:341 length=50
cATGGCGAAACCAAGTCGTGGCCGTCGTTCCCCCTCCGTGTCTGGCTCGT
+SRR013411.1876 :1:1:44:341 length=50
22*4$'3%+1#)41&2%2$$%)#8'%#$&%(+%$'&!,"+!$*&,&!%%&
@SRR013411.1877 :1:1:557:896 length=50
CATCTCGTTCCAGTTCCAGATCTCGTTCGGGTTCGAGCCCCTCCAGGTCT
+SRR013411.1877 :1:1:557:896 length=50
I1II+GI8CI86I%A-I89;=40+-)A0.%)+.&&)(-&0%$!"!$#("%
@SRR013411.1878 :1:1:866:720 length=50
ATTTCCCGCTCACGCTCCCGTTCTAGATCGCTCTCTTCATCTTCATCTCC
+SRR013411.1878 :1:1:866:720 length=50
8"(-,<=(-+&8'&%*&-')"$)%"&"#%!!$)#$!!$!"""!!"%!!""
Here is the sample GFF:
chr1 TAIR8 chromosome 1 30432563 . . . ID=chr1;Name=chr1
chr1 TAIR8 gene 3631 5899 . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
chr1 TAIR8 mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
chr1 TAIR8 protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
chr1 TAIR8 exon 3631 3913 . + . Parent=AT1G01010.1
chr1 TAIR8 five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1
chr1 TAIR8 cDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
chr1 TAIR8 exon 3996 4276 . + . Parent=AT1G01010.1
chr1 TAIR8 cDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
chr1 TAIR8 exon 4486 4605 . + . Parent=AT1G01010.1
chr1 TAIR8 cDS 4486 4605 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
chr1 TAIR8 exon 4706 5095 . + . Parent=AT1G01010.1
chr1 TAIR8 cDS 4706 5095 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
chr1 TAIR8 exon 5174 5326 . + . Parent=AT1G01010.1
chr1 TAIR8 cDS 5174 5326 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
chr1 TAIR8 exon 5439 5899 . + . Parent=AT1G01010.1
chr1 TAIR8 cDS 5439 5630 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
chr1 TAIR8 three_prime_UTR 5631 5899 . + . Parent=AT1G01010.1
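In case the annotation file itself is part of the problem, here is a minimal sketch of a sanity check I could run on it. It assumes the file is meant to have nine tab-delimited columns, as GFF does, and it only looks at the layout. The function name `check_gff` and the two sample lines are hypothetical, made up for illustration; this is not part of TopHat and says nothing about what TopHat itself accepts.

```python
def check_gff(lines):
    """Return (problems, feature_counts) for an iterable of GFF lines.

    problems is a list of (line_number, message) tuples.
    feature_counts maps the feature type (column 3) to how often it occurs.
    """
    problems = []
    feature_counts = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (lineno, f"expected 9 tab-separated fields, found {len(fields)}")
            )
            continue
        seqid, source, ftype, start, end, score, strand, phase, attrs = fields
        feature_counts[ftype] = feature_counts.get(ftype, 0) + 1
        if not (start.isdigit() and end.isdigit()):
            problems.append((lineno, "start/end are not integers"))
        elif int(start) > int(end):
            problems.append((lineno, "start is greater than end"))
        if strand not in {"+", "-", ".", "?"}:
            problems.append((lineno, f"unexpected strand value: {strand!r}"))
    return problems, feature_counts


# Hypothetical input: one tab-delimited line and one space-delimited line.
sample = [
    "chr1\tTAIR8\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010",
    "chr1 TAIR8 exon 3631 3913 . + . Parent=AT1G01010.1",
]
problems, counts = check_gff(sample)
for lineno, message in problems:
    print(f"line {lineno}: {message}")
print(counts)
```

On a real file this would be called as `check_gff(open("annotation.gff"))`, where the file name is a placeholder. The feature counts also make it easy to spot inconsistently capitalized feature types.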
Thanks for any help or suggestions.
cheers