Hi
I am new to snpEff and am having trouble with the snpEff reference genome database for tomato (SL2.40).
When I annotated my SNPs using the SL2.40 reference database available on the snpEff site at sourceforge.net, I realized that this database (at least as far as I could see) contains only 2474 genes instead of ~40,000. So I tried to build my own tomato database. After manually correcting some mistakes in the GFF file of the tomato genome obtained from SGN, I built a reference database following the instructions in the snpEff manual.
At the end of the database build I got the following message:
#Has protein coding info: false (I did not provide any protein file, which should be OK)
#Genes: 43015
#protein coding genes: 0
#Transcripts: 40097
#Protein coding transcripts: 0
#Cds: 138926
#Exons: 141585
#Exons with sequence: 141453
#Exons without sequence: 134
Nevertheless, the database seems to contain only about 3000 genes. Does anybody have an idea what I did wrong?
By the way, the reference genome and GFF files used to build the database work fine in a genome browser.
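One way to cross-check the summary above against the GFF file itself is to tally feature types per sequence. This is only a minimal sketch, assuming a standard 9-column, tab-delimited GFF3 file; the function name `count_features` and the sample records (`chrA`, `chrB`) are made up for illustration and are not taken from the SGN file.

```python
import collections


def count_features(lines):
    """Tally GFF3 feature types (column 3) per sequence ID (column 1)."""
    counts = collections.defaultdict(collections.Counter)
    for line in lines:
        # An embedded FASTA section, if present, ends the annotation part.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Ignore malformed records that do not have exactly 9 columns.
        if len(cols) != 9:
            continue
        seqid, feature_type = cols[0], cols[2]
        counts[seqid][feature_type] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample = [
    "##gff-version 3\n",
    "chrA\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chrA\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n",
    "chrA\tsrc\texon\t1\t100\t.\t+\t.\tParent=t1\n",
    "chrB\tsrc\tgene\t5\t50\t.\t-\t.\tID=g2\n",
]

for seqid, per_type in sorted(count_features(sample).items()):
    print(seqid, per_type["gene"], per_type["mRNA"], per_type["exon"], sep="\t")
```

For a real file, an open file handle can be passed instead of the sample list, e.g. `count_features(open(path))` with `path` pointing to the GFF. Comparing the per-sequence gene totals with the numbers reported at build time may show on which sequences genes go missing.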
I would be very thankful for any advice on how to build a database that is as complete as possible. Thanks,
RS