**gen2prot** · 05-19-2010, 01:06 PM

Hello,

TopHat seems to have run properly; the error output has no warnings. However, the accepted.sam file shows no hits. The junctions.bed file is 1.3 MB. Has anyone faced a similar problem?

Thanks
Abhijit

**gen2prot** · 05-19-2010, 01:54 PM

Hello,

Following up on this, I wanted to know whether indexing all the genes is possible in the first place. My genes.fasta file looks something like this:

Code:

>FBgn0034974 type=gene; loc=2R:19969255..19973683; ID=FBgn0034974; name=CG16786; dbxref=FlyBase:FBgn0034974,FlyBase:FBan0016786,FlyBase_Annotation_IDs:CG16786,GB:BI363616,GB:BT001664,GB_protein:AAN71419,GB_protein:AAF47161,GB_protein:AAM68305,UniProt/TrEMBL:Q7JRF0,INTERPRO:IPR011071,EntrezGene:37856,BIOGRID:63435,DroID:FBgn0034974,DRSC:FBgn0034974,FlyAtlas:CG16786-RA,flyexpress:FBgn0034974,FlyMine:FBgn0034974,GenomeRNAi_gene:37856,modMine:FBgn0034974; derived_computed_cyto=60B8-60B9%3B Limits computationally determined from genome sequence between @P{lacW}Phm<up>k07623</up>@%26@P{lacW}tsr<up>k05633</up>@ and @P{EP}EP503@; gbunit=AE013599; MD5=4a28df05c5f7a49b8fd75a28e3b5759e; length=4429; release=r5.27; species=Dmel; 
CGGATTCGGATTCAGATTCACATTCAGATTCAGATACGTTCGGTTTGGGA
TTCGGATTCATTCGTTGCCACTCCAGCTCTATGCTCCGCGTTGGACCCAC
CGATAGCTTGGCTTTCTGCTACAGTTTCATAATTGTCTCGGCCAGCAGCA
GCGGAGTTCATGATTTCGCTCGGAATATGTTTTAGCCAGATCAGTGCTTG
GAAAATGCACTTTTGAGCGTGTACGTGTATGTGGCAAGTAGCTGGCGAAC
GTGAATGAAAACATGAGCTGCCACTGAACGAAACCCACTCTCGAGCTGGA
AGTGCAAGTGAGTTATCCCGCGGAAGAAAAGAAACTGAATTGATTACCAT
TACCATTCGCGGAGTAGCAGTCTCGGAATTAAATACCAACGACCCAGACA
ATACCGAGCCCAGTTCCAAGCTGGAGGCTCAAGCCTTTCTCTATTCAATG

Do I need to re-import a modified fasta file with shorter header information? There seem to be a lot of characters in the header that I cannot understand.

thanks
Abhijit

**Thomas Doktor** · 05-20-2010, 02:26 AM

Hi,

You should build a new bowtie index of the Drosophila genome and not of the individual genes as TopHat is designed to align RNA-seq reads against a full genome. This might explain the behaviour of TopHat, although it should have aligned some reads after all. Perhaps the characters in the fasta headers are causing trouble or there are too many contigs for TopHat to handle well.

**gen2prot** · 05-20-2010, 07:05 AM

Hi Thomas,

I ran TopHat on the chromosomes and it works wonderfully. I think the fasta header might be the one to blame, since it contains characters such as %@><{} etc. It appeared to me to be some sort of construct info. Anyway, I removed everything except the name of the sequence, and am building the gene index again. Let's see. However, the gene file that I am building the index from is 85 MB in size and contains 14964 genes. Do you think this may cause a problem? Thanks for your help.

Abhijit
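
For anyone wanting to do the same header clean-up, here is a minimal Python sketch. It assumes that the "name of the sequence" is the first whitespace-delimited token after the `>` (e.g. FBgn0034974 in the header shown earlier). The function name `simplify_fasta_headers` is hypothetical, and this is not necessarily how the header was trimmed in the post above.

```python
import io


def simplify_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first token.

    Assumption: the sequence identifier is the first whitespace-delimited
    token after '>'. Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            yield ">" + ident + "\n"
        else:
            yield line


if __name__ == "__main__":
    # Small in-memory demo using a shortened version of the header above.
    demo = io.StringIO(
        ">FBgn0034974 type=gene; loc=2R:19969255..19973683; name=CG16786;\n"
        "CGGATTCGGATTCAGATTCACATTCAGATTCAGATACGTTCGGTTTGGGA\n"
    )
    print("".join(simplify_fasta_headers(demo)), end="")
```

To process a real file, the same generator can be fed an open file handle and the result written to a new fasta file before the index is rebuilt.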

Cannot understand Tophat output... Help!
