The problem:
I am trying to assemble the transcriptome of Belgica antarctica, the Antarctic midge. Some of our assembled transcripts are much longer than they should be, and each of them spans multiple genes. In the ones I have examined in depth, these genes are adjacent to one another on the chromosomes of the draft genome. We have no reason to believe that these merged transcripts are biological in origin.
The situation:
I am working with three lanes of RNA-Seq data from a Solexa machine: about 35 million paired-end reads (70M reads total), each 76 bp long. In parallel, we have a genome assembly project that is not under my direct control. I am assembling the transcriptome against this draft genome with TopHat/Cufflinks.
Has anybody run into this before? I have been pulling my hair out tweaking the input parameters of TopHat and Cufflinks to eliminate these merged transcripts. I have also tried filtering my reads: I kept a read only if at least 75% of its bases had a Phred quality score of 38 or better, and both mates had to pass for the pair to be included.
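For anyone who wants to reproduce the read filter described above, here is a minimal sketch in Python. It assumes plain 4-line FASTQ records, mates stored in two files in the same order, and Phred+33 quality encoding by default. Older Solexa/Illumina pipelines often used Phred+64 (or the Solexa scale), so check your files and set `offset` accordingly. The function names are hypothetical, not from any existing tool.

```python
import io
from itertools import islice
from typing import Iterator, TextIO, Tuple

# header, sequence, "+" separator line, quality string
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        chunk = [line.rstrip("\n") for line in islice(handle, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(chunk)


def passes_quality(qual: str, min_q: int = 38, min_frac: float = 0.75,
                   offset: int = 33) -> bool:
    """True if at least min_frac of the bases have Phred quality >= min_q."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return good / len(qual) >= min_frac


def filter_pairs(r1: TextIO, r2: TextIO, out1: TextIO, out2: TextIO,
                 **kwargs) -> Tuple[int, int]:
    """Write a pair only if BOTH mates pass; return (kept, total) pair counts.

    Assumes mate files are in the same order; zip() stops at the shorter file.
    """
    kept = total = 0
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        total += 1
        if passes_quality(rec1[3], **kwargs) and passes_quality(rec2[3], **kwargs):
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
            kept += 1
    return kept, total


if __name__ == "__main__":
    # Tiny in-memory demo: the second mate of pair p2 has only 50% of bases at Q>=38.
    r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@p1/2\nTTTT\n+\nIIII\n@p2/2\nTTTT\n+\nII55\n")
    o1, o2 = io.StringIO(), io.StringIO()
    print(filter_pairs(r1, r2, o1, o2))  # (1, 2)
```

Requiring both mates to pass keeps the two output files in sync, which TopHat expects for paired-end input.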
I have found one or two threads on various forums, but they were not very helpful. The most useful suggestion anyone had was to contact the authors. I tried that, but I am not holding my breath: their automated reply (below) openly states that they may not get back to me. Nice. I really appreciate any help you guys and gals can give me.
Dear Tophat/Cufflinks User,
Your message has been received and will be forwarded to the appropriate project members. Due to the large numbers of e-mails we receive, a response may not be immediate. We focus first on high priority bug reports before answering general questions and sometimes do not respond to repeat bug reports when a fix is already in the works. In the meantime, please have a look at the links below, which may aid in answering your questions.
TopHat Manual: http://tophat.cbcb.umd.edu/
Cufflinks Manual: http://cufflinks.cbcb.umd.edu/
SeqAnswers Forum: http://seqanswers.com/
Regards,
The TopHat and Cufflinks Teams