Originally posted by dschika
On the other end of the spectrum, very-low-coverage transcripts may be split across multiple isogroups. Without enough reads to cover the full length of the transcript, the pieces may never be linked into a single isogroup. You may end up with 2 or more isogroups (and hence [con/iso]tigs) that represent different regions of the same transcript.
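To make the low-coverage case concrete, here is a deliberately simplified toy model. It is not how gsAssembler actually builds isogroups. It treats reads as intervals on a known transcript and assumes, hypothetically, that two reads are linked only if they share at least `min_overlap` bases. The function name `group_reads` and the 40-base threshold are illustrative assumptions. Under that model, a stretch of the transcript with no overlapping reads is enough to split one transcript into separate groups.

```python
from typing import List, Tuple

Read = Tuple[int, int]  # (start, end), half-open coordinates on the true transcript


def group_reads(reads: List[Read], min_overlap: int = 40) -> List[List[Read]]:
    """Toy model (hypothetical): group reads into connected components by overlap.

    Reads are sorted by start. A read joins the current group if it overlaps
    some read already in the group by at least `min_overlap` bases; otherwise
    the group is closed and a new one starts.
    """
    groups: List[List[Read]] = []
    current: List[Read] = []
    reach = 0  # rightmost end among reads in the current group
    for start, end in sorted(reads):
        # Best overlap with any earlier read in the group, given sorted starts.
        if current and min(reach, end) - start >= min_overlap:
            current.append((start, end))
            reach = max(reach, end)
        else:
            if current:
                groups.append(current)
            current = [(start, end)]
            reach = end
    if current:
        groups.append(current)
    return groups


# Well-covered transcript: 100 bp reads every 50 bp across ~1 kb.
high = [(s, s + 100) for s in range(0, 901, 50)]
# Poorly covered transcript: no reads between positions 200 and 600.
low = [(0, 100), (50, 150), (100, 200), (600, 700), (650, 750)]

print(len(group_reads(high)), len(group_reads(low)))  # prints "1 2"
```

Because the reads are processed in order of start position, tracking only the furthest end reached so far is enough to decide whether a new read links to the current group. The toy model ignores sequencing errors, repeats and alternative splicing, all of which complicate the real assembler's behaviour.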
These problems are not at all unique to gsAssembler; similar problems occurred when I used the TGICL pipeline to assemble 454 cDNA reads. The simple truth is that de novo assembly of transcriptomes is very, very hard, in some ways harder than assembling genomes. There is no perfect assembler or optimal set of parameters that will take your reads and spit out a perfect set of transcript sequences. (And no matter how many times I tell the researchers I work with, they still don't seem to believe me!)