Hello,
I have about 950 million reads from an RNA-Seq data set that covers many developmental time-points. Assembling all of the reads together doesn't really work: past a certain depth, additional reads contribute errors at a higher rate than new k-mers (or so I have been advised). Including all of the reads and digitally normalizing to 20x coverage results in a very fragmented, low-quality assembly.
If I assemble multiple time-points individually and then merge the transcriptomes, how would I select the best representative isoform from each assembly and jettison the rest to create a nice, clean final reference? What is a good method to filter the garbage out, and what is a good method to merge the assemblies, favoring more complete sequences?
To clarify merging - I'm thinking of selecting individual transcripts from multiple assemblies, not merging actual sequences together to increase length, although that would be a source of improvement.
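To make the kind of selection I mean concrete, here is a rough Python sketch, not a recommended pipeline. It assumes the transcripts from all assemblies have already been grouped into clusters (one cluster per gene or locus) by some separate step; the `cluster_of` mapping and the function names are hypothetical, and "longest sequence wins" is only a stand-in for "most complete".

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (transcript_id, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # The ID is the first whitespace-delimited token of the header.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def pick_longest_per_cluster(
    assemblies: Dict[str, str],   # assembly label -> FASTA text
    cluster_of: Dict[str, str],   # transcript ID -> cluster ID (assumed given)
) -> Dict[str, Tuple[str, str]]:
    """Keep one transcript per cluster: the longest across all assemblies."""
    best: Dict[str, Tuple[str, str]] = {}
    for label, fasta_text in assemblies.items():
        for tid, seq in parse_fasta(fasta_text):
            cluster = cluster_of.get(tid)
            if cluster is None:
                continue  # unclustered transcripts are dropped in this sketch
            current = best.get(cluster)
            if current is None or len(seq) > len(current[1]):
                best[cluster] = (f"{label}|{tid}", seq)
    return best
```

Length alone would of course favor chimeric or misassembled transcripts, which is part of why I am asking what a better selection criterion would be.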