Originally posted by jimmybee
I am familiar with wheat and barley genome structure. Yes, a nightmare that Illumina is not going to be able to traverse de novo. But I still don't see how increasing your read length 10x helps much.
Unlike animal genomes, plants tend to have compact genes, with introns only rarely longer than a kb or two. So you could sequence genes de novo and attempt to string the genes together in their correct order/orientation using mate ends. That won't work in many instances, because even a moderately sized retro cluster will create a repetitive block too large for even 20 kb mate ends to bridge[1].
The point being: whether you have 100 bp or 700 bp reads, you still have the same basic limitations retro-cluster-wise. So you might as well go with the much lower cost and higher accuracy of Illumina sequence.
Small (non-autonomous) transposable elements may not be traversable in a single read at 100 bp. But Illumina paired-end reads let you anchor a read that falls in the middle-repetitive TE to its low-copy mate in the surrounding sequence, and you are back in the game with a little gap-filling informatics. And, even without that, you should be able to span these small elements with mate end reads to create scaffolds.
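To put rough numbers on the spanning argument, here is a minimal back-of-the-envelope sketch in Python. The function names, the 20 bp unique anchor, the 300 bp element, the 500 bp paired-end insert and the 50 kb retro cluster are all illustrative assumptions, not measured values.

```python
def read_spans(read_len, repeat_len, anchor=20):
    """True if a single read can cross the repeat with `anchor` unique
    bases on each side (anchor size is an assumed threshold)."""
    return read_len >= repeat_len + 2 * anchor


def pair_bridges(insert_size, repeat_len, anchor=20):
    """True if a read pair with this insert size can place one end in
    unique sequence on each side of the repeat."""
    return insert_size >= repeat_len + 2 * anchor


# Hypothetical small non-autonomous element of 300 bp:
print(read_spans(100, 300))    # a single 100 bp read
print(read_spans(700, 300))    # a single 700 bp read
print(pair_bridges(500, 300))  # 100 bp paired ends, assumed 500 bp insert

# Hypothetical 50 kb retro cluster:
print(read_spans(700, 50_000))       # longer reads do not help here
print(pair_bridges(20_000, 50_000))  # neither do 20 kb mate ends
```

Under these assumptions the only case where 700 bp reads beat 100 bp reads is the small element, and a paired-end library already covers that case.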
So the question remains, what are 700 bp reads giving you that makes them worth their 100x cost vs paired 100 bp reads?
--
Phillip
[1] Well, probably. I could envision a method using pretty high coverage 20 kb mate ends. So maybe using a cosmid-based mate end library construction method, you could do it. The idea would be to first determine the LTR structure of all high-copy LTR retrotransposons in the genome, then find the unique junctions created by the insertion of elements into one another. These junctions then serve as the "stepping stones" across the retro cluster swamp.
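A toy sketch of the junction-finding step, in Python, just to make the idea concrete. Everything here is hypothetical: the element sequences are made up, `find_junctions` is not an existing tool, exact string matching stands in for real alignment, and only the first match of a flank in each host element is considered.

```python
from collections import Counter


def find_junctions(reads, elements, ltr_len=8, flank=6):
    """Count reads where the start of one element's LTR is preceded by
    sequence from another element, i.e. a nested-insertion junction.
    `elements` maps name -> full element sequence (LTR at both ends)."""
    junctions = Counter()
    for read in reads:
        for ins_name, ins_seq in elements.items():
            ltr_start = ins_seq[:ltr_len]
            pos = read.find(ltr_start)
            if pos < flank:
                continue  # LTR start absent, or too little flank before it
            left = read[pos - flank:pos]
            for host_name, host_seq in elements.items():
                hit = host_seq.find(left)
                if hit < 0:
                    continue
                # Skip the element's own internal-to-3'-LTR boundary,
                # where flank + LTR start are contiguous in the host.
                if host_seq[hit:hit + flank + ltr_len] == left + ltr_start:
                    continue
                junctions[(host_name, hit + flank, ins_name)] += 1
    return junctions


# Made-up elements: 8 bp "LTR" + internal region + same 8 bp "LTR".
elements = {
    "A": "ACGTACGG" + "TTGACCATGCAA" + "ACGTACGG",
    "B": "GGCTTAAC" + "CATCGGATTAGC" + "GGCTTAAC",
}
reads = [
    "GTACGGTTGACCGGCTTAACCATC",  # B inserted into the interior of A
    "GTACGGTTGACCGGCTTAACCATC",  # second read over the same junction
    "GATTAGCGGCTTAAC",           # B's own internal/3' LTR boundary
]
print(find_junctions(reads, elements))
```

Each key is (host element, position in host, inserted element). In a real data set, the junctions worth using as stepping stones would presumably be those whose read count is consistent with a single genomic copy.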