Hi all,
I was hoping to get some feedback on an assembly strategy for repetitive DNA using Illumina 100 bp paired-end data.
I've seen some strategies, such as ALLPATHS-LG, which use multiple libraries of increasingly larger insert size to resolve repetitive regions. For example, the assembly of the potato genome (which is ~60% repetitive) used 7 libraries with insert sizes ranging from 200 bp to 20,000 bp.
We're running a pilot on a few samples to see how well the assembly works, but because of coverage issues we will only be creating 2 libraries per sample.
So here is my question:
Is it better to create the libraries with insert sizes that are close together (e.g. 200 bp and 500 bp) or far apart (e.g. 200 bp and 5,000 bp)? I can see pros and cons of both, but wanted to elicit some advice before going forward.
Thanks,
John