I've got ~30x coverage of a small (< 100 Mb) algal genome from a PacBio RSII. I corrected, assembled and polished the genome with Canu, and was pretty pleased with the results until I BLASTed the genome against itself and found dozens and dozens of repeated DNA regions, up to and exceeding 50 kbp, that occur in multiple contigs, usually at the ends but not always. The Canu developers helped tweak my run a bit, but the problem persisted. I recently used the same workflow with a different alga and saw the exact same problem, and I have spoken to another lab (working on corals) with identical issues using SMRTmake (not sure if it was HGAP.3 or not). It has gotten so bad that I've found chloroplast fragments assembled in with the genomic DNA contigs. Has anyone else encountered this? My runs were all done on different instruments with different extraction protocols. Is the RSII creating chimeric reads? Thanks in advance.
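For anyone who wants to check their own assembly for the same pattern, here is a minimal Python sketch of the self-alignment check described above. It assumes the self-BLAST results were saved in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that contig lengths are known. The function name and the thresholds (10 kbp minimum hit, 95% identity, 1 kbp end window) are illustrative assumptions, not values from this thread.

```python
from collections import namedtuple

# One long alignment between two different contigs.
Hit = namedtuple("Hit", "query subject identity length qstart qend at_query_end")

def shared_regions(blast_lines, contig_lengths, min_len=10_000,
                   min_identity=95.0, end_window=1_000):
    """Return long hits between *different* contigs.

    blast_lines    : iterable of 12-column tab-separated BLAST lines
    contig_lengths : dict mapping contig name -> length in bp
    Thresholds are hypothetical defaults chosen for illustration.
    """
    hits = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        if query == subject:
            continue  # ignore a contig matching itself
        if length < min_len or identity < min_identity:
            continue  # ignore short or divergent matches
        # Does the match touch either end of the query contig?
        at_end = qstart <= end_window or qend > contig_lengths[query] - end_window
        hits.append(Hit(query, subject, identity, length, qstart, qend, at_end))
    return hits
```

Note that in a self-BLAST each shared region normally shows up twice (contig A against B, and B against A), so the raw hit count overstates the number of distinct duplicated regions.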
-
One way to circumvent this is with the overlap_filtering_setting in FALCON. It lets you filter out "chimeric contigs" because overlap coverage will differ across the contig: coverage in repetitive regions will be much higher than everywhere else.
I'm not aware of a similar setting in Canu.
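To make the idea concrete, here is a rough Python sketch of coverage-based overlap filtering. It is not FALCON's code; it only illustrates the principle, assuming you already have the number of overlaps found at the 5' and 3' end of each read. The function name, the threshold names and their default values are assumptions for illustration, so check the FALCON documentation for the real options.

```python
def filter_reads_by_overlap_count(overlap_counts, min_cov=2, max_cov=60, max_diff=40):
    """Return the set of read ids whose overlap counts look 'unique-like'.

    overlap_counts : dict mapping read id -> (n_overlaps_5prime, n_overlaps_3prime)
    min_cov, max_cov, max_diff : hypothetical thresholds, tune for your coverage
    """
    keep = set()
    for read_id, (left, right) in overlap_counts.items():
        if left < min_cov or right < min_cov:
            continue  # one end has almost no support: possible chimera or bad read
        if left > max_cov or right > max_cov:
            continue  # far too many overlaps: read probably sits in a repeat
        if abs(left - right) > max_diff:
            continue  # very unbalanced ends: read may cross a repeat boundary
        keep.add(read_id)
    return keep
```

With ~30x data, a read from unique sequence should have overlap counts in the same ballpark as the genome-wide coverage at both ends, which is why reads with very high, very low or very lopsided counts are the ones dropped.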
-
There is always a non-zero chance of creating biological chimeras in sample prep. Adapters are blunt-end ligated to the sheared DNA, so it is always possible that fragments ligate to one another before having adapters attached. Obviously the adapter concentration is optimized to minimize this, and in general biological chimeras are extremely rare, but mistakes in sample prep can result in much higher numbers.
Even if biological chimeras do occur, they are random, so they should not have support from other reads, i.e. the first step of assembly corrects them. But in cases of bad sample prep it is possible that chimeras, due to their large number, pass correction and result in misassemblies. As pointed out in the post above, preassembly can be parameterized to better handle high levels of biological chimeras (a higher coverage requirement for correction, or not using multiple subreads from the same molecule, i.e. not using -a in Falcon), but this will depend on the extent of the problem and the assembler being used.
-
Thanks. I have not yet tried Falcon. Maybe it's worth a shot. I think heterozygosity is a real problem for PacBio and I'm wondering if it is causing some of my issues. My samples are multi-isolate and have not spent the years in culture that would breed out variation. I dug up this thread:
Hi everyone ! I'm trying to use Canu in order to assemble the D. suzukii genome. As flies genome are genes dense (genes are very close to each others), and as the D. suzukii species contains a lot ...
That seems to mirror my issues as well. When I noticed this problem, my first thought was "this can't apply only to me", since it was present in every assembly we've made using RSII data regardless of coverage, but perhaps most other folks are using clonal lines or inbred populations.
-
I agree, but Illumina sequencing of these same cultures would not exhibit this problem. Granted, the assembly was in thousands and thousands of contigs, but there was no redundancy and the gene predictions could be trusted. Right now, I'd rather have a fragmented assembly that accurately reflects copy number instead of what outwardly appears to be very large and duplicated gene families. I suppose it depends on where your priorities are.
-
It's always going to be difficult to assemble something that is highly heterozygous. If you have Illumina data, you may want to try MaSuRCA (http://www.genome.umd.edu/masurca.html); there is some evidence that this approach better maintains the separation of haplotypes before overlap assembly.
-
I'm having a problem understanding why an Illumina assembly wouldn't show the same problem. Is the assumption that areas of high heterozygosity simply get broken in the de Bruijn graph? At some point, even with Illumina data, you will assemble out different haplotypes, particularly in highly heterozygous regions.
Why not just filter the PacBio contigs for consistent expected coverage of raw reads?
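One rough way to do that, sketched in Python: given the mean raw-read depth of each contig (computed however you like, e.g. from a read-to-contig alignment), flag contigs whose depth falls far from the assembly-wide median. The function name and the 0.5x/1.5x bounds are assumptions for illustration, not tested recommendations.

```python
from statistics import median

def flag_by_coverage(mean_depth, low=0.5, high=1.5):
    """Flag contigs whose mean depth is far from the median depth.

    mean_depth : dict mapping contig name -> mean raw-read depth
    low, high  : hypothetical bounds, as a fraction of the median depth
    Returns (expected_depth, {contig: "low" | "high"}).
    """
    expected = median(mean_depth.values())
    flagged = {}
    for contig, depth in mean_depth.items():
        ratio = depth / expected
        if ratio < low:
            flagged[contig] = "low"   # e.g. a separately assembled haplotype
        elif ratio > high:
            flagged[contig] = "high"  # e.g. collapsed repeat or organelle DNA
    return expected, flagged
```

Contigs near half the expected depth would be candidates for alternate haplotypes, and contigs at many times the expected depth would be candidates for collapsed repeats or chloroplast sequence, though either call would need checking by hand.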
-
Originally posted by k-gun12:
"I corrected, assembled and polished the genome with Canu, and was pretty pleased with the results until I BLASTed the genome against itself and found dozens and dozens of repeated DNA regions, up to and exceeding 50 kbp, that occur in multiple contigs, usually at the ends but not always."
Originally posted by k-gun12:
"It has gotten so bad that I've found chloroplast fragments assembled in with the genomic DNA contigs. Has anyone else encountered this?"
I assumed that PBJelly was misplacing an LTR transposon or other repetitive sequence.
How did you work this out in the end?