  • Assembly of long reads

    A recent blog post by FlxLex, commenting on work by Sergey Koren, Michael Schatz and others, indicates that genome assemblies can be significantly improved using corrected PacBio long reads.


    Evidently reads are getting longer, and let's just say for the sake of argument that ONT comes up with the goods and we get:

    reads over 100 kb, very accurate, and a mountain of them

    I have three questions:

    1. In the PacBio dataset the correction was processor-intensive, but what do long reads mean for the memory requirements of de novo assemblers? If you have very long reads, does the algorithmic problem become more manageable without the need for 128-256 GB of RAM?

    2. Is anyone working on assemblers that will achieve this under the assumption that longer reads are inevitable, or will current tools work with minor modifications?

    3. I'm kind of indirectly interested in regions that have a bit of transposable action, and repetitive regions more generally. If a lot of the missing data in current assemblies is due to these two factors then what length of good quality read would be likely to resolve the majority of them?

    Perhaps a comparison of repetitive elements between the unresolved fragments in the parrot Assemblathon contigs and the corrected ones might give some clues?

    I'm a bit of a novice in these issues and would be keen to hear the opinions of some experts! Perhaps this is looking too far forward, but the field seems to move very quickly!

  • #2
    Originally posted by JamesH View Post
    1. In the PacBio dataset the correction was processor-intensive, but what do long reads mean for the memory requirements of de novo assemblers? If you have very long reads, does the algorithmic problem become more manageable without the need for 128-256 GB of RAM?
    With long, high-quality reads, you need far fewer of them to reach enough coverage for consensus calling. Fewer reads means fewer overlaps to compute and store. So, assembly should take less memory and go faster.
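
    As a back-of-the-envelope sketch of that scaling (my own illustration, not from the post; it assumes uniformly placed reads of fixed length, Lander-Waterman style), the number of reads, and with it the number of candidate overlaps, shrinks in proportion to read length at fixed coverage:

    ```python
    # Hypothetical numbers: a 3 Gb genome at 50x coverage, three read lengths.
    # At fixed coverage each read overlaps roughly 2 * coverage others (reads
    # whose start falls within +/- read_len of its own), so total candidate
    # overlaps scale with the read count, i.e. inversely with read length.

    def assembly_scale(genome_size, read_len, coverage):
        n_reads = round(coverage * genome_size / read_len)
        total_overlaps = n_reads * (2 * coverage) // 2   # each pair counted once
        return n_reads, total_overlaps

    for read_len in (100, 10_000, 100_000):  # short reads vs PacBio vs hoped-for ONT
        n, ov = assembly_scale(genome_size=3_000_000_000, read_len=read_len, coverage=50)
        print(f"{read_len:>7} bp reads: {n:.2e} reads, ~{ov:.2e} candidate overlaps")
    ```

    Note this mainly helps overlap- and string-graph approaches, whose memory scales with reads and overlaps; de Bruijn graph assemblers scale with the number of distinct k-mers instead, so they benefit less directly from fewer, longer reads.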

    2. Is anyone working on assemblers that will achieve this under the assumption that longer reads are inevitable, or will current tools work with minor modifications?
    It would be great if people were already investing in long-read assemblers, but I think that is a bit premature as of today.

    3. I'm kind of indirectly interested in regions that have a bit of transposable action, and repetitive regions more generally. If a lot of the missing data in current assemblies is due to these two factors then what length of good quality read would be likely to resolve the majority of them?
    Repeats can be resolved if there are enough reads that span them (i.e., reads long enough to include flanking sequence on both sides). So, as usual, this is a species-specific matter (some species have very long repeats, in the 2-4 kb range).
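
    To make the spanning condition concrete, here is a minimal sketch (mine, not part of the original reply; it assumes uniformly placed reads of fixed length and requires a minimum anchor of flanking sequence on each side of the repeat):

    ```python
    # Expected number of reads fully spanning one repeat copy, with at least
    # `flank` bp of anchoring sequence on each side. All parameters are
    # illustrative assumptions.

    def expected_spanning_reads(read_len, repeat_len, flank, coverage):
        span_window = read_len - repeat_len - 2 * flank  # valid start positions
        if span_window <= 0:
            return 0.0                                   # reads too short to span
        return coverage / read_len * span_window         # (starts per bp) * window

    for repeat_len in (2_000, 4_000, 15_000):
        n = expected_spanning_reads(read_len=10_000, repeat_len=repeat_len,
                                    flank=500, coverage=30)
        print(f"{repeat_len:>6} bp repeat: ~{n:.1f} spanning reads (10 kb reads, 30x)")
    ```

    At 30x with 10 kb reads, a 2-4 kb repeat is spanned by plenty of reads, while a 15 kb element gets none at any coverage; read length, not depth, is the limiting factor for long repeats.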

    Hope this helps!



    • #3
      Originally posted by flxlex View Post
      (some species have very long repeats, in the 2-4 kb range).
      Worse than that.
      Maize has several 8-15 kb LTR-retrotransposon families with copy numbers >5,000, and more than half of its total genome is made up of this type of element. Maize is not unusual in this regard -- this seems to be a common feature of most plant genomes larger than about 2 gigabases. Below that genome size, LTR-retrotransposons are still major players, but their maximum copy numbers may drop into the hundreds.
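
      The arithmetic behind "more than half" is easy to check (a quick illustration of mine; the ~2.3 Gb maize genome size is an assumption, not from the post):

      ```python
      # Genome fraction occupied by one LTR-retrotransposon family.
      family_len = 10_000         # bp, mid-range of the 8-15 kb elements above
      copies = 5_000              # copy number from the post
      genome = 2_300_000_000      # ~2.3 Gb maize genome (assumed)
      print(f"{family_len * copies / 1e6:.0f} Mb, "
            f"{family_len * copies / genome:.1%} of the genome per family")
      # ~50 Mb and ~2% per family; a few dozen such families exceed half the genome.
      ```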

      --
      Phillip



      • #4
        To some degree long reads are "back to the future" -- overlap-layout-consensus (OLC) assemblers developed for Sanger data, such as MIRA and the Celera Assembler, do well with long reads (an upcoming renaissance for Phrap?).

        It would also appear that long reads are one of the issues string-graph assemblers were designed to address, though I won't claim any expert understanding in this area.
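
        For anyone new to the idea, here is a toy version of the overlap-and-merge step those OLC assemblers are built around (exact-match and greedy, purely for illustration; real tools like MIRA and the Celera Assembler use indexed, error-tolerant overlap detection):

        ```python
        # Greedy overlap-layout-consensus on error-free toy reads: repeatedly
        # find the pair with the longest suffix-prefix overlap and merge them.

        def overlap(a, b, min_len=3):
            """Length of the longest suffix of `a` matching a prefix of `b`."""
            start = 0
            while True:
                start = a.find(b[:min_len], start)  # next seed hit in a
                if start == -1:
                    return 0
                if b.startswith(a[start:]):
                    return len(a) - start
                start += 1

        def greedy_assemble(reads):
            reads = list(reads)
            while len(reads) > 1:
                best_len, best_i, best_j = 0, None, None
                for i, a in enumerate(reads):
                    for j, b in enumerate(reads):
                        if i == j:
                            continue
                        olen = overlap(a, b)
                        if olen > best_len:
                            best_len, best_i, best_j = olen, i, j
                if best_len == 0:
                    break                            # no overlaps left
                merged = reads[best_i] + reads[best_j][best_len:]
                reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
                reads.append(merged)
            return reads

        print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
        # -> ['ATTAGACCTGCCGGAATAC']
        ```

        With long reads the same suffix-prefix machinery applies; the exact matching just gets replaced by error-tolerant alignment.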

