**andpet** · 12-23-2008, 12:53 AM

newbler hangs

Hi,

I think you should wait (maybe for a week or so).

First:
1.7 M reads really is a lot of data, so the de novo assembly can take quite some time. For some assemblies I have waited at least a week!

Maybe you can use a faster computer?

Second:
Is the genome you sequenced highly repetitive? In that case it will take even longer. In your log you can see that newbler starts by looking for pairwise read overlaps. Next it builds contigs from these overlaps. This is the "detangling" phase, in which newbler tries to resolve repeats (because of repeats, reads can overlap in many ways, but only one layout is correct), and it is really time consuming. Another problem is that newbler needs a lot of RAM for this step. If you don't have enough, the operating system will fall back on virtual memory (swap space on the hard disk), and using virtual memory is much slower than using RAM. That would slow your run down even further.
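To see whether a run has spilled into virtual memory, one option on Linux is to look at the swap counters in /proc/meminfo. The following is only a minimal sketch, assuming a Linux machine; the function names are made up for illustration and nothing here is part of newbler.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (0 if the swap fields are missing)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
```

On a Linux box you would feed it the real file, e.g. `swap_used_kb(parse_meminfo(open("/proc/meminfo").read()))`. If that number keeps growing while the assembly is in the detangling phase, the run is probably RAM-limited.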

The more RAM the better ... :-)

You could also use another assembler, for example euler, to get some larger contigs and then assemble them with newbler. Or mira ...

By the way: in your newbler assembly directory there is a file, 454NewblerProgress.txt, where newbler reports every step (unfortunately without run times) ...
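Since the progress file has no run times, you could add your own by polling it and stamping each new line with the wall-clock time at which you first saw it. This is just a sketch: it assumes the file is plain text that newbler appends to, and `read_new_lines` and `watch` are hypothetical helper names, not anything that ships with newbler.

```python
import time


def read_new_lines(path, offset=0):
    """Return (complete new lines, new offset) for bytes appended since offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Only consume up to the last newline so a half-written line is left for later.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def watch(path, interval=60.0, rounds=None):
    """Poll path and print each new line with the time it was first seen."""
    offset = 0
    done = 0
    while rounds is None or done < rounds:
        lines, offset = read_new_lines(path, offset)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for line in lines:
            print(f"[{stamp}] {line}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Running `watch("454NewblerProgress.txt")` from the assembly directory would then give a rough timeline of the steps. The timestamps are only as precise as the polling interval.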

Cheers,

Andreas

**jnfass** · 12-23-2008, 10:25 AM

Thanks Andreas! ... my run is finally in the "Building contigs/scaffolds" stage, so I guess I sounded the alarm too soon. The run's not RAM-limited, and it's running on a 2.8GHz processor, but I haven't looked very much at repeat content ... thanks for the suggestion. Does anyone know if newbler's going to become multi-threaded any time soon?

**hlu** · 01-02-2009, 02:35 PM

Hi Joe,

I saw it mentioned in another forum that the sample is from plants.

Your difficulty assembling plant 454 data is expected. Plant sequences are highly repetitive. The 454 gsAssembly running time is proportional to the degree of repeats in the data set. Typically, for bacterial data of your size, it takes only a couple of hours to finish. But for plants it can go on for several days, or never finish at all and crash when it runs out of memory.

**westerman** · 01-06-2009, 10:39 AM

Another problem, although it probably is not the root cause, is mixing Titanium and non-titanium runs and software. I found that I had to specify the proper adapters via the '-v' option when mixing the two.

The repetitive nature of plants is most likely your root cause.

**jnfass** · 01-06-2009, 12:01 PM

@westerman -
thanks for the tip ... may well be a future concern, but not with this data set. I'm working on setting aside the reads with repeat content (or masking) and will try to post back here to confirm or challenge the repeat cause.
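One crude way to set aside reads with repeat content is to count k-mers across all reads and flag any read made up mostly of high-frequency k-mers. The sketch below is only an illustration of that idea, not what newbler or any repeat masker actually does. The function names and the default values for `k`, `min_count` and `max_fraction` are arbitrary assumptions, reverse complements are ignored, and counting everything in memory will not scale well to millions of reads.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq, upper-cased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer over the whole read set."""
    counts = Counter()
    for seq in reads:
        counts.update(kmers(seq, k))
    return counts


def repeat_fraction(seq, counts, k, min_count):
    """Fraction of a read's k-mers that occur at least min_count times overall."""
    words = kmers(seq, k)
    if not words:
        return 0.0
    hits = sum(1 for word in words if counts[word] >= min_count)
    return hits / len(words)


def split_reads(reads, k=16, min_count=50, max_fraction=0.5):
    """Return (kept, set_aside); set_aside holds the reads that look repetitive."""
    counts = count_kmers(reads, k)
    kept, set_aside = [], []
    for seq in reads:
        if repeat_fraction(seq, counts, k, min_count) > max_fraction:
            set_aside.append(seq)
        else:
            kept.append(seq)
    return kept, set_aside
```

Assembling only the `kept` reads and comparing the run time against the full set would be one way to confirm or challenge the repeat explanation.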

But I have another concern about newbler that I'll post in the "de novo discovery" forum .. having to do with newbler apparently padding and offsetting (instead of aligning) SNPs ...

**hlu** · 01-06-2009, 01:06 PM

Originally posted by westerman:

Another problem, although it probably is not the root cause, is mixing Titanium and non-titanium runs and software. I found that I had to specify the proper adapters via the '-v' option when mixing the two.

The repetitive nature of plants is most likely your root cause.

-v is the vector trimming feature in gsAssembly (or gsMapper).

Titanium reads are very long, and some of them may contain adaptor sequence in the tail portion of the read. -v will trim that off during assembly or mapping.

Usually this is not the cause of a slowdown. But in samples where customized primers are dominant, primer sequences can slow the assembly down dramatically. The -v option can solve this problem by trimming off the primers during assembly.
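To make the idea of tail trimming concrete, here is a toy sketch of removing an adaptor from the 3' end of a read. It is not how gsAssembly implements -v. It uses exact matching only (real trimmers tolerate mismatches and sequencing errors), and the function name, the `min_overlap` default and any adaptor sequence you pass in are assumptions for illustration.

```python
def trim_tail_adaptor(read, adaptor, min_overlap=5):
    """Remove adaptor sequence from the 3' end of read (exact matches only)."""
    read_u, adaptor_u = read.upper(), adaptor.upper()

    # Case 1: the whole adaptor occurs in the read; cut at its first occurrence.
    pos = read_u.find(adaptor_u)
    if pos != -1:
        return read[:pos]

    # Case 2: the read runs out partway through the adaptor, so the read ends
    # with a prefix of the adaptor. Try the longest prefix first.
    longest = min(len(adaptor_u) - 1, len(read_u))
    for length in range(longest, min_overlap - 1, -1):
        if read_u.endswith(adaptor_u[:length]):
            return read[:len(read) - length]

    # No adaptor found (or the overlap is too short to trust).
    return read
```

The `min_overlap` guard is there because a match of only two or three bases at the end of a read happens by chance all the time, and trimming on it would throw away real sequence.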

gsAssembler / newbler hangs during (large?) assembly