
**flxlex** · 10-13-2010, 10:51 PM

Since you say that the 454NewblerMetrics file reports no contigs, I would assume that something is preventing assembly (or is this mapping?). Any contigs in the ace file are probably artifacts.

What kind of project/sample/reads do you have?

The '*' symbols represent gaps introduced to optimize the alignment, and 454 ace files have tons of them due to the homopolymer errors (or rather, variation in homopolymer length between reads). I would not deem the amount of these '*' a measure of quality. If you want to know the effect of using fasta input over sff, I would take reads from a known genome and check the correctness of the contigs relative to the reference sequence. Although I expect sff input to give better results, you never know...

If somebody has done this already, let me know :-)
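For anyone who wants to put a number on the '*' pads discussed above, here is a minimal sketch in plain Python. It assumes the standard ACE layout, where each contig starts with a `CO` line and its padded consensus runs until a blank line or the `BQ` block. The function name `consensus_pad_fractions` is made up for illustration and is not part of Newbler or any library. As said above, the result is a diagnostic for "contigs made mostly of '*'", not a quality score.

```python
import sys


def consensus_pad_fractions(ace_lines):
    """Return {contig_name: fraction of '*' pads in its consensus}.

    Only the consensus under each CO line is counted; the pads inside
    individual reads (RD blocks) are ignored.
    """
    fractions = {}
    name = None   # contig whose consensus is currently being read
    bases = []    # consensus lines collected for that contig

    def finish():
        seq = "".join(bases)
        fractions[name] = seq.count("*") / len(seq) if seq else 0.0

    for raw in ace_lines:
        line = raw.strip()
        if line.startswith("CO "):
            if name is not None:
                finish()
            # CO <name> <padded length> <reads> <segments> <U|C>
            name = line.split()[1]
            bases = []
        elif name is not None:
            if line == "" or line.startswith("BQ"):
                # A blank line or the BQ block ends the consensus
                finish()
                name = None
            else:
                bases.append(line)
    if name is not None:
        # File ended while still inside a consensus block
        finish()
    return fractions


if __name__ == "__main__" and len(sys.argv) > 1:
    # Path is supplied by the user, e.g. the ace file of an assembly
    with open(sys.argv[1]) as handle:
        for contig, frac in consensus_pad_fractions(handle).items():
            print(f"{contig}\t{frac:.3f}")
```

Running it on the ace file from the sff run and on the one from the fasta run should show whether the consensus sequences really differ in pad content, or whether the '*' are mostly in the reads.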

**nicolallias** · 10-14-2010, 03:40 AM

Hi flxlex,
I'm considering de novo assemblies only. These observations are based on dozens of assemblies done with Newbler 2.3, and this problem occurs:
- on gDNA from bacteria
- on gDNA from eukarya
- on cDNA (option -cdna) from plantae
But in all of these cases I also have some counter-examples.

Using the same raw data, we obtained different behaviour from Newbler.
If we provide an sff file, the contigs are mainly composed of '*', which is not the case if we simply convert this file into fasta and assemble it with Newbler. Done that way, the assembly looks just fine.
Where could this difference in behaviour come from?

Still investigating...

**flxlex** · 10-16-2010, 06:26 AM

Originally posted by nicolallias:

> Hi flxlex,
> I'm considering de novo assemblies only. These observations are based on dozens of assemblies done with Newbler 2.3, and this problem occurs:
> - on gDNA from bacteria
> - on gDNA from eukarya
> - on cDNA (option -cdna) from plantae
>
> But in all of these cases I also have some counter-examples.

I don't understand: do you get empty fna files and ace files with many '*' contigs for all these assemblies?

> Using the same raw data, we obtained different behaviour from Newbler.
> If we provide an sff file, the contigs are mainly composed of '*', which is not the case if we simply convert this file into fasta and assemble it with Newbler. Done that way, the assembly looks just fine.
> Where could this difference in behaviour come from?

I am not sure, but what I would be more interested in is how correct the contigs are in the assemblies from sff and from fasta files. If you do this for a known genome (e.g. E. coli), what do you find?

**nicolallias** · 10-18-2010, 01:21 AM

> I don't understand: do you get empty fna files and ace files with many '*' contigs for all these assemblies?

Yes.

> I am not sure, but what I would be more interested in is how correct the contigs are in the assemblies from sff and from fasta files. If you do this for a known genome (e.g. E. coli), what do you find?

I will do that.

Edit: Both runs, from sff and from fasta, have been done on E. coli, and both gave ace and fna files full of contigs. But I am not trying to find the best input (fasta or sff) here; the main question is why.
Why do we sometimes get nothing (ace full of '*', fna files empty) with the sff, while we get something (ace and fna files with contigs) when starting from a fasta?

We are getting in contact with Roche...
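To make the "nothing versus something" comparison above concrete, here is a rough sketch in plain Python that summarises a contig fna file: number of contigs, total bases and N50. The function name `fasta_stats` is hypothetical, and nothing here is specific to Newbler; it only assumes ordinary FASTA with `>` header lines. An empty fna file simply gives zeros.

```python
import sys


def fasta_stats(lines):
    """Return contig count, total bases and N50 for FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)

    total = sum(lengths)
    n50 = 0
    running = 0
    # N50: length of the contig at which the running sum of lengths,
    # taken from longest to shortest, reaches half of the total bases
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "bases": total, "n50": n50}


if __name__ == "__main__":
    # Paths are supplied by the user, e.g. the fna file from the sff run
    # followed by the fna file from the fasta run
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, fasta_stats(handle))
```

Passing the fna file of the sff-based run and that of the fasta-based run side by side gives a quick view of how far the two outputs diverge, before looking at correctness against a reference.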

**nicolallias** · 10-22-2010, 01:45 AM

The problem seems to be the sff file; software developers at 454 are working on this.
Hoping for some good news soon ;-)

**nicolallias** · 10-29-2010, 12:16 AM

Some news: the problem seems to lie in the generation of the sff file...

gsAssembly (Newbler) de novo behaviour, inputs and outputs