**rwenang** · 01-30-2011, 07:15 PM

Try Trans-ABySS or Oases. They are specialized for transcriptome assembly, unlike the genome assemblers (SOAPdenovo, Velvet, ABySS).

**Niharika** · 01-30-2011, 07:55 PM

Thank you rwenang.

Can anybody also tell me how to set k-mer lengths for de novo transcriptome assembly, and how N50 is calculated?

**samanta** · 01-31-2011, 10:43 AM

Hello Niharika,

I have been doing something similar with paired-end Solexa data (2 x 75 nt). We are using Oases, which is part of the Velvet pipeline. This is what you need to do: (i) run an assembly with Velvet, keeping the read tracking option on, then (ii) run Oases on the Velvet result to assemble the transcriptome. All of this is explained in the Oases manual.
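As a rough sketch of steps (i) and (ii), the helper below only builds the command lines for one run so they can be inspected before anything is launched. The file name `reads.fa`, the output directory, and the insert length of 200 are placeholders rather than values from this thread, and the flags (`-shortPaired`, `-read_trkg yes`, `-ins_length`) are as I recall them from the Velvet and Oases manuals, so check them against your installed versions.

```python
from typing import List


def oases_pipeline_commands(reads: str, out_dir: str, k: int,
                            insert_length: int) -> List[List[str]]:
    """Return the command lines for one Velvet + Oases run (nothing is executed)."""
    if k % 2 == 0:
        # Velvet works with odd k-mer (hash) lengths.
        raise ValueError("use an odd k-mer length")
    return [
        # Hash the paired-end reads into k-mers.
        ["velveth", out_dir, str(k), "-fasta", "-shortPaired", reads],
        # Step (i): Velvet assembly with read tracking switched on.
        ["velvetg", out_dir, "-read_trkg", "yes"],
        # Step (ii): Oases on the Velvet output directory.
        ["oases", out_dir, "-ins_length", str(insert_length)],
    ]


if __name__ == "__main__":
    # Placeholder inputs; substitute your own reads file and insert length.
    for cmd in oases_pipeline_commands("reads.fa", "asm_k21", 21, 200):
        print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run` once the paths are real; looping over several odd values of `k` gives one output directory per k-mer length to compare.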

For my data, I tried a few different k-mer lengths and settled on K=21, which gave the best N50. You also need to keep the available memory in mind, because it limits how many k-mer lengths you can experiment with. Oases uses a lot more RAM than Velvet, and Velvet itself needs a lot of memory.
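On the N50 question: N50 is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch, assuming you already have the contig lengths as a list of integers (the example lengths are made up):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Contig length at which the running total, longest first, reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs at all


if __name__ == "__main__":
    contigs = [100, 200, 300, 400, 500]  # made-up contig lengths
    print(n50(contigs))  # 400: 500 + 400 = 900 covers half of 1500
```

Computing this for the contigs from each k-mer length gives the numbers to compare when picking K.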

Good luck,
Manoj

P. S.

1. SOAPdenovo is for genome assembly. It cannot do transcriptomes, as far as I know.
2. ABySS is a parallel version of Velvet. So, Trans-ABySS is equivalent to Oases. However, I would recommend trying Velvet first, because the parallel installation of ABySS requires some more effort.

---------------------

http://homolog.us

**seb567** · 02-02-2011, 01:03 PM

Originally posted by samanta:

2. ABySS is a parallel version of Velvet. So, Trans-ABySS is equivalent to Oases.

To whomever it may concern:

I am afraid you are obviously wrong here.

ABySS is not a parallel version of Velvet.

ABySS paper in Genome Research (2009)

ABySS: A parallel assembler for short read sequence data

http://genome.cshlp.org/content/19/6/1117

Velvet paper in Genome Research (2008)

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

http://genome.cshlp.org/content/18/5/821.long

Trans-ABySS paper in Nature Methods (2010)

http://www.nature.com/nmeth/journal/v7/n11/full/nmeth.1517.html

**samanta** · 02-02-2011, 03:06 PM

I should have said ABySS implements a parallel version of the de Bruijn graph, whereas Velvet is a single-node de Bruijn assembler, but we are splitting hairs here.

Let's hear from the authors of papers you quoted -

Velvet paper -

"We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words"
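To make "a compact representation based on short words" concrete, here is a textbook-style de Bruijn graph built from reads. This is only an illustration of the general idea, not Velvet's actual data structure; the reads and `k` are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in at least one read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # Two overlapping toy reads; shared k-mers collapse into the same edges.
    for node, successors in de_bruijn_graph(["ACGTA", "CGTAC"], k=3).items():
        print(node, "->", successors)
```

The two reads overlap by four bases, yet the graph stores each shared k-mer only once, which is where the compactness comes from.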

Abyss paper -

"The field of short read de novo assembly developed from pioneering work on de Bruijn graphs by Pevzner et al. (Pevzner and Tang 2001; Pevzner et al. 2001). The de Bruijn graph representation is prevalent in current short read assemblers, with Velvet (Zerbino and Birney 2008), ALLPATHS (Butler et al. 2008), and EULER-SR (Chaisson and Pevzner 2008) all following this approach."

"To assemble the very large data sets produced by sequencing individual human genomes, we have developed ABySS (Assembly By Short Sequencing). The primary innovation in ABySS is a distributed representation of a de Bruijn graph, which allows parallel computation of the assembly algorithm across a network of commodity computers." [emphasis mine]
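One way to picture a "distributed representation" is to give every k-mer an owner node computed from the k-mer itself, so any machine can work out where a k-mer lives without asking the others. The sketch below assumes a CRC32 hash and treats a k-mer and its reverse complement as the same entry; it illustrates the idea only and is not ABySS's actual hash function or code.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def owner(kmer: str, n_nodes: int) -> int:
    """Deterministically assign a k-mer (either strand) to one of n_nodes machines."""
    return zlib.crc32(canonical(kmer).encode("ascii")) % n_nodes


def partition_kmers(reads: Iterable[str], k: int, n_nodes: int) -> Dict[int, Set[str]]:
    """Group the canonical k-mers of all reads by the node that owns them."""
    shards = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            shards[owner(kmer, n_nodes)].add(canonical(kmer))
    return dict(shards)


if __name__ == "__main__":
    for node, kmers in sorted(partition_kmers(["ACGTAC"], k=3, n_nodes=4).items()):
        print(node, sorted(kmers))
```

Because the owner depends only on the sequence, each machine holds a slice of the graph and only needs to message the owners of neighbouring k-mers.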

**seb567** · 02-03-2011, 08:05 AM

Originally posted by samanta:

I should have said ABySS implements a parallel version of the de Bruijn graph, whereas Velvet is a single-node de Bruijn assembler, but we are splitting hairs here.

I agree with you that these two programs implement a similar algorithmic approach for the assembly of genomes using de Bruijn graphs.

But saying that "ABySS is a parallel version of Velvet" is false and undervalues the work done over the years by the numerous researchers in that very field.

The use of paired-end reads in Velvet is described in a PLoS ONE paper (2009).

Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0008407

Background: Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.

Principal Findings: We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.

Conclusions: These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.

For ABySS, I think the contigs are merged according to a threshold on the number of bridging pairs.
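A toy version of that thresholding idea, purely as an assumption about the general approach and not ABySS's implementation: count the read pairs whose two mates land on different contigs, and keep only the links supported by at least `min_pairs` pairs. The contig names and the threshold are made up, and orientation and distance estimates are ignored.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def links_to_join(pair_contigs: Iterable[Tuple[str, str]],
                  min_pairs: int) -> List[Tuple[str, str]]:
    """Return contig pairs bridged by at least min_pairs read pairs."""
    counts = Counter()
    for first, second in pair_contigs:
        if first != second:  # pairs inside one contig bridge nothing
            counts[tuple(sorted((first, second)))] += 1
    return sorted(link for link, n in counts.items() if n >= min_pairs)


if __name__ == "__main__":
    # Each tuple names the contigs hit by the two mates of one read pair.
    pairs = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"), ("c2", "c3"), ("c1", "c1")]
    print(links_to_join(pairs, min_pairs=3))  # [('c1', 'c2')]
```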

Originally posted by samanta:

Let's hear from the authors of papers you quoted -

Velvet paper -

"We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words"

Precisely!

The said manipulation of these graphs is what makes Velvet so popular!

Furthermore, you can read Dr. Zerbino's PhD thesis to fully grasp the concepts he introduced for manipulating de Bruijn graphs.

Genome assembly and comparison using de Bruijn graphs

http://www.ebi.ac.uk/training/ftp/PhDtheses/Daniel_Zerbino.pdf

The novelty, I think, is the use of long read markers and short read markers.
(Sections 2.3.4 & 2.3.5 of his thesis)

Originally posted by samanta:

Abyss paper -

"The field of short read de novo assembly developed from pioneering work on de Bruijn graphs by Pevzner et al. (Pevzner and Tang 2001; Pevzner et al. 2001). The de Bruijn graph representation is prevalent in current short read assemblers, with Velvet (Zerbino and Birney 2008), ALLPATHS (Butler et al. 2008), and EULER-SR (Chaisson and Pevzner 2008) all following this approach."

Same thing here. Professor Pavel Pevzner introduced the use of de Bruijn graphs in 2001. In the EULER papers, Eulerian paths are used to traverse the de Bruijn graph in order to obtain an assembly.

So this cited paragraph highlights the importance of the de Bruijn graph representation, not how this graph is processed to yield an assembly.
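The Eulerian-path idea mentioned above can be sketched on error-free toy data: treat every k-mer as an edge between its prefix and suffix, then find a path that uses each edge exactly once (Hierholzer's algorithm). This is a classroom illustration under idealized assumptions (one strand, no errors, no repeats longer than k-1), not the EULER implementation.

```python
from collections import Counter, defaultdict
from typing import List


def eulerian_path_assembly(kmers: List[str]) -> str:
    """Spell the sequence that uses every k-mer exactly once, if one exists."""
    if not kmers:
        return ""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for kmer in kmers:
        left, right = kmer[:-1], kmer[1:]
        graph[left].append(right)
        balance[left] += 1
        balance[right] -= 1

    # Start where there is one surplus outgoing edge, else anywhere (a cycle).
    start = next((n for n, b in balance.items() if b == 1), kmers[0][:-1])

    # Hierholzer's algorithm with an explicit stack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Confirm the walk uses exactly the input edges, each exactly once.
    walked = Counter(zip(path, path[1:]))
    expected = Counter((kmer[:-1], kmer[1:]) for kmer in kmers)
    if walked != expected:
        raise ValueError("k-mers do not form a single Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(eulerian_path_assembly(["TTG", "ACG", "GCA", "CGT", "TGC", "GTT"]))
    # ACGTTGCA
```

Real data needs the error correction and repeat handling that the assemblers discussed here add on top, which is the point about how the graph is processed.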

Originally posted by samanta:

"To assemble the very large data sets produced by sequencing individual human genomes, we have developed ABySS (Assembly By Short Sequencing). The primary innovation in ABySS is a distributed representation of a de Bruijn graph, which allows parallel computation of the assembly algorithm across a network of commodity computers." [emphasis mine]

I think the true innovation of this paper is not only the distributed de Bruijn graph, but also a working assembler that generates contigs for a human genome.

Cheers !

-seb

**samanta** · 02-03-2011, 09:35 AM

Thank you... I fully agree with what you said. I tend to get sloppy in my message board comments.

**moritzhess** · 02-07-2011, 06:29 AM

SOAPdenovo has also been used for transcriptome assembly:

"De Novo Analysis of Transcriptome Dynamics in the Migratory Locust during the Development of Phase Traits"

I would also recommend a paper about Trans-ABySS. It explains the functionality of the trans-... addon:
"De novo assembly and analysis of RNA-Seq Data"

In my experience, ABySS is far less demanding in terms of memory.

de novo transcriptome assembly
