
**Markiyan** · 06-08-2016, 05:52 AM

Try using EST/DNA assemblers to do de novo cDNA assembly.

I assume that you would like to make a good de novo assembly of the cDNA dataset.

So do the following:
1. Start with the longest reads possible (2x250 reads if Illumina) or use PacBio Iso-Seq.
2. Merge overlapping read pairs with FLASH (if Illumina PE).
3. Subsample (start with 100K reads), assemble and curate. For the next round, eliminate ("vector" screen) reads matching the curated contigs from the previous round(s), then repeat step 3 with 10X more data.
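The loop in step 3 can be sketched in Python. This is a minimal sketch under stated assumptions, not the poster's pipeline: `assemble` is a hypothetical callback standing in for running your assembler and curating its output, and the "vector" screen is approximated by dropping any read that shares an exact 31-mer with a curated contig (a real screen would align reads, and this version only checks the forward strand).

```python
import random
from typing import Callable


def kmers(seq: str, k: int) -> set[str]:
    """All k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads: list[str], contigs: list[str], k: int = 31) -> list[str]:
    """Crude "vector" screen: drop reads sharing any exact k-mer with a contig.

    Forward strand only; a real screen would align reads on both strands.
    """
    contig_kmers: set[str] = set()
    for contig in contigs:
        contig_kmers |= kmers(contig, k)
    return [r for r in reads if not (kmers(r, k) & contig_kmers)]


def iterative_assembly(
    reads: list[str],
    assemble: Callable[[list[str]], list[str]],  # hypothetical: assembler + curation
    start: int = 100_000,
    growth: int = 10,
    k: int = 31,
    seed: int = 0,
) -> list[list[str]]:
    """Subsample, assemble, screen, and grow the subsample each round.

    Returns one list of curated contigs per round.
    """
    rng = random.Random(seed)
    remaining = list(reads)
    rounds: list[list[str]] = []
    size = start
    while remaining:
        used_all = size >= len(remaining)
        batch = remaining if used_all else rng.sample(remaining, size)
        contigs = assemble(batch)
        if not contigs:  # nothing new assembled: treat as saturation
            break
        rounds.append(contigs)
        # Reads explained by this round's contigs never reach later rounds.
        remaining = screen_reads(remaining, contigs, k)
        if used_all:  # every remaining read has now been through an assembly
            break
        size *= growth
    return rounds
```

The idea is that each round only has to assemble reads that earlier contigs do not already explain, so the larger later subsamples are drawn from a pool that screening keeps shrinking.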

Once you have processed all the data or reached saturation, create a reference from all rounds combined and map the reads to it.
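Combining the rounds into one reference is mostly bookkeeping. The sketch below uses a hypothetical helper, `combined_reference`, that writes every curated contig into a single FASTA-formatted string with IDs that stay unique across rounds, ready to hand to whatever read mapper you use.

```python
def combined_reference(rounds: list[list[str]], prefix: str = "round") -> str:
    """Merge curated contigs from all rounds into one FASTA-formatted string.

    IDs encode the round and contig index, so they stay unique across rounds.
    """
    records = []
    for r, contigs in enumerate(rounds, start=1):
        for i, seq in enumerate(contigs, start=1):
            records.append(f">{prefix}{r}_contig{i}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    fasta = combined_reference([["ACGTACGT"], ["GGCCTTAA", "TTGCA"]])
    print(fasta, end="")
```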

This approach can cut the computational resources (RAM) and time required by orders of magnitude, and it usually works quite well with most DNA assembly programs (increase the minmatch/k-mer length to 31 or more if it is in the 12-22 bp range).
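One way to see why a longer minimum match helps: under a simple model with independent, uniformly distributed bases, a given k-mer matches a random position with probability 4^-k, so the expected number of chance hits across a reference of length L is roughly 2(L - k + 1)/4^k when both strands are counted. The sketch below computes that. The model is an assumption and likely underestimates spurious matches in real transcriptomes, which contain repeats, paralogs and low-complexity sequence.

```python
def expected_random_hits(k: int, ref_len: int, both_strands: bool = True) -> float:
    """Expected chance exact hits of one k-mer against a random reference.

    Assumes independent, uniformly distributed bases (match probability 4**-k
    per position); real sequence violates this, so treat it as optimistic.
    """
    positions = max(ref_len - k + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


if __name__ == "__main__":
    # Hypothetical 100 Mb reference; compare short vs. long minimum matches.
    for k in (12, 16, 22, 31):
        print(f"k={k:2d}: {expected_random_hits(k, 100_000_000):.3g}")
```

For a hypothetical 100 Mb reference, k = 12 gives on the order of ten chance hits per k-mer, while k = 31 gives essentially none.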

**GenoMax** · 06-08-2016, 06:00 AM

@Abhijit: Take a look at CD-HIT.

**gen2prot** · 06-08-2016, 08:41 AM

@Markiyan: Will any EST assembler work? I was thinking of IDBA or SOAPdenovo. Are there any you would suggest?

@Genomax: How does CD-Hit-EST compare to any of the other RNA-Seq assemblers?

**GenoMax** · 06-08-2016, 08:43 AM

> Originally posted by gen2prot
>
> @Genomax: How does CD-Hit-EST compare to any of the other RNA-Seq assemblers?

That suggestion was not for assembly. I thought you just wanted to cluster the ESTs you already have. You could cluster reads, but that would not be efficient.
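For clustering ESTs you already have, CD-HIT(-EST) uses greedy incremental clustering: sequences are processed longest first, and each one either joins an existing cluster whose representative it matches above an identity threshold or becomes a new representative. The toy version below illustrates only that idea; it replaces CD-HIT's alignment-based identity and word filter with a hypothetical shared-k-mer fraction (`min_shared`), so its thresholds are not comparable to CD-HIT's identity cutoff.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(
    seqs: list[str], k: int = 8, min_shared: float = 0.9
) -> list[tuple[str, list[str]]]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Sequences are visited longest first. Each joins the first cluster whose
    representative contains at least `min_shared` of its k-mers; otherwise it
    becomes a new representative. The shared-k-mer fraction is a stand-in for
    CD-HIT's alignment-based identity.
    """
    clusters: list[tuple[str, set[str], list[str]]] = []
    for s in sorted(seqs, key=len, reverse=True):
        q = kmers(s, k)
        for rep, rep_kmers, members in clusters:
            if q and len(q & rep_kmers) / len(q) >= min_shared:
                members.append(s)
                break
        else:  # no representative was similar enough
            clusters.append((s, q, [s]))
    return [(rep, members) for rep, _, members in clusters]
```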

If you are looking to deduplicate your data, then bbduk.sh or dedupe.sh from the BBMap suite may be options. See this thread.
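Exact deduplication is the simplest form of what dedupe.sh does. The sketch below is not BBMap's implementation: it only removes sequences identical to an earlier one on either strand, by keying each sequence on the lexicographically smaller of itself and its reverse complement. Per its documentation, dedupe.sh can also remove contained sequences and near-duplicates, which this sketch does not attempt.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement; IUPAC codes other than N are left as-is."""
    return seq.translate(_COMPLEMENT)[::-1]


def dedupe_exact(seqs: list[str]) -> list[str]:
    """Keep the first copy of each sequence, treating a sequence and its
    reverse complement as duplicates. First-occurrence order is preserved."""
    seen: set[str] = set()
    kept: list[str] = []
    for s in seqs:
        key = min(s, revcomp(s))  # canonical, strand-independent key
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept
```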

**Markiyan** · 06-13-2016, 02:42 AM

> @Markiyan: Will any EST assembler work? I was thinking of IDBA or SOAPdenovo. Are there any you would suggest?

Yes, any assembler should work.

Using the "divide and conquer" approach described above, you can work with any DNA/RNA sequence assembler, even one its authors did not originally intend for EST assembly. For example, in 2009 I used PHRAP on a few mollusc EST datasets sequenced from cDNA on 454 Titanium and, after 4 iterations, got far better results than Newbler 2.0 in cDNA mode.

That said, it is better to use tools specifically designed for EST assembly, so feel free to try any assemblers you like, starting with a smaller subset of the reads.

You can try MIRA, or Newbler in DNA mode. Ideally, you want to be able to inspect your assembly results in Consed or a similar assembly editor.

You also want your EST reads to be as long as reasonably possible to simplify the assembly (PacBio Iso-Seq or Illumina in 2x250 mode would be my tools of choice).

Looking for software tools similar to Vmatch

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News