Seqanswers Leaderboard Ad

**kenietz** · 08-26-2012, 05:27 PM

Genome assembly.

**samanta** · 08-27-2012, 01:14 AM

Originally posted by kenietz:

Genome assembly.

Please check the Minia program discussed here. You can assemble a 3 Gbase genome using about 6-8 GB of RAM.


http://www.homolog.us/blogs/2012/07/19/quip-minia-slimgene-and-titus-browns-paper-on-scaling-metagenome/
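Minia's low memory footprint comes from storing the k-mers of the de Bruijn graph in a Bloom filter and discovering edges on the fly by querying the four possible one-base extensions of a node. Below is a minimal sketch of that idea, assuming a plain Bloom filter with SHA-256-derived bit positions (the class, method names, and sizes are hypothetical). Minia itself also records "critical false positives" so that traversal is exact. This sketch omits that step, so `successors` may occasionally return a spurious k-mer.

```python
import hashlib


class KmerBloomFilter:
    """Plain Bloom filter over k-mer strings: no false negatives, some false positives."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive n_hashes bit positions from salted SHA-256 digests of the k-mer.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))

    def successors(self, kmer):
        # Edges are not stored: test the four possible one-base extensions instead.
        return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in self]


genome = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
k = 7
bf = KmerBloomFilter(n_bits=1024, n_hashes=3)
for i in range(len(genome) - k + 1):
    bf.add(genome[i:i + k])

print(bf.successors(genome[:k]))  # always contains the true next k-mer, genome[1:k+1]
print(len(bf.bits), "bytes hold the whole toy graph")  # 128 bytes
```

The memory cost is fixed by `n_bits`, not by how the k-mers are stored, which is why this kind of representation scales to large genomes on modest machines.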

You can also check the slides posted here:


http://www.homolog.us/blogs/2012/08/10/thesis-slides-from-rayan-chikhi/

If you would like to split the reads into partitions, the paper by Titus Brown in the first link should help you.
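The partitioning idea in that paper is to split the reads into disconnected components of the de Bruijn graph so each component can be assembled independently. Below is a minimal sketch, assuming exact k-mer sets in a Python dict instead of the paper's probabilistic (Bloom-filter-based) graph, and ignoring reverse complements. The function names are hypothetical. Reads that share at least one k-mer are merged with union-find.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def partition_reads(reads, k):
    """Group reads that are connected through shared k-mers."""
    parent = list(range(len(reads)))
    first_owner = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in first_owner:
                ra, rb = find(parent, idx), find(parent, first_owner[kmer])
                if ra != rb:
                    parent[ra] = rb
            else:
                first_owner[kmer] = idx
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[find(parent, idx)].append(read)
    return list(groups.values())


reads = ["ACGTACGGT", "ACGGTTCAA", "TTTTGGGGC", "GGGGCCCAT"]
parts = partition_reads(reads, k=5)
print(parts)  # [['ACGTACGGT', 'ACGGTTCAA'], ['TTTTGGGGC', 'GGGGCCCAT']]
```

Each partition can then be handed to an assembler on its own, so peak memory is set by the largest component rather than by the whole data set.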

Please email me (samanta at homolog.us) if you need more explanation of the algorithms, because I do not check the forum frequently. The state of the art is far ahead of running Velvet on a machine with 512 GB of RAM.

**ymc** · 08-27-2012, 01:54 AM

If I classify the reads into different chromosomes using bwa, can I assemble each chromosome "de novo" on a 64 GB machine?
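The classification step in this question amounts to binning reads by the reference sequence they align to. Below is a minimal sketch, assuming BWA's alignments are available as SAM text lines. The function name and the toy records are hypothetical. It reads only the standard SAM columns QNAME, FLAG, and RNAME and skips secondary and supplementary records.

```python
from collections import defaultdict


def bin_reads_by_chromosome(sam_lines):
    """Group read names by RNAME (SAM column 3); unmapped reads get their own bin."""
    bins = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800) alignment
            continue
        key = "unmapped" if flag & 0x4 else rname
        bins[key].append(qname)
    return dict(bins)


sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t16\tchr2\t50\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
    "r1\t256\tchr2\t10\t0\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
bins = bin_reads_by_chromosome(sam_lines)
print(bins)
```

Reads that fail to align end up in the "unmapped" bin, so they would be missing from every per-chromosome assembly unless they are handled separately.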

**samanta** · 08-27-2012, 06:58 AM

Originally posted by ymc:

If I classify the reads into different chromosomes using bwa, can I assemble each chromosome "de novo" on a 64 GB machine?

Interesting question.

i) For the kind of de novo assembly we are talking about, the chromosome sequences are not known. If they were known, why would you need de novo assembly in the first place?

ii) Where the chromosome sequences already exist and you are trying to do reassembly, yes, it is possible to reduce the RAM requirement by partitioning the reads. Remember, though, that for error-free reads the RAM requirement is capped no matter how many reads you have, because the number of distinct k-mers cannot exceed what the genome itself contains. In a world with errors, however, the RAM requirement goes up linearly with the number of reads, since each error introduces new, spurious k-mers.


http://www.homolog.us/blogs/2011/08/01/how-do-sequencing-errors-affect-de-bruijn-graphs/

iii) If you are trying to reassemble a human genome after classifying the reads with BWA, you are most likely interested in the parts of the chromosomes with indels, etc. Unfortunately, BWA may not be able to capture the reads from those regions and assign them to a reference chromosome.
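Point ii can be checked with a small simulation: count the distinct k-mers (the nodes a de Bruijn assembler must hold in memory) as the number of reads grows, with and without sequencing errors. Below is a minimal sketch, assuming a random 10 kb toy genome, 100 bp reads sampled uniformly, and independent substitution errors at 1%. All names and sizes are hypothetical, and the exact counts depend on the random seed.

```python
import random


def distinct_kmers(reads, k):
    """Collect the distinct k-mers (de Bruijn graph nodes) present in the reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads uniformly; substitute each base with probability error_rate."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for j in range(read_len):
            if rng.random() < error_rate:
                read[j] = rng.choice([b for b in "ACGT" if b != read[j]])
        reads.append("".join(read))
    return reads


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
k = 21
genome_kmer_count = len(distinct_kmers([genome], k))

results = []  # (n_reads, distinct k-mers from error-free reads, from 1%-error reads)
for n_reads in (1_000, 2_000, 4_000, 8_000):
    clean = len(distinct_kmers(simulate_reads(genome, n_reads, 100, 0.0, rng), k))
    noisy = len(distinct_kmers(simulate_reads(genome, n_reads, 100, 0.01, rng), k))
    results.append((n_reads, clean, noisy))
    print(f"{n_reads:>5} reads: genome {genome_kmer_count}, error-free {clean}, 1% errors {noisy}")
```

The error-free column can never exceed the genome's own k-mer count, while the error column keeps growing with the number of reads, because each substitution can create up to k k-mers that exist nowhere in the genome.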

**samanta** · 08-27-2012, 07:00 AM

Originally posted by kenietz:

@SES:
Thank you for the information. The client wants to try 10x coverage first and then proceed with higher coverage. Yeah, I got that SGA would probably be able to do the job. Now I am reading about Readjoiner. I'm still considering whether to take the job at all.

Btw, how much computing power would I really need to assemble a 3 Gb genome?

You can also request SOAPdenovo2 from BGI. Its RAM requirement is much lower than SOAPdenovo's, especially when you use the k-mer skipping option.

**ymc** · 08-27-2012, 03:24 PM

Originally posted by samanta:

Interesting question.

i) For the kind of de novo assembly we are talking about, the chromosome sequences are not known. If they were known, why would you need de novo assembly in the first place?

I want to have better variant phasing than GATK's ReadBackedPhasing. Will that route do a better job?

**samanta** · 08-27-2012, 08:54 PM

Originally posted by ymc:

I want to have better variant phasing than GATK's ReadBackedPhasing. Will that route do a better job?

In my understanding, that is a different class of problem, which none of the solutions suggested above (SGA, diginorm, SOAPdenovo, etc.) is designed to handle. Typical de Bruijn graph-based genome assembly programs are designed to assemble genomes where no reference exists. Haplotype difference is a second-order issue that those programs are not expected to handle by design. In some situations (long indels), they may assemble two separate contigs for a chromosomal region, but that is fortuitous.
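To see why haplotype differences are a second-order issue for these programs, consider what two haplotypes do to a de Bruijn graph. A single SNP makes the paths diverge at one node and rejoin a few bases later, forming a "bubble", and many assemblers simplify such bubbles into a single path. Below is a minimal sketch, assuming two toy haplotypes that differ by one SNP and k = 5 (the sequences and function names are hypothetical). It builds the graph and lists the nodes where paths branch.

```python
from collections import defaultdict


def de_bruijn_edges(seqs, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Nodes with more than one successor, where a bubble (or repeat) opens."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


hap_a = "ACGTTGCATGACCTGAGTCA"
hap_b = "ACGTTGCATGTCCTGAGTCA"  # same sequence with one SNP (A->T at 0-based position 10)
k = 5

print(branch_nodes(de_bruijn_edges([hap_a], k)))         # []
print(branch_nodes(de_bruijn_edges([hap_a, hap_b], k)))  # ['CATG']
```

The second haplotype only adds a short alternative path. Collapsing that bubble gives one contig, which is exactly what the assembler wants for a haploid reference but loses the phasing information.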

Lately, people have been recognizing the need for algorithms that handle problems of the type you mention. Please take a look at the following two papers and check out their programs, which are freely distributed on their websites.

http://www.nature.com/ng/journal/v44/n2/full/ng.1028.html#/affil-auth

HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data

http://www.ncbi.nlm.nih.gov/pubmed/22697235

Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinforma …
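The core idea of read-backed phasing can be shown in the simplest case: two heterozygous SNPs and reads that cover both of them. Below is a minimal sketch, assuming each read's alleles are already encoded as 0 (reference) or 1 (alternate). The function name is hypothetical, and this naive majority vote is neither GATK's ReadBackedPhasing nor the HapCompass algorithm.

```python
from collections import Counter


def phase_pair(read_alleles):
    """Phase two heterozygous SNPs from reads spanning both.

    read_alleles: list of (allele_at_snp1, allele_at_snp2), each 0 (ref) or 1 (alt).
    Returns (phase, supporting_reads, total_reads); ties are reported as "cis".
    """
    counts = Counter(read_alleles)
    cis = counts[(0, 0)] + counts[(1, 1)]    # ref with ref, alt with alt
    trans = counts[(0, 1)] + counts[(1, 0)]  # ref with alt
    total = cis + trans
    if cis >= trans:
        return "cis", cis, total
    return "trans", trans, total


reads = [(0, 0), (1, 1), (1, 1), (0, 0), (0, 1)]  # one read disagrees, e.g. a sequencing error
print(phase_pair(reads))  # ('cis', 4, 5)
```

With many SNPs, these pairwise decisions can contradict each other around a cycle of SNPs. Resolving such conflicts is the kind of problem the HapCompass paper's cycle-basis approach, as its title indicates, is aimed at.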

The paper mentioned in the following link is not directly relevant to your problem, but it could help with de novo assembly of highly polymorphic genomes, where the assumption of no haplotype difference breaks down:


http://www.homolog.us/blogs/2012/07/11/haplomerger-a-software-tool-for-assembling-highly-polymorphic-genomes/

**SES** · 08-28-2012, 06:29 AM

Originally posted by samanta:

Please check the Minia program discussed here. You can assemble a 3 Gbase genome using about 6-8 GB of RAM.


http://www.homolog.us/blogs/2012/07/19/quip-minia-slimgene-and-titus-browns-paper-on-scaling-metagenome/

You can also check the slides posted here:


http://www.homolog.us/blogs/2012/08/10/thesis-slides-from-rayan-chikhi/

If you would like to split the reads into partitions, the paper by Titus Brown in the first link should help you.

Please email me (samanta at homolog.us) if you need more explanation of the algorithms, because I do not check the forum frequently. The state of the art is far ahead of running Velvet on a machine with 512 GB of RAM.

This looks very interesting indeed. It's difficult to compare the results directly, but I hope this project continues to develop. Thanks for posting.

**samanta** · 08-28-2012, 11:32 AM

Originally posted by ymc:

If I classify the reads into different chromosomes using bwa, can I assemble each chromosome "de novo" on a 64 GB machine?

ymc, I wrote a post on the HapCompass algorithm that you may find interesting:


http://www.homolog.us/blogs/2012/08/28/hapcompass-an-elegant-use-of-graphs-for-haplotype-assemblyphasing/

**narain** · 05-13-2013, 06:53 AM

How about the Fermi assembler by Heng Li? Isn't it faster and more accurate than SGA or Readjoiner?

**narain** · 05-13-2013, 09:31 AM

Readjoiner Features

It was interesting to read the article on Readjoiner and to notice that it has several features that improve on SGA. Is Readjoiner MPI-compatible? I read that it is multithreaded; how good is the scalability?

However, I notice that the tool does not perform well on erroneous reads, as you showed with your E. coli data. Would it be possible to integrate a data cleaner and filters into Readjoiner itself?

Also, on the Plantagora metrics it seems that Readjoiner performs worse than SGA! It produced more insertions, deletions, and misassembled contig bases than SGA or Edena!

**kenietz** · 05-13-2013, 05:12 PM

There is also the IDBA assembler:

IDBA - Bioinformatics Research Group of Hong Kong University

http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/

It seems to work pretty well and does not use a lot of memory. I used it for de novo RNA-seq assembly and got good-sized transcripts.
