**RLB_84** · 04-16-2011, 03:01 AM

Hi, I read about your pipeline and it seems quite interesting, as I'm a newbie user of MUMmer. I hope to try your assembler as soon as possible. By the way, just one question about "masking": when I consider a contig dataset returned by Newbler, I should keep in mind that many of the contigs can be repeated, and I can't tell that from the contig fasta file alone (I need the other Newbler outputs). So, how does your assembler deal with the probable occurrence of a contig in more than one genome locus?

**glacerda** · 04-16-2011, 04:00 AM

Hi RLB_84, that's a very good question.

In addition to the contig fasta files, Zorro takes as input the reads file (a subsample of WGS reads). The reads are used only to allow us to identify repeats in the contigs. Here are the technical details:

1-Zorro counts the occurrences of 22-mers in the reads file supplied by the user
2-22-mer words that are unique in the genome should occur in proportion to the genome coverage
3-22-mer words that represent repeats should occur at least twice as often as the peak coverage
4-we select the 22-mer words that occur at least twice as often as the mode of the distribution. These 22-mer words are used to mask the contig files using bowtie.

This technique is used by many ab initio repeat detection programs. We do not need to screen repeat libraries and, even if Newbler (or other software) has collapsed the repeats, we can still detect them.
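
To make the four steps concrete, here is a minimal Python sketch of the idea. It is not Zorro's code: all function names are hypothetical, the masking is done by direct string lookup rather than by mapping the 22-mers with bowtie, reverse complements are ignored, and the `min_count` cutoff (which skips k-mers seen only once, on the assumption that they are mostly sequencing errors) is my own addition.

```python
from collections import Counter

K = 22  # word size quoted in the post


def count_kmers(reads, k=K):
    """Step 1: count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_mode(counts, min_count=2):
    """Steps 2-3: mode of the k-mer occurrence distribution.

    min_count is an assumption of this sketch: it drops k-mers seen fewer
    than min_count times so that error k-mers do not dominate the mode.
    """
    histogram = Counter(c for c in counts.values() if c >= min_count)
    if not histogram:
        raise ValueError("no k-mer reaches min_count")
    # Most frequent occurrence count; ties go to the smaller count.
    return max(histogram, key=lambda c: (histogram[c], -c))


def repeat_kmers(counts, mode, factor=2):
    """Step 4: k-mers occurring at least `factor` times the mode."""
    return {kmer for kmer, c in counts.items() if c >= factor * mode}


def mask_contig(contig, repeats, k=K):
    """Replace every position covered by a repeat k-mer with 'N'.

    Exact string lookup stands in for the bowtie mapping step here.
    """
    seq = contig.upper()
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in repeats:
            masked[i:i + k] = "N" * k
    return "".join(masked)
```

In use, one would call `count_kmers` on the read subsample, pass the result to `coverage_mode` and `repeat_kmers`, and then run `mask_contig` over each contig sequence, whichever assembler produced it.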

**RLB_84** · 04-16-2011, 05:11 AM

Thanks for the clarification, that sounds good! So, considering I'm trying to merge two sets of contigs returned by Newbler and ABySS, this approach makes it feasible to compare the coverage of the two datasets directly, as Newbler comes with a bunch of output files (very useful for coverage and so on), while the ABySS output needs more processing.

Does the "less stringent" assembly phase take into account the repeat prediction and try to infer this information in the assembly itself?

Zorro: The Masked Assembler (first public release)
