
**dpryan** · 01-15-2015, 07:04 AM

Please don't cross-post on here and biostars.

**aupadhyaya** · 01-15-2015, 07:10 AM

Thanks for the note. I've removed the biostars post.

**seb.lees** · 01-16-2015, 12:19 AM

Hi aupadhyaya,

Assembly of repetitive regions is always an awful task, especially with paired-end reads only, but we need more information to properly figure out your problem.

First, what is the size of the repeat unit(s)? Are the repeats divergent or identical? Are they arranged in tandem? Is it a kind of transposon island?

You also say that you are attempting a de novo assembly, but based on a reference of the same genotype... that's not very clear to me. Do you expect some variation compared to the reference sequence? What is the purpose of re-sequencing/re-assembling the reference genotype?

By the way, when mapping to repetitive regions it is generally better to reduce the number of allowed mismatches, so that reads map specifically to the correct repeat copy (provided the repeats are divergent, of course).
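To make the mismatch point concrete, here is a toy sketch (not how any real aligner works internally): a read is compared ungapped against each repeat copy, and it counts as uniquely placed only if exactly one copy falls within the mismatch budget. The copy names and sequences are made up, and the sketch assumes each copy is at least as long as the read.

```python
from typing import Dict, List


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def copies_within(read: str, copies: Dict[str, str], max_mm: int) -> List[str]:
    """Return the names of repeat copies the read matches (ungapped, best
    offset) with at most max_mm mismatches."""
    hits = []
    for name, seq in copies.items():
        best = min(
            mismatches(read, seq[i:i + len(read)])
            for i in range(len(seq) - len(read) + 1)
        )
        if best <= max_mm:
            hits.append(name)
    return hits


# Two hypothetical repeat copies that differ at only two positions.
copies = {
    "copy_a": "ACGTACGTTAGCCATG",
    "copy_b": "ACGTACCTTAGCGATG",
}
read = copies["copy_a"][2:14]  # a 12-bp read drawn from copy_a

for max_mm in (0, 1, 2):
    hits = copies_within(read, copies, max_mm)
    status = "unique" if len(hits) == 1 else "ambiguous"
    print(max_mm, hits, status)
```

With 0 or 1 allowed mismatches the read lands on `copy_a` only; allowing 2 mismatches makes it ambiguous between the two copies, which is exactly the situation a tighter threshold avoids.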

Also, proper assembly of repetitive regions other than microsatellites generally requires mate-pair (or PacBio) reads.
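The reason longer-range pairs help can be sketched with simple arithmetic: a pair only tells the assembler which flank connects to which if the two reads sit in unique sequence on opposite sides of the repeat. The sketch below uses a deliberately conservative simplification (both reads entirely outside the repeat, insert size taken as the outer fragment length); the library sizes and repeat lengths are hypothetical, and real scaffolders can also use partially anchored reads.

```python
def max_spannable_repeat(insert_size: int, read_len: int) -> int:
    """Longest repeat a pair can bridge with both reads lying entirely in
    unique flanking sequence (insert_size = outer fragment length)."""
    return max(0, insert_size - 2 * read_len)


def pair_can_span(insert_size: int, read_len: int, repeat_len: int) -> bool:
    """True if such a pair can anchor on both sides of the repeat."""
    return repeat_len <= max_spannable_repeat(insert_size, read_len)


# Hypothetical libraries: (label, insert size in bp, read length in bp)
libraries = [
    ("paired-end", 500, 100),
    ("mate-pair", 5000, 100),
]
# Hypothetical repeat lengths in bp
repeat_lengths = [300, 2000, 4500]

for label, insert_size, read_len in libraries:
    for rep in repeat_lengths:
        print(label, rep, pair_can_span(insert_size, read_len, rep))
```

Under these assumptions a 500-bp paired-end library can only bridge repeats up to 300 bp, while a 5-kbp mate-pair library bridges all three example lengths.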

seb.

**aupadhyaya** · 01-16-2015, 12:48 AM

There are a few types of repeats according to repeatmasker, some of which are identical and arranged nearly in tandem. I'm not too sure what a transposon island refers to, so I can't say much about that.

The reason I'm assembling the same genotype again is essentially as a sanity check for assembly of the region. I'm trying to assemble the region for another individual with very little success and wanted to see if the reference region could be done.

What you say about mismatches makes sense, but for some reason the best result, i.e., the longest contigs, comes from allowing some mismatches. I'm not too sure what to make of that.

If it helps, this is the region I'm looking at (available on JBrowse): Capsella rubella scaffold_2 7900000-7930000

**seb.lees** · 01-16-2015, 01:58 AM

Hmm, are you sure it is the Capsella rubella scaffold_2 7900000-7930000 region? It appears that this region is not repeated at all, except for a 200-bp microsatellite at position 7910000, at least in the reference sequence available in GenBank (accession KB870806.1). There are indeed some loci that are repeated elsewhere in the C. rubella genome, but with no more than 90% similarity, which shouldn't be a problem for the assembly.
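One way to see why ~90% similarity is usually harmless, assuming a de Bruijn graph assembler (the thread does not say which assembler is used) with a hypothetical k of 31: if substitutions are independent and evenly spread with no indels, a k-mer survives unchanged in the other copy with probability identity^k. Real divergence is clustered, so this is only a rough guide.

```python
def expected_shared_kmer_fraction(identity: float, k: int) -> float:
    """Probability that a k-mer from one repeat copy occurs unchanged in the
    other copy, assuming independent, evenly spread substitutions, no indels."""
    return identity ** k


k = 31  # hypothetical k-mer size
for identity in (0.90, 0.95, 0.99):
    frac = expected_shared_kmer_fraction(identity, k)
    print(f"{identity:.2f} identity -> {frac:.3f} of {k}-mers shared")
```

Under these assumptions only about 4% of 31-mers are shared at 90% identity, versus roughly 73% at 99%, so near-identical copies are the ones that collapse together in the graph.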

Longer contigs don't mean a better assembly! If you increased the number of allowed mismatches for the assembly, you should expect more assembly errors, especially at the repeated loci.

**aupadhyaya** · 01-16-2015, 02:22 AM

I'm sure this is the region. In terms of repetition, I'm a bit confused! There doesn't seem to be a C. rubella-specific annotation, but using A. thaliana repeats as a guide, around 13% of this region is annotated as repetitive (mostly as retroelements).

You're right, of course, that length doesn't equal quality! As a first pass, I have checked these contigs for accuracy with BLAST, and they do look like good matches.

**seb.lees** · 01-19-2015, 02:03 AM

Hi aupadhyaya,

Indeed, these repetitive regions are probably retroelements. But if you BLAST the region against itself, there is no repetition.
So I don't understand why you are not able to reconstruct this region. Did you sequence only this region (from a BAC) or the whole genome?
My best guess is that these retroelements occur elsewhere in the genome with very high similarity, creating several alternative assembly paths that assemblers cannot resolve with paired-end reads only. You should definitely produce 4-5 kbp mate-pair reads.

**aupadhyaya** · 01-20-2015, 04:23 AM

The sequencing is genomic. I'm going to see if I can do some mate pair sequencing to get around this issue.

Problem with repetitive assembly
