**ffinkernagel** · 06-29-2010, 11:35 PM

Often: contamination.
Try assembling the unmappable reads, for example with ABySS, then BLAST the resulting contigs to get an idea of what's in there, and try aligning again against the organisms you identified as contaminants.

You have checked the duplication ratio of your reads first though, right?
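
As a rough illustration of the first step suggested above (pulling out the reads that failed to map so they can be handed to an assembler), here is a minimal sketch. It assumes the aligner wrote SAM output and relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read. The function name `unmapped_to_fastq` and the toy records are made up for the example and are not from the original post.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_to_fastq(sam_lines):
    """Yield one FASTQ record for every SAM alignment line flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line, not a read
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory columns
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Toy input (hypothetical): one mapped read and one unmapped read.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII\n",
]
unmapped_records = list(unmapped_to_fastq(example_sam))
print("".join(unmapped_records), end="")
```

The resulting FASTQ is what you would then assemble, before BLASTing the contigs as described.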

**simonandrews** · 06-29-2010, 11:37 PM

Poor quality sequence, contamination, enrichment of repetitive sequence... plenty of possible reasons.

I'd suggest running some QC on your raw sequence to see if that turns up any problems before delving any further into the failures. 40% isn't disastrously low so it may not be too serious a problem.

**hannat** · 06-29-2010, 11:55 PM

Originally posted by ffinkernagel View Post

Often: contamination.
Try assembling the unmappable reads, for example with ABySS, then BLAST the resulting contigs to get an idea of what's in there, and try aligning again against the organisms you identified as contaminants.

You have checked the duplication ratio of your reads first though, right?

What is the "duplication ratio", and how should I estimate it? Thanks

**simonandrews** · 06-30-2010, 12:06 AM

Originally posted by hannat View Post

What is the "duplication ratio", and how should I estimate it? Thanks

It's a measure of how often each unique sequence is seen. High duplication levels indicate that your sequence may have been overamplified during library preparation. The QC report I linked to will show you a duplication level plot to see how many times you see unique, duplicated, triplicated etc sequences. It will also spot heavily overrepresented sequences in case you have a small number of heavy contaminants (eg primers).
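
To make the idea concrete, here is a small sketch of how duplication levels can be tallied by exact sequence match. It only illustrates the concept and is not the QC tool's actual implementation (which may subsample or truncate reads); the function names and the toy reads are assumptions for the example.

```python
from collections import Counter


def duplication_levels(sequences):
    """Map duplication level -> number of distinct sequences seen that many times."""
    per_sequence = Counter(sequences)      # sequence -> times seen
    return Counter(per_sequence.values())  # times seen -> distinct sequences


def percent_duplicated_reads(sequences):
    """Percentage of reads that are extra copies of an already-seen sequence."""
    per_sequence = Counter(sequences)
    total = sum(per_sequence.values())
    extra_copies = total - len(per_sequence)
    return 100.0 * extra_copies / total if total else 0.0


# Toy data (hypothetical): one sequence seen 3x, one seen 2x, two seen once.
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC", "CATG"]
levels = duplication_levels(reads)
for level in sorted(levels):
    print(f"seen {level}x: {levels[level]} distinct sequence(s)")
print(f"{percent_duplicated_reads(reads):.1f}% of reads are extra copies")
```

A healthy library would put nearly all distinct sequences at level 1.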

**hannat** · 06-30-2010, 12:37 AM

Originally posted by simonandrews View Post

It's a measure of how often each unique sequence is seen. High duplication levels indicate that your sequence may have been overamplified during library preparation. The QC report I linked to will show you a duplication level plot to see how many times you see unique, duplicated, triplicated etc sequences. It will also spot heavily overrepresented sequences in case you have a small number of heavy contaminants (eg primers).

I see a rise at the end of the duplication plot, so I have a large number of sequences which were duplicated.

Attached file: Screenshot.jpg

**simonandrews** · 06-30-2010, 11:20 PM

Originally posted by hannat View Post

I see a rise at the end of the duplication plot, so I have a large number of sequences which were duplicated.

Actually I'd be more concerned about the front of the plot. This shows that you have a very high percentage of sequences which are replicated a small number of times (say up to 5). This either means that you have a huge fold coverage over the region that you're sequencing, or that your library has suffered from over-amplification.

What you would hope to see on these plots is that the duplication rate immediately falls to very close to zero and stays there. Any significant amount of duplication is something to be concerned about.

**kmcarr** · 07-01-2010, 05:16 AM

Going along with what Simon said (ooh, bad pun), how many reads are in your data set? The Neurospora crassa genome is ~40 Mb. If you have close to, or more than, 40 million reads you would expect to see some degree of low-level duplication. The rise at the high end of the plot may be due to overrepresentation of the mt plasmid.
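
A back-of-the-envelope sketch of that expectation, under assumptions that are mine rather than from the thread: single-end reads, start positions sampled uniformly at random, and both strands counted, so a 40 Mb genome offers about 80 million distinct read starts. The function name is hypothetical. Real libraries are not uniform, so treat the result as a rough baseline only.

```python
import math


def expected_duplicate_fraction(n_reads, genome_size, strands=2):
    """Expected fraction of reads whose start position was already hit by another read.

    Assumes reads start uniformly at random over genome_size * strands positions
    (an idealisation; real coverage is never perfectly uniform).
    """
    positions = genome_size * strands
    # Expected number of distinct start positions hit by n_reads random draws.
    expected_distinct = positions * -math.expm1(-n_reads / positions)
    return 1.0 - expected_distinct / n_reads


frac = expected_duplicate_fraction(n_reads=40_000_000, genome_size=40_000_000)
print(f"expected duplicate fraction by chance: {frac:.1%}")
```

With these numbers, roughly a fifth of reads would be duplicates by chance alone, so duplication well above that baseline is what points to over-amplification.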

**amosine** · 01-11-2012, 11:53 PM

What's wrong with my ChIP-seq data?

I performed an H3K9me3 ChIP experiment and built the library according to Illumina's ChIP-seq library protocol. The analysis of the data is as follows:

raw read: 46891730
map read: 42812364
uniq read: 40442380
used read: 13409805
map ratio: 91.30%
uniq ratio: 86.25%
used ratio: 28.6%
region: 253

The used reads, used ratio, and region count are too low. I cannot figure out the problem; could anyone help me? Thanks!
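
For anyone checking the numbers, the quick sketch below recomputes the percentages from the counts posted above. It shows that all three ratios are relative to the raw read count, and that the large drop happens between the uniquely mapped reads and the "used" reads. What the pipeline means by "used" is not stated in the post, so the dictionary keys here are only labels chosen for the example.

```python
# Counts copied from the post above; the key names are illustrative labels.
counts = {
    "raw": 46891730,
    "mapped": 42812364,
    "unique": 40442380,
    "used": 13409805,
}


def percent_of_raw(counts):
    """Express each downstream count as a percentage of the raw read count."""
    return {
        name: 100.0 * counts[name] / counts["raw"]
        for name in ("mapped", "unique", "used")
    }


ratios = percent_of_raw(counts)
for name, value in ratios.items():
    print(f"{name} / raw: {value:.2f}%")

# Share of uniquely mapped reads that survive to the "used" set.
used_of_unique = 100.0 * counts["used"] / counts["unique"]
print(f"used / unique: {used_of_unique:.2f}%")
```

Only about a third of the uniquely mapped reads end up "used", so that filtering step is the place to look first.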


only 40% of reads were mapped successfully
